Francois Pepin <fpepin at cs.mcgill.ca> writes:
> Hi Martin,
>
> Thanks for the help. I managed to fix the issue by resetting all of the
> levels on both sides (having everything as characters should work too):
>
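> for (i in seq_along(pData(tmp[[1]]))) {
>   # recode via character: assigning to levels() directly would relabel
>   # the existing levels by position instead of keeping the values
>   lv <- union(as.character(pData(tmp[[1]])[[i]]),
>               as.character(pData(tmp[[2]])[[i]]))
>   pData(tmp[[1]])[[i]] <- factor(as.character(pData(tmp[[1]])[[i]]), levels = lv)
>   pData(tmp[[2]])[[i]] <- factor(as.character(pData(tmp[[2]])[[i]]), levels = lv)
> }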
>
> The next question would be to see where it would best be taken care of.
> I really don't see why this should not be taken care of behind the
> scenes.
>
> The two main options I see would be that getGEO() returns phenoData
> columns as characters instead of factors, or that combine() knows how to
> deal with factors properly for ExpressionSet.
combine does know how to deal with factors properly -- the levels are
different, so the columns (usually) can't be combined. But I
appreciate the sentiment, and the issue has come up on the mailing
list three times since 2.1, so it is a common occurrence. I've made
another attempt at improving the documentation, and will work on a
better set of warnings for the next release of Bioconductor.
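
For what it's worth, here is a minimal base-R sketch of one way to put shared
factor columns on a common set of levels before combining. The helper name
harmonizeLevels is hypothetical (it is not part of Biobase), and the sketch
assumes plain data.frames such as those returned by pData(). It goes through
character vectors on purpose: assigning a longer vector to levels() relabels
the existing levels by position rather than recoding the values.

```r
# Hypothetical helper (not part of Biobase): give the factor columns shared
# by two data.frames the same set of levels.
harmonizeLevels <- function(x, y) {
    shared <- intersect(names(x), names(y))
    for (nm in shared) {
        if (is.factor(x[[nm]]) && is.factor(y[[nm]])) {
            # Recode through character so the values themselves are kept
            lv <- union(as.character(x[[nm]]), as.character(y[[nm]]))
            x[[nm]] <- factor(as.character(x[[nm]]), levels = lv)
            y[[nm]] <- factor(as.character(y[[nm]]), levels = lv)
        }
    }
    list(x = x, y = y)
}

x <- data.frame(x = 1:5, y = factor(letters[1:5]), row.names = letters[1:5])
y <- data.frame(z = 3:7, y = factor(letters[3:7]), row.names = letters[3:7])

# The same value 'c' has different integer codes in the two columns
codeInX <- as.integer(x["c", "y"])
codeInY <- as.integer(y["c", "y"])

# After harmonizing, both columns share one set of levels
res <- harmonizeLevels(x, y)
sameLevels <- identical(levels(res$x$y), levels(res$y$y))
```

With an ExpressionSet one would presumably apply this to pData() of each
object and assign the results back before calling combine(); that step is not
shown here because it depends on the Biobase version in use.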
> If the former is chosen, I think it would probably be worth adjusting
> the documentation about combine to mention this issue. As an unrelated
> note, the ExpressionSet documentation refers to eSet. Since eSet
> is going away at some point, that might be worth changing.
Actually, 'eSet' is a class that 'ExpressionSet' extends; 'eSet' is
not going away, and many of the data slots and methods on
ExpressionSet are inherited from eSet, so it's appropriate to
reference the eSet documentation for these. The 'exprSet' class is no
longer supported.
Thanks for your input,
Martin
> Francois
>
> On Wed, 2008-01-30 at 10:54 -0800, Martin Morgan wrote:
>> So part of the bug fix was an attempt to make the error message more
>> informative, and it's not really clear that I've done that!
>>
>> The traceback makes it clear that the problem is with the pData (and
>> not, for instance, varMetadata or featureData) of the two arrays.
>>
>> Some hints are provided by the warnings, by the ?combine help page,
>>
>> 'combine(data.frame, data.frame)' Combines two 'data.frame'
>> objects so that the resulting 'data.frame' contains all rows
>> and columns of the original objects. Rows and columns in the
>> returned value are unique, that is, a row or column
>> represented in both arguments is represented only once in the
>> result. To perform this operation, 'combine' makes sure that
>> data in shared rows and columns is identical in the two
>> data.frames. Data differences in shared rows and columns cause
>> an error. 'combine' issues a warning when a column is a
>> 'factor' and the levels of the factor in the two
>> 'data.frame's are different; the returned value may be
>> recoded.
>>
>> and by the results of
>>
>> > example(combine)
>>
>> particularly the last lines which are trying to illustrate your
>> problem:
>>
>> combin> # y is converted to 'factor' with different levels
>> combin> x <- data.frame(x=1:5,y=letters[1:5], row.names=letters[1:5])
>>
>> combin> y <- data.frame(z=3:7,y=letters[3:7], row.names=letters[3:7])
>>
>> combin> try(combine(x,y))
>> Error in combine(x, y) : data.frames contain conflicting data:
>> non-conforming colname(s): y
>> In addition: Warning messages:
>> 1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 5 string mismatches
>> 2: In switch(class(x[[nm]])[[1]], factor = { :
>> data frame column 'y' levels not all.equal
>>
>> The data.frame column 'y' is a 'factor' (rather than a character
>> vector) and combine doesn't know how to resolve a column that has 'c'
>> encoded as level 3 of a factor with one that has 'c' encoded as level
>> 1.
>>
>> One solution is to ensure that columns that are really character
>> vectors are stored as such
>>
>> > x <- data.frame(x=1:5,y=I(letters[1:5]), row.names=letters[1:5])
>> > y <- data.frame(z=3:7,y=I(letters[3:7]), row.names=letters[3:7])
>> > combine(x,y)
>> x y z
>> a 1 a NA
>> b 2 b NA
>> c 3 c 3
>> d 4 d 4
>> e 5 e 5
>> f NA f 6
>> g NA g 7
>>
>> or that factors have the same levels
>>
>> > y1 <- factor(letters[1:5], levels=letters[1:7])
>> > y2 <- factor(letters[3:7], levels=letters[1:7])
>> > x <- data.frame(x=1:5, y=y1, row.names=letters[1:5])
>> > y <- data.frame(z=3:7, y=y2, row.names=letters[3:7])
>> > combine(x,y)
>> x y z
>> a 1 a NA
>> b 2 b NA
>> c 3 c 3
>> d 4 d 4
>> e 5 e 5
>> f NA f 6
>> g NA g 7
>>
>> Martin
>>
>> Francois Pepin <fpepin at cs.mcgill.ca> writes:
>>
>> > Hi Martin,
>> >
>> > I think it is related, as I now have a different error message along
>> > with a series of warnings. 255 and 98 refer to the number of samples in
>> > each ExpressionSet. 66 and 21 refer to the number of unique elements in
>> > source_name_ch1 in the phenoData.
>> >
>> >> tmp2<-combine(tmp[[1]],tmp[[2]])
>> > Error in .local(x, y, ...) :
>> > data.frames contain conflicting data:
>> > non-conforming colname(s): title, geo_accession,
>> > source_name_ch1, description, supplementary_file
>> > In addition: Warning messages:
>> > 1: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> > Lengths (255, 98) differ (string compare on first 98)98 string
>> > mismatches
>> > 2: In switch(class(x[[nm]])[[1]], factor = { :
>> > data frame column 'title' levels not all.equal
>> > 3: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> > Lengths (255, 98) differ (string compare on first 98)98 string
>> > mismatches
>> > 4: In switch(class(x[[nm]])[[1]], factor = { :
>> > data frame column 'geo_accession' levels not all.equal
>> > 5: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> > Lengths (66, 21) differ (string compare on first 21)21 string
>> > mismatches
>> > 6: In switch(class(x[[nm]])[[1]], factor = { :
>> > data frame column 'source_name_ch1' levels not all.equal
>> > 7: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> > Lengths (255, 98) differ (string compare on first 98)98 string
>> > mismatches
>> > 8: In switch(class(x[[nm]])[[1]], factor = { :
>> > data frame column 'description' levels not all.equal
>> > 9: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> > Lengths (255, 98) differ (string compare on first 98)98 string
>> > mismatches
>> > 10: In switch(class(x[[nm]])[[1]], factor = { :
>> > data frame column 'supplementary_file' levels not all.equal
>> >
>> >> traceback()
>> > 9: stop("data.frames contain conflicting data:", "\n\tnon-conforming colname(s): ",
>> > paste(sharedCols[!ok], collapse = ", "))
>> > 8: .local(x, y, ...)
>> > 7: combine(pDataX, pDataY)
>> > 6: combine(pDataX, pDataY)
>> > 5: .local(x, y, ...)
>> > 4: combine(phenoData(x), phenoData(y))
>> > 3: combine(phenoData(x), phenoData(y))
>> > 2: combine(tmp[[1]], tmp[[2]])
>> > 1: combine(tmp[[1]], tmp[[2]])
>> >
>> >> sessionInfo()
>> > R version 2.6.0 (2007-10-03)
>> > x86_64-unknown-linux-gnu
>> >
>> > locale:
>> > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>> >
>> > attached base packages:
>> > [1] tools stats graphics grDevices utils datasets methods
>> > [8] base
>> >
>> > other attached packages:
>> > [1] GEOquery_2.2.0 RCurl_0.8-1 Biobase_1.16.2
>> >
>> > loaded via a namespace (and not attached):
>> > [1] rcompgen_0.1-15
>> >
>> > Francois
>> >
>> > On Wed, 2008-01-30 at 10:03 -0800, Martin Morgan wrote:
>> >> Hi Francois -- this might be related to a bug in Biobase that has been
>> >> fixed. Can you try to update your Biobase, either with biocLite('Biobase')
>> >> or by following the directions at http://bioconductor.org/download ? If
>> >> that doesn't help, can you provide the output of traceback() after the
>> >> error occurs?
>> >>
>> >> Thanks,
>> >>
>> >> Martin
>> >>
>> >> Francois Pepin <fpepin at cs.mcgill.ca> writes:
>> >>
>> >> > Hi everyone,
>> >> >
>> >> > I'm getting an error message when trying to combine two parts of a GSE
>> >> > object:
>> >> >
>> >> >>tmp<-getGEO('GSE3526',GSEMatrix=T)
>> >> >> tmp2<-combine(tmp[[1]],tmp[[2]])
>> >> > Error in alleq(levels(x[[nm]]), levels(y[[nm]])) && alleq(x
>> >> > [sharedRows, :
>> >> > invalid 'x' type in 'x && y'
>> >> >
>> >> > Checking to make sure that I should be able to combine them (from the
>> >> > eSet documentation):
>> >> >
>> >> > #eSets must have identical numbers of 'featureNames'
>> >> >> all(featureNames(tmp[[2]])==featureNames(tmp[[2]]))
>> >> > [1] TRUE
>> >> >
>> >> > #must have distinct 'sampleNames'
>> >> >> any(sampleNames(tmp[[1]])%in%sampleNames(tmp[[2]]))
>> >> > [1] FALSE
>> >> >
>> >> > #and must have identical 'annotation'.
>> >> >> annotation(tmp[[2]])==annotation(tmp[[2]])
>> >> > [1] TRUE
>> >> >
>> >> >> sessionInfo()
>> >> > R version 2.6.0 (2007-10-03)
>> >> > x86_64-unknown-linux-gnu
>> >> >
>> >> > locale:
>> >> > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>> >> >
>> >> > attached base packages:
>> >> > [1] tools stats graphics grDevices utils datasets methods
>> >> > [8] base
>> >> >
>> >> > other attached packages:
>> >> > [1] GEOquery_2.2.0 RCurl_0.8-1 Biobase_1.16.0
>> >> >
>> >> > loaded via a namespace (and not attached):
>> >> > [1] rcompgen_0.1-15
>> >> >
>> >> > Does anyone know why that is happening and if there would be any way
>> >> > around it?
>> >> >
>> >> > Francois
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793