DeaR bioconductors,
we run an internal microarray analysis pipeline and switched today from
R/BioC (2.8.1/2.3) to 2.9/2.4.
After running some test code, I came across the following error:
testCode:
> x<-rep(NA,10)
> unique(unlist(mget(x, env=hgu133plus2ENTREZID,ifnotfound=NA)))
when I run this code snippet with 2.8.1/2.3 the corresponding return
value is
> [1] NA
but with 2.9/2.4 I got the following error:
> Error during wrapup: keys must be supplied in a character vector with no NAs
This causes our pipeline to break there and stop the analysis while in
the previous case the analysis still continued with NA values.
Please do not think that I am a picky person, but was there any urgent
need to change the behaviour of mget()?
Is it possible to somehow bypass this?
Thanks a lot for any help.
Christian
--
Christian Kohler
Institute of Functional Genomics
Computational Diagnostics
University of Regensburg (BioPark I)
D-93147 Regensburg (Germany)
Tel. +49 941 943 5055
Fax +49 941 943 5020
christian.kohler at klinik.uni-regensburg.de
Hi Christian,
Christian Kohler wrote:
> DeaR bioconductors,
>
> we run an internal microarray analysis pipeline and switched today from
> R/BioC (2.8.1/2.3) to 2.9/2.4.
> After running some test code, I came across the following error:
>
> testCode:
>> x<-rep(NA,10)
>> unique(unlist(mget(x, env=hgu133plus2ENTREZID,ifnotfound=NA)))
>
>
> when I run this code snippet with 2.8.1/2.3 the corresponding return
> value is
>> [1] NA
Really?
> x <- rep(NA, 10)
> mget(x, hgu95av2ENTREZID)
Error in .checkKeysAreWellFormed(keys) :
keys must be supplied in a character vector with no NAs
> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] tools stats graphics grDevices datasets utils methods
[8] base
other attached packages:
[1] hgu95av2.db_2.2.5 RSQLite_0.7-1 DBI_0.2-4
[4] AnnotationDbi_1.4.3 Biobase_2.2.2
>
> but with 2.9/2.4 I got the following error:
>> Error during wrapup: keys must be supplied in a character vector with no NAs
>
> This causes our pipeline to break there and stop the analysis while in
> the previous case the analysis still continued with NA values.
>
> Please do not think that I am a picky person, but was there any urgent
> need to change the behaviour of mget()?
> Is it possible to somehow bypass this?
The easiest way is to strip the NA values, using the canonical
x <- x[!is.na(x)]
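If the pipeline needs one result per input key, so that later steps still see NA where a key was missing, the filtering can be wrapped in a small helper. This is only a sketch under assumptions: `safe_lookup` is a made-up name, and `toy_map` is a plain environment with invented probe and gene IDs standing in for an annotation map such as hgu133plus2ENTREZID, so that the example runs in base R without any annotation package.

```r
# Toy stand-in for an annotation map (assumption: invented probe/gene IDs).
toy_map <- list2env(list(probe_1 = "101", probe_2 = "202"))

# Hypothetical helper, not part of AnnotationDbi: look up only the non-NA
# keys and return NA for everything else, one result per input key.
safe_lookup <- function(keys, map) {
  keys <- as.character(keys)            # rep(NA, 10) is logical, not character
  ok <- !is.na(keys)
  out <- rep(NA_character_, length(keys))
  if (any(ok)) {
    hits <- mget(keys[ok], envir = map, ifnotfound = NA)
    # Keep the first value per key; keys that were not found stay NA.
    out[ok] <- vapply(hits, function(v) as.character(v[1]), character(1))
  }
  out
}

x <- rep(NA, 10)
all_na <- unique(safe_lookup(x, toy_map))                    # a single NA
mixed  <- safe_lookup(c("probe_1", NA, "probe_9"), toy_map)  # "101" NA NA
```

With an all-NA input the helper never calls mget(), so the result is NA rather than an error. Keeping only the first value per key is a simplification for maps where a probe has several IDs.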
Best,
Jim
>
>
> Thanks a lot for any help.
>
> Christian
>
>
>
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
I merged probe ids from affy hgu133a and b chips, then looked them
up using
mget(probelist, hgu133aSYMBOL)
Then I tried the same lookup with hgu133bSYMBOL
I expected a difference, since the chips contain fairly unique
symbols.
Are symbols that are unique to chip A or to chip B known to both annotation packages?
Thanks.
Tom
Hi Thomas,
Gene symbols cannot be relied upon to be unique in any case. They are
frequently "assigned" to multiple different genes. I might be better
able to help you if you were a little bit more specific about what you
are seeing. But what you should see is that each of these two platforms
has mappings only for the subset of genes that it represents.
So, for example, hgu133b has a mapping for probeset 229819_at to symbol
A1BG, but the hgu133a chip does not have a probe that maps to this gene
symbol. That is one example (at least) of a difference, and there are
many more. There may be some overlap in symbols, caused in part by the
fact that some probeset IDs measure the same gene and also because gene
symbols are horrible as identifiers, but for the most part you should
see different symbols on these platforms.
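One way to check this on a merged probe list is to run the same lookup against each map and compare the symbols that come back. The sketch below uses plain environments as stand-ins for hgu133aSYMBOL and hgu133bSYMBOL; only the 229819_at to A1BG pair comes from this thread, and every other probe ID, symbol, and the `lookup` helper are invented for illustration.

```r
# Toy stand-ins for the two symbol maps (assumption: invented entries,
# except 229819_at -> A1BG, which is the example given above).
map_a <- list2env(list(probe_a1 = "GENE1", probe_a2 = "GENE2"))
map_b <- list2env(list(`229819_at` = "A1BG", probe_b2 = "GENE2"))

# Merged probe list containing IDs from both chips.
probelist <- c("probe_a1", "probe_a2", "229819_at", "probe_b2")

# Hypothetical helper: probes unknown to a map come back as NA.
lookup <- function(keys, map) {
  unlist(mget(keys, envir = map, ifnotfound = NA), use.names = FALSE)
}

sym_a <- lookup(probelist, map_a)
sym_b <- lookup(probelist, map_b)

# Symbols seen only through the B map, and symbols seen through both.
only_b <- setdiff(sym_b[!is.na(sym_b)], sym_a[!is.na(sym_a)])
both   <- intersect(sym_a[!is.na(sym_a)], sym_b[!is.na(sym_b)])
```

In this toy data `only_b` holds A1BG and `both` holds the one symbol reachable from either chip. If the two real lookups look identical, comparing the NA positions of `sym_a` and `sym_b` is a quick first check.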
Marc
Thomas Hampton wrote:
> I merged probe ids from affy hgu133a and b chips, then looked them
> up using
>
> mget(probelist, hgu133aSYMBOL)
>
> Then I tried the same lookup with hgu133bSYMBOL
>
> I expected a difference, since the chips contain fairly unique symbols.
>
> Are symbols unique to A or B known to both?
>
>
> Thanks.
>
> Tom
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
Marc,
Thanks for your reply.
For a unique gene identifier, do you recommend ENTREZID over SYMBOL?
I am comparing three experiments on three platforms
hgu95av2.db
hgu133a
hgu133a + b
So what I am after is a nice common identifier for these chips.
Thanks
Tom
On Apr 30, 2009, at 2:09 PM, Marc Carlson wrote:
> Hi Thomas,
>
> Gene symbols cannot be relied upon to be unique in any case. They are
> frequently "assigned" to multiple different genes. I might be better
> able to help you if you were a little bit more specific about what you
> are seeing. But what you should see is that these two platforms have
> mappings for the subset of the genes that they represent.
>
> So for example hgu133b has a mapping for probeset 229819_at to symbol
> A1BG. But the hgu133a chip does not have a probe that maps to this gene
> symbol. So that would be one example (at least) of a difference and
> there are many more. There may be some overlap for symbols caused in
> part by the fact that some probesets IDs will measure the same gene and
> also because gene symbols are horrible as identifiers but for the most
> part you should see different symbols on these platforms.
>
> Marc
Hi Thomas,
Entrez Gene IDs are a great alternative to symbols. They are not
recycled, so if you meet the same Entrez Gene ID in another setting you
can be assured that it refers to the same thing that it did before. In
contrast, with gene symbols you have cases like the "VH" gene, which is
presently assigned to 36 different genes in humans! So if someone tells
you that they work on the VH gene, you have to ask them which one.
That sort of nonsense is just not helpful when doing informatics work.
So yes, you should probably use Entrez Gene IDs.
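As a rough sketch of how Entrez Gene IDs could serve as the common identifier across three platforms: collect the IDs each chip maps to, then intersect the sets. The three maps below are plain environments with invented probe and gene IDs, standing in for the per-chip ENTREZID maps; `make_map` and `gene_ids` are hypothetical helper names, and whether the same calls apply unchanged to the real annotation maps is not checked here.

```r
# Build a toy probe -> Entrez Gene ID map from a named character vector
# (assumption: all probe and gene IDs here are made up).
make_map <- function(pairs) list2env(as.list(pairs))

chip_1 <- make_map(c(p1_a = "101", p1_b = "202", p1_c = NA))
chip_2 <- make_map(c(p2_a = "202", p2_b = "303"))
chip_3 <- make_map(c(p3_a = "202", p3_b = "101", p3_c = "404"))

# All distinct, non-missing gene IDs that a chip's probes map to.
gene_ids <- function(map) {
  ids <- unlist(as.list(map), use.names = FALSE)
  unique(ids[!is.na(ids)])
}

# Gene IDs represented on every platform.
shared <- Reduce(intersect, lapply(list(chip_1, chip_2, chip_3), gene_ids))
```

Here `shared` contains the single ID present on all three toy chips. Restricting each experiment to the probes whose ID is in `shared` gives a common set of genes to compare.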
Marc
Hi Jim,
thanks so much for your quick reply, but to be honest I still do not
understand why my function call ( unique(unlist(mget(x,
env=hgu133plus2ENTREZID,ifnotfound=NA))) ) produces 'NA' instead of the
error message above.
The interesting thing is that if I analyse exactly the same data with
2.3 and with 2.4, the analysis breaks with 2.4 but not with 2.3!?
Well, I guess the solution is somewhat simple :-)
All the best,
Christian
Hi Christian,
Christian Kohler wrote:
[...]
> thanks so much for your quick reply, but to be honest I still do not
> understand, why my function-call ( unique(unlist(mget(x,
> env=hgu133plus2ENTREZID,ifnotfound=NA))) ) produces 'NA' instead of the
> error-message above.
Without having your sessionInfo(), we won't be able to tell either...
>
> The interesting thing is, that if I analyse exactly the same data with
> 2.3 as well as with 2.4, the analysis does not break with 2.3 but with
> 2.4 !?.
>
> Well, I guess the solution is somewhat simple :-)
If that means you are going to stick with 2.3 then yes, it's a simple
solution, but please note that 2.3 is not supported anymore and that
the annotations in 2.4 are much more recent and supposedly more
accurate.
The small code modification suggested by Jim is really straightforward
and a better way to go IMO. And as an extra benefit, other people
will be able to run your pipeline and reproduce your results (the
current code is expected to break for anybody with a standard
installation).
Cheers,
H.
>
> All the best,
> Christian
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319