Hi,
I think the annotation system has problems dealing with RefSeq records
that were removed.
This is looking at the Erbb2 gene in mouse (entrezID=13866) on the whole
genome mouse chip from Agilent (annotation package: mgug4122a). From the
annotations provided by Agilent, there are 2 probes that map to it:
A_52_P49250 and A_51_P216179.
Currently, the annotations do not give any results for it:
> library(mgug4122a)
> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aSYMBOL))
A_52_P49250 A_51_P216179
NA NA
The accession number that is given indeed points to NM_010152.
> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aACCNUM))
A_52_P49250 A_51_P216179
"NM_010152" "NM_010152"
Looking at it on the NCBI website, it does point to Erbb2, but it also
says: "This record was removed by RefSeq staff".
Not being entirely familiar with the process, I would point to this as a
likely reason for the lack of annotations for those two probes.
I have not done an extensive check between the Agilent annotation and
the ones in mgug4122a to see how many other probes might be hit by this.
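A rough sketch of how such a check might look, assuming the ACCNUM and
SYMBOL maps can each be turned into a named list with as.list(). The
helper name find_unannotated is made up for illustration, and the third
probe and its values in the toy data are placeholders, not real
annotations:

```r
# Hypothetical helper: list probes that carry an accession number
# but have no gene symbol. Both arguments are named lists keyed by probe ID.
find_unannotated <- function(accnum, symbol) {
  probes <- names(accnum)
  # TRUE where the probe has at least one non-NA accession number
  has_acc <- !vapply(accnum, function(x) all(is.na(x)), logical(1))
  # TRUE where the symbol entry is absent or entirely NA
  no_sym <- vapply(probes, function(p) {
    s <- symbol[[p]]
    is.null(s) || all(is.na(s))
  }, logical(1))
  probes[has_acc & no_sym]
}

# Toy data: the first two entries mirror the output shown above,
# the third is an invented placeholder.
accnum <- list(A_52_P49250 = "NM_010152",
               A_51_P216179 = "NM_010152",
               probe_x = "NM_XXXXXX")
symbol <- list(A_52_P49250 = NA,
               A_51_P216179 = NA,
               probe_x = "GeneX")
affected <- find_unannotated(accnum, symbol)
print(affected)

# With the package loaded, the call would presumably be:
# affected <- find_unannotated(as.list(mgug4122aACCNUM),
#                              as.list(mgug4122aSYMBOL))
# length(affected)
```

This only counts probes whose accession no longer resolves to a symbol;
it does not compare against the Agilent file itself.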
> sessionInfo()
R version 2.5.0 (2007-04-23)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;
LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;
LC_IDENTIFICATION=C
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets"
[6] "methods" "base"
other attached packages:
mgug4122a
"1.16.0"
If there is any more information I can provide, please tell me.
Francois
Hi, Francois,
If I remember correctly, we had a hard time finding up-to-date
annotations from Agilent. The annotation file we downloaded from Agilent
was out of date. We still update the annotation packages for each
release, but the probeset-to-gene mapping (recorded in mgug4122aACCNUM)
hasn't been updated for quite a long time. In other words, we only
update the annotations for the genes. So, if mgug4122aACCNUM is
wrong/deprecated for a probeset, then the other annotations for this
probeset will be incorrect.
Could you please post the link to the up-to-date annotation file? We can
re-build the annotation packages based on it. Your help will be highly
appreciated.
thanks
nianhua
I think this is it, but they are still pretty old (many from fall 2006):
http://www.chem.agilent.com/cag/bsp/gene_lists.asp?arrayType=gene
Sean
Yes, that is a more recent version.
Agilent also has another website (http://earray.chem.agilent.com/) for
their customers that has more up-to-date definitions. For example the
mouse whole genome array dates from February 2007. You might want to
contact Agilent to get access to that site.
Francois