Question

Annotations dealing with a "removed" RefSeq record



Francois Pepin


Hi,

I think the annotation system has problems dealing with RefSeq records that were removed.

This concerns the Erbb2 gene in mouse (Entrez ID 13866) on Agilent's whole-genome mouse chip (annotation package: mgug4122a). According to the annotations provided by Agilent, two probes map to it: A_52_P49250 and A_51_P216179.

Currently, the annotation package gives no results for them:

> library(mgug4122a)
> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aSYMBOL))
 A_52_P49250 A_51_P216179
          NA           NA

The accession number given for both probes is indeed NM_010152:

> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aACCNUM))
 A_52_P49250 A_51_P216179
 "NM_010152"  "NM_010152"

On the NCBI website, that accession does point to Erbb2, but the page also says: "This record was removed by RefSeq staff".

Not being entirely familiar with the process, I suspect this is the reason the two probes have no annotations. I have not done an extensive comparison between the Agilent annotation and the one in mgug4122a to see how many other probes might be affected.

> sessionInfo()
R version 2.5.0 (2007-04-23)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;
LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;
LC_IDENTIFICATION=C

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"
[6] "methods"   "base"

other attached packages:
mgug4122a
 "1.16.0"

If there is any more information I can provide, please tell me.

Francois
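The post mentions not having checked how many other probes are affected. A minimal sketch of such a check, assuming the ACCNUM and SYMBOL maps have first been flattened into named vectors (for example with `unlist(as.list(mgug4122aACCNUM))`, which I assume works for this package version but have not verified): list every probe that has an accession number but no gene symbol, which is the symptom seen for Erbb2. `find_orphaned_probes` is a hypothetical helper, and `example_probe` is a made-up placeholder entry; only the two Erbb2 probes come from the post.

```r
# Hypothetical helper: return probe IDs that have an accession number
# but no gene symbol (the Erbb2 symptom described above).
find_orphaned_probes <- function(acc, sym) {
  sym <- sym[names(acc)]                 # align symbols to the accession vector
  names(acc)[!is.na(acc) & is.na(sym)]   # accession present, symbol missing
}

# Toy input: the two Erbb2 probes from the mget() output above, plus a
# hypothetical placeholder probe that does have a symbol.
acc <- c(A_52_P49250 = "NM_010152", A_51_P216179 = "NM_010152",
         example_probe = "NM_XXXXXX")
sym <- c(A_52_P49250 = NA, A_51_P216179 = NA, example_probe = "Gene1")

find_orphaned_probes(acc, sym)  # returns c("A_52_P49250", "A_51_P216179")
```

Indexing `sym` by `names(acc)` means a probe absent from the symbol map is also reported, since the lookup yields `NA` for it.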

Annotation mgug4122a PROcess

Written by Francois Pepin; last updated by Nianhua Li.

Answer 1 · 2007-06-08

Hi Francois,

If I remember correctly, we had a hard time finding up-to-date annotations from Agilent; the annotation file we downloaded from Agilent was out of date. We still update the annotation packages for each release, but the probeset-to-gene mapping (recorded in mgug4122aACCNUM) hasn't been updated for quite a long time. In other words, we only update the annotations for the genes. So, if the mgug4122aACCNUM entry for a probeset is wrong or deprecated, the other annotations for that probeset will be incorrect as well.

Could you please post the link to the up-to-date annotation file? We can rebuild the annotation packages based on it. Your help will be highly appreciated.

Thanks,
Nianhua