How does BioC map from Probe ID to Entrez Gene?
@jacob-michaelson-1079
Last seen 10.3 years ago
Hi all,

I've finished up with an analysis and, in reviewing some of the annotations for gene symbols and RefSeqs, I've found some discrepancies that I don't know how to explain. The discrepancies are between the Affy-supplied annotation (both CSV and NetAffx) and the BioC annotation. Take this probe set, for example: 1558097_at

> sessionInfo()
Version 2.3.0 (2006-04-24)
i686-pc-linux-gnu

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

other attached packages:
hgu133plus2
   "1.12.0"

> mget("1558097_at", hgu133plus2LOCUSID)
$`1558097_at`
[1] 8971

On NetAffx, the Entrez Gene ID shows 253143. I've got about 12 other probe sets that BioC and Affy disagree strongly on (symbols, RefSeqs, etc.). I suspect these can all be traced back to the Entrez ID disagreement. Since much of BioC's subsequent annotation is based on the Entrez Gene ID, the correct mapping from the Affy probe ID to the Entrez Gene ID is crucial.

Which brings me to my question - how exactly does BioC map from Affy probe IDs to Entrez Gene IDs? There seems to be thorough documentation of how Entrez IDs are mapped to other annotations like PubMed, GO, etc., but not much on how the Entrez Gene ID was mapped from the probe ID in the first place. My cursory "hand" examination, BLAST-ing Affy's probe sequences, tends to side with Affy.

Any enlightenment would be much appreciated.

Thanks,

Jake
Annotation GO probe affy • 1.4k views
John Zhang ★ 2.9k
@john-zhang-6
Last seen 10.3 years ago
> Which brings me to my question - how exactly does BioC map from Affy
> probe IDs to Entrez Gene IDs? There seems to be thorough documentation
> of how Entrez IDs are mapped to other annotations like Pubmed, GO, etc.
> but not much on how the Entrez Gene ID was mapped from the probe ID in
> the first place. My cursory "hand" examination tends to side with Affy,
> by BLAST-ing their probe sequences.

BioC takes the GenBank ids associated with the probes (provided by the manufacturer) and then maps them to Entrez Gene ids using data from UniGene, Entrez Gene, and other available data sources we trust. The Entrez Gene id assigned to a probe is determined by votes from all the sources used. If there is no agreement among the sources, we take the smallest Entrez Gene id.

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
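For readers who want to see the voting rule in concrete terms, here is a minimal sketch in base R. It is not the code BioC actually uses to build the annotation packages. The function name `pick_entrez_id`, the source names, and the example votes are all hypothetical, and "no agreement" is interpreted here as a tie for the highest vote count, which is an assumption.

```r
# Hypothetical helper (not part of any Bioconductor package).
# `candidates` is a named vector with one Entrez Gene id per source;
# NA means that source had no mapping for the probe.
pick_entrez_id <- function(candidates) {
  ids <- as.integer(candidates[!is.na(candidates)])
  if (length(ids) == 0L) return(NA_integer_)

  votes  <- table(ids)              # one vote per source
  counts <- as.vector(votes)
  winners <- as.integer(names(votes)[counts == max(counts)])

  # Assumed reading of "no agreement": a tie for the top vote count.
  # In that case fall back to the smallest Entrez Gene id.
  min(winners)
}

# Made-up votes, for illustration only
pick_entrez_id(c(unigene = 8971, entrez_gene = 253143))                  # 8971
pick_entrez_id(c(unigene = 253143, entrez_gene = 253143, other = 8971))  # 253143
```

Under this reading, a probe whose sources split between 8971 and 253143 would end up with 8971, which matches what `hgu133plus2LOCUSID` returned for 1558097_at. The thread does not say which source voted for which id, so this is only a plausible explanation, not a confirmed one.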