450K annotation: discrepancy between GEO GPL and Bioconductor annotation
Tom Bartlett ▴ 60
@tom-bartlett-5059
Hi,

I've noticed a discrepancy between the chromosome information given for some probes of the Illumina Infinium 450K array in the GEO GPL record and in the corresponding Bioconductor annotation package.

The first four probes in the 'data table' on the 450K GPL summary page on the GEO website (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL13534) are cg00035864, cg00050873, cg00061679 and cg00063477, and the value in the CHR column is Y for all four. However, in the corresponding Bioconductor annotation package, IlluminaHumanMethylation450k.db, IlluminaHumanMethylation450kCHR gives the chromosome for the same four probes as Y, NONE, NONE and Y, respectively.

N.B. the values in the MAPINFO column of the 'data table' and those returned by IlluminaHumanMethylation450kCPGCOORDINATE are identical for these four probes.

Is there a reason for this discrepancy, and might it be more widespread?

Thanks in advance for your help,

Tom Bartlett
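One way to gauge how widespread the disagreement is would be to compare the two CHR columns probe by probe. The following is a minimal Python sketch, assuming both annotations have already been exported to plain probe-to-chromosome mappings; the dictionaries hold only the four values quoted above, and `find_discrepancies` is a hypothetical helper, not part of any package.

```python
# CHR values for the four probes exactly as quoted in the question.
# In practice these dictionaries would be loaded from the full GEO table
# and from an export of the Bioconductor annotation (loading not shown).
geo_chr = {
    "cg00035864": "Y",
    "cg00050873": "Y",
    "cg00061679": "Y",
    "cg00063477": "Y",
}
bioc_chr = {
    "cg00035864": "Y",
    "cg00050873": "NONE",
    "cg00061679": "NONE",
    "cg00063477": "Y",
}


def find_discrepancies(a, b):
    """Return {probe: (value_in_a, value_in_b)} for shared probes that differ.

    Hypothetical helper for illustration only.
    """
    shared = sorted(set(a) & set(b))
    return {probe: (a[probe], b[probe]) for probe in shared if a[probe] != b[probe]}


mismatches = find_discrepancies(geo_chr, bioc_chr)
for probe, (geo_value, bioc_value) in mismatches.items():
    print(probe, "GEO:", geo_value, "Bioconductor:", bioc_value)
print(len(mismatches), "of", len(geo_chr), "probes disagree")
```

Run over the full tables, the same comparison would show whether the mismatches are limited to a few probes or affect a large share of the array.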
Annotation Annotation • 1.6k views
Tim Triche ★ 4.2k
@tim-triche-3561
United States
By default, values for probes that are annotated to multiple transcripts are masked as 'NONE' or 'NA'; toggleProbes() is the switch that controls this masking. Unfortunately, thousands of the 450k probes are mapped to multiple transcripts in the manifest, and the automatically generated bimap objects treat them as if they were (degenerate) expression probes and mask them.

I am attempting to address this by replacing the 450k.db, 27k.db, and 450kprobe packages with a faster, smaller, FeatureDb-based omnibus package. It keeps track of the minimal information required to mask probes, annotate regions of interest, and process IDAT files, with all other operations (distance to TSS, chromosome, GC%, etc.) delegated to GenomicRanges and GenomicFeatures. In my experience this makes much more sense than using a framework that was originally created for expression probes.

I didn't realize the difference when I first packaged the annotations into a SQLite database, which is why the 450k.db package uses the db0 machinery. Apologies for the confusion; hopefully this will be a memory as soon as I am up to speed on creating FeatureDb objects.

--t

On Wed, May 16, 2012 at 12:04 PM, Bartlett, Thomas <thomas.bartlett.10@ucl.ac.uk> wrote:

> Hi,
>
> I've noticed a discrepancy between the chromosome information given for
> some of the probes of the Illumina Infinium 450K array in the GEO GPL info,
> and in the corresponding Bioconductor annotation package.
>
> The first four probes on the 450K GPL summary page on the GEO website
> http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL13534
> in the 'data table' are cg00035864, cg00050873, cg00061679 and cg00063477,
> and the corresponding value in the CHR column is Y for all four of these.
> However, in the corresponding Bioconductor annotation package
> IlluminaHumanMethylation450k.db, using IlluminaHumanMethylation450kCHR the
> chromosome for these same 4 probes is given as Y, NONE, NONE and Y,
> respectively.
>
> N.B., the values in the MAPINFO column of 'data table' and
> those found using IlluminaHumanMethylation450kCPGCOORDINATE are identical
> for these 4 probes.
>
> Is there any reason why there is this discrepancy, and might it be more
> widespread?
>
> Thanks in advance for your help
>
> Tom Bartlett
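The default masking described in the reply above can be pictured with a small toy model. This is only a sketch in Python, not the Bioconductor implementation: the probe names, transcript names, and the `lookup_chr` function with its `expose` argument are all hypothetical, chosen to mirror the idea that a probe with more than one annotated target is hidden unless the filter is switched off.

```python
# Hypothetical probe-to-(transcript, chromosome) mappings; not real manifest data.
mappings = {
    "probe_a": [("tx1", "Y")],                # annotated to one transcript
    "probe_b": [("tx2", "Y"), ("tx3", "Y")],  # annotated to two transcripts
}


def lookup_chr(probe, mappings, expose="single"):
    """Toy lookup that mimics masking of multiply-annotated probes.

    expose="single": probes with more than one target are masked (None).
    expose="all":    every probe is reported, however many targets it has.
    """
    hits = mappings.get(probe, [])
    if expose == "single" and len(hits) > 1:
        return None  # masked, analogous to the 'NONE'/'NA' values in the thread
    chromosomes = sorted({chrom for _, chrom in hits})
    return chromosomes or None


print(lookup_chr("probe_a", mappings))                # ['Y']
print(lookup_chr("probe_b", mappings))                # None (masked)
print(lookup_chr("probe_b", mappings, expose="all"))  # ['Y']
```

In this model the masked probe still has a perfectly consistent chromosome; it is hidden only because it has more than one annotated target, which matches the observation that the MAPINFO coordinates agree even where CHR reads NONE.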