lumiHumanAll.db - wrong information in lumiHumanAllCHR for some probes?
Janet Young
@janet-young-2360
Fred Hutchinson Cancer Research Center,…
Hi,

I'm working with lumiHumanAll.db and chromosomal locations using the CHR and CHRLOC tables. Mostly things turn out fine, but I think I have found some probes for which the information in CHR and CHRLOC doesn't match up. (I'm not sure whether I found all the problem probes, or just a few of those that were most obvious because they seemed to map off the end of the chromosome.)

I'd guess it has something to do with how probes mapping to multiple locations are dealt with, which is tricky, but it seems important for CHR and CHRLOC to be internally consistent.

I've tried to explain everything with the code at the bottom of the email.

thanks very much,

Janet

-------------------------------------------------------------------
Dr. Janet Young
Tapscott and Malik labs
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung ...at... fhcrc.org
-------------------------------------------------------------------

library(lumiHumanAll.db)
library(org.Hs.eg.db)   # provides org.Hs.egCHRLENGTHS (also attached via lumiHumanAll.db)
library(lumi)           # provides id2seq()
library(annotate)       # provides lookUp()

### these have mismatched CHR and CHRLOC info - I noticed them among a much larger set of probes
odd_mappers <- c("cS._E8f0CEAHsPH.oU", "3B5Dx.5FBcAstHt9Iw",
                 "Ho.7bwAyQBWQ8f_RQU", "0k9AKLpXv97vAFU.rk")

### and a few other probes that looked fine
good_mappers <- c("Ku8QhfS0n_hIOABXuE", "fqPEquJRRlSVSfL.8A",
                  "ckiehnugOno9d7vf1Q", "x57Vw5B5Fbt5JUnQkI")

probes <- c(odd_mappers, good_mappers)
probeType <- c(rep("odd", length(odd_mappers)),
               rep("good", length(good_mappers)))

### get their map info from CHR and CHRLOC
chrs <- lookUp(probes, "lumiHumanAll.db", "CHR")
locs <- lookUp(probes, "lumiHumanAll.db", "CHRLOC")

### some probes have two locs, which is OK, but make sure we know which
### information to double up when we make a table later
numLocsPerProbe <- sapply(locs, length)

#### put that info into a table (one row per CHRLOC entry)
mapping <- data.frame(
  probe = rep(probes, numLocsPerProbe),
  probeType = rep(probeType, numLocsPerProbe),
  chrLoc = abs(unlist(locs, use.names = FALSE)),   # ignore strand
  chrsFromChrsList = rep(unlist(chrs, use.names = FALSE), numLocsPerProbe),
  chrsFromLocsList = unlist(lapply(locs, names), use.names = FALSE)
)

#### looking at CHRLENGTHS was how I realized some of the CHR info wasn't right:
#### the probe maps way past the end of the chromosome
mapping[, "chrLengthChrsList"] <- org.Hs.egCHRLENGTHS[as.character(mapping[, "chrsFromChrsList"])]
mapping[, "chrLengthLocsList"] <- org.Hs.egCHRLENGTHS[as.character(mapping[, "chrsFromLocsList"])]

#### add probe sequences
mapping[, "seq"] <- id2seq(as.character(mapping[, "probe"]))

#### take a look at the table, and do some BLAT searches at the UCSC website
#### to see where the probes really map
mapping

### BLAT search results: these are the exact matches, but all probes also have other, non-exact matches
# first probe cS._E8f0CEAHsPH.oU maps to chr10:56367644-56367693
# second probe 3B5Dx.5FBcAstHt9Iw maps to chr17:13446846-13446895
# third probe Ho.7bwAyQBWQ8f_RQU maps to chr7:34980375-34980424
# fourth probe 0k9AKLpXv97vAFU.rk maps to chr3:149699708-149699757
####### so in each of those cases it looks like lumiHumanAllCHR has the correct chromosome,
####### and CHRLOC is wrong (perhaps it took one of the secondary, non-exact matches?)
####### (so the locations on the correct chromosome are not available in any table?)

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] annotate_1.32.1        lumi_2.6.0             nleqslv_1.9.1
 [4] methylumi_2.0.1        lumiHumanAll.db_1.16.0 org.Hs.eg.db_2.6.4
 [7] RSQLite_0.11.0         DBI_0.2-5              AnnotationDbi_1.16.10
[10] Biobase_2.14.0

loaded via a namespace (and not attached):
 [1] affy_1.32.0           affyio_1.22.0         BiocInstaller_1.2.1
 [4] grid_2.14.0           hdrcde_2.15           IRanges_1.12.5
 [7] KernSmooth_2.23-7     lattice_0.20-0        MASS_7.3-16
[10] Matrix_1.0-2          mgcv_1.7-11           nlme_3.1-102
[13] preprocessCore_1.16.0 xtable_1.6-0          zlibbioc_1.0.0
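The chromosome-length check described in the post can be reduced to a base-R sketch that needs no annotation packages. It assumes inputs shaped like `lookUp()` results: named lists keyed by ID, with CHRLOC values named by chromosome and signed by strand. The entry for Entrez Gene ID 653659 reuses the CHR and CHRLOC values quoted in the reply in this thread; `hypotheticalID`, its values, the helper name `checkLocs`, and the hard-coded `chrLengths` (GRCh37/hg19 lengths standing in for `org.Hs.egCHRLENGTHS`) are illustrative assumptions.

```r
## Illustrative input shaped like lookUp() output (named lists keyed by ID).
## 653659 uses the CHR/CHRLOC values quoted in this thread; hypotheticalID is made up.
chrs <- list("653659" = "3", "hypotheticalID" = "10")
locs <- list("653659"         = c("1" = 202976536),
             "hypotheticalID" = c("10" = -1000000))   # negative = minus strand

## Assumed GRCh37/hg19 chromosome lengths, standing in for org.Hs.egCHRLENGTHS
chrLengths <- c("1" = 249250621, "3" = 198022430, "10" = 135534747)

## Hypothetical helper: one row per CHRLOC entry, like the post's 'mapping' table.
## Assumes exactly one CHR value per ID, as the original code does.
checkLocs <- function(chrs, locs, chrLengths) {
  n <- vapply(locs, length, integer(1))
  out <- data.frame(
    id            = rep(names(locs), n),
    chrFromCHR    = rep(unlist(chrs[names(locs)], use.names = FALSE), n),
    chrFromCHRLOC = unlist(lapply(locs, names), use.names = FALSE),
    pos           = abs(unlist(locs, use.names = FALSE)),   # ignore strand
    stringsAsFactors = FALSE
  )
  ## Do the two tables name the same chromosome?
  out$chrAgree <- out$chrFromCHR == out$chrFromCHRLOC
  ## The symptom in the post: position beyond the end of the CHR chromosome
  out$pastEndOfCHRChrom <- unname(out$pos > chrLengths[out$chrFromCHR])
  out
}

res <- checkLocs(chrs, locs, chrLengths)
print(res)
```

For 653659, CHR names chromosome 3 while the CHRLOC entry is named "1", and its position (202,976,536) lies past the end of chromosome 3 (198,022,430 bp in hg19), which is the "off the end of the chromosome" symptom. A mismatched position that happens to fit on the CHR chromosome would not trip the length test, so the direct name comparison (`chrAgree`) is likely to find more of the problem probes.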
@vincent-j-carey-jr-4
United States
I encountered one of these events and raised the question internally with Pan Du, who gave the following reply:

-------
Hi Vince,

I think this is an old question, which was raised before. The problem was caused by the library "org.Hs.eg.db". The Illumina probe basically maps to gene "653659". By checking "org.Hs.eg.db", we can get its annotation:

> lookUp('653659', 'org.Hs.eg.db', 'CHR')
$`653659`
[1] "3"

> lookUp('653659', 'org.Hs.eg.db', 'CHRLOC')
$`653659`
        1
202976536

I guess the reason is that the annotation in "org.Hs.eg.db" comes from different sources: one is from NCBI, and the other is from UCSC. I forget which is from which. Marc should know better.
---

And indeed, if we check the documentation for org.Hs.eg.db, we see that CHRLOC is UCSC-based and CHR is NCBI-based. At present it seems important to do some defensive programming if you are mixing these resources.

On Thu, Dec 29, 2011 at 7:14 PM, Janet Young <jayoung@fhcrc.org> wrote:
> I'm working with lumiHumanAll.db and chromosomal locations using the CHR and CHRLOC tables. [...]
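One way to follow the defensive-programming advice is to keep only those CHRLOC positions whose chromosome agrees with CHR, and to return `NA` otherwise rather than silently trusting either source. The sketch below is a hypothetical helper (`consistentLocs` is not part of annotate or AnnotationDbi), written in base R against inputs shaped like `lookUp()` results. Treating CHR as the reference is an assumption, motivated by the BLAT searches in the question, which agreed with CHR for the four probes checked; the IDs other than 653659 and their values are made up.

```r
## Hypothetical helper: keep CHRLOC entries whose chromosome name matches CHR.
## Assumes lookUp()-style inputs: named lists keyed by ID; CHRLOC values are
## signed positions (sign = strand) named by chromosome.
consistentLocs <- function(chrs, locs) {
  ids <- intersect(names(chrs), names(locs))
  res <- lapply(ids, function(id) {
    chr <- chrs[[id]]            # one or more chromosome names from CHR
    loc <- locs[[id]]            # zero or more positions from CHRLOC
    keep <- names(loc) %in% chr  # an unnamed entry (e.g. NA) gives logical(0)
    if (any(keep)) loc[keep] else NA
  })
  names(res) <- ids
  res
}

## Illustrative input: 653659 uses the values quoted in this thread;
## the other IDs and their values are made up.
chrs <- list("653659" = "3", "hypotheticalA" = "10", "hypotheticalB" = "7")
locs <- list("653659"        = c("1" = 202976536),
             "hypotheticalA" = c("10" = 1000000),
             "hypotheticalB" = c("7" = -2000000, "12" = 3000000))

filtered <- consistentLocs(chrs, locs)
str(filtered)
```

Because the UCSC-based CHRLOC table may simply lack a position on the NCBI-reported chromosome, filtering can leave an ID with no usable location (`NA`, as for 653659 here). Such IDs would need another source, such as the probe-sequence alignments used in the question, rather than a guessed position.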