GOstats: Listing genes from hyperGTest

0

Entering edit mode

Tim Smith ★ 1.1k

@tim-smith-1532

Last seen 10.2 years ago

Thanks Jim. I'm still having problems, i.e., I cannot find which subset of input genes resulted in the significant GO term. I have reproduced the problem that I am having: ----------------------------------------------------- library(org.Hs.eg.db) library(GOstats) library(GO.db) # Set of genes (Entrez Ids) that I want to investigate geneIds <- c("10406", "10418", "11082", "1281", "1410", "157506", "167410", "1906" , "2029", "23091", "2625" , "2823" , "2877", "2993", "3039" , "3043", "3046", "3283", "3553", "4015", "4069", "4258", "4345", "4353", "4885", "4886", "5021", "5055", "5151", "5156", "5320", "5553", "55885", "56667", "5788" , "5999", "6005", "629", "6507", "653145", "6590", "673", "6876", "695", "7026", "7070", "7103", "7412", "760", "7738", "800", "828", "83890", "945", "963") ### I have reproduced Jim's code for the test univ <- Lkeys(org.Hs.egGO) param <- new("GOHyperGParams", geneIds = geneIds, universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") hyp <- hyperGTest(param) summary(hyp,categorySize=2000) ---------------------------------------------------- The result I get is : GOBPID Pvalue OddsRatio ExpCount Count Size Term 1 GO:0032501 0.0001568145 3.021968 11.976486 24 3438 multicellular organismal process 2 GO:0032502 0.0012459810 2.626265 10.297409 20 2956 developmental process 3 GO:0007275 0.0051457709 2.457897 7.517527 15 2158 multicellular organismal development 4 GO:0007154 0.0061081457 2.187407 13.418681 22 3852 cell communication I now want to know which subset of genes resulted in 'GO:0032501'. The error I get is: > geneIds[geneIds %in% get("GO:0032501", revmap(org.Hs.egGO))] Error in .checkKeys(value, Rkeys(x), x@ifnotfound) : value for "GO:0032501" not found > geneIds[geneIds %in% get("GO:0032502", revmap(org.Hs.egGO))] Error in .checkKeys(value, Rkeys(x), x@ifnotfound) : value for "GO:0032502" not found [[elided Yahoo spam]] thanks a lot. Tim Hi Tim, Yeah, probeSetSummary() is probably not what you want, if you are not starting with an Affy chip. There are some gymnastics required to map things back to the original Affy chip that you won't need to do. In addition, if you are not using a conditional hypergeometric analysis, it should be pretty simple to get what you want without even needing to parse things out of the GOHyperGResult object. An example: ## fake up some data > geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)] > univ <- Lkeys(org.Hs.egGO) > param <- new("GOHyperGParams", geneIds = geneIds, universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") > hyp <- hyperGTest(param) > summary(hyp, categorySize=10) GOBPID Pvalue OddsRatio ExpCount Count Size Term 1 GO:0007338 0.002723500 29.25101 0.07808304 2 54 single fertilization 2 GO:0009566 0.002925855 28.16374 0.08097501 2 56 fertilization So we have two terms of interest. Getting the Entrez Gene IDs from the input set that map to these terms is easy: > geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))] [1] "100131137" "10007" Now you might also want to know which 54 Entrez Gene IDs map to that particular GO term. Since you are not conditioning, this includes that particular GO term and all its offspring. > offspring <- get("GO:0007338", GOBPOFFSPRING) > egids <- unique(unlist(mget(c("GO:0007338", offspring), revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE)) > egids[!is.na(egids)] [1] "1047" "4179" "4240" "4486" "4809" "5016" [7] "6674" "7783" "7784" "7802" "7993" "8747" [13] "8748" "8852" "9082" "10007" "10361" "22917" [19] "26476" "53340" "57055" "57829" "64100" "93185" [25] "158062" "442868" "100131137" "49" "410" "2683" [31] "3010" "4184" "6677" "7142" "7455" "8857" [37] "11055" "124626" "2054" "2741" "10343" "10566" [43] "27297" "152015" "3074" "167" "928" "2515" [49] "5104" "23553" "284359" "164684" "7141" "79400" Best, Jim Tim Smith wrote: Thanks James. If I can tweak that function, I'll get exactly what I want. I tried what you suggested and got the following error: --------------------------- ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs paramsGO <- new("GOHyperGParams", geneIds = genes1, universeGeneIds = allGenes, annotation = "org.Hs.eg.db", ontology = "BP", pvalueCutoff = 1, conditional = FALSE, testDirection = "over") GO <- hyperGTest(paramsGO) ps <- probeSetSummary(GO) Error in get(mapName, envir = pkgEnv, inherits = FALSE) : variable "org.Hs.egENTREZID" was not found -------------------------------- I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input. Or am I doing something wrong? thanks! ----- Original Message ---- From: James W. MacDonald <jmacdon@med.umich.edu> Cc: bioc <bioconductor@stat.math.ethz.ch> Sent: Wednesday, October 22, 2008 9:10:39 AM Subject: Re: [BioC] GOstat: listing genes from hyperGTest Hi Tim, Does probeSetSummary() do what you want? Best, Jim Tim Smith wrote: Hi, I was performing a hyperGTest for genes in homo-sapiens. For a set of input genes, this function returns some 'significant' GO terms. What I wanted to now do was to co-relate each significant GO term (returned by this function) with genes (from my set of input genes) associated with that GO term. However, I think that I may be using the wrong package/function to get the releveant set of genes. Currently, what I'm doing is finding the significant GO terms by using the following code: ----------------------- ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs paramsGO <- new("GOHyperGParams", geneIds = genes1, universeGeneIds = allGenes, annotation = "org.Hs.eg.db", ontology = "BP", pvalueCutoff = 1, conditional = FALSE, testDirection = "over") GO <- hyperGTest(paramsGO) -------------------------- This gives me a set of significant GO terms. Now, I would like to find which subset of genes in 'genes1' is associated with each of the significant GO term. To do this I map all GO terms to their Entrez IDs using the 'org.Hs.eg.db' package using the following: xx <- as.list(org.Hs.egGO2EG) to get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't this number small?) that map to at least one Entrez ID. So, from here I look up which Entrez IDs are associated with my GO term of interest. My problem is that often, the GO term from hyperGTest is not associated with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described above ), i.e. the GO term/ID is not in the list obtained from 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by hyperGTest, but does not appear to be associated with any Entrez IDs in the org.Hs.eg.db package. Where could I be going wrong? [[elided Yahoo spam]] Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... My sessionInfo() is: -------------------------------- R version 2.7.2 (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] grid splines tools stats graphics grDevices utils datasets methods base other attached packages: [1] gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 Category_2.6.0 [8] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 GO.db_2.2.0 KEGG.db_2.2.0 [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 loaded via a namespace (and not attached): [1] cluster_1.11.11 MASS_7.2-44 --------------------------------- [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662 [[alternative HTML version deleted]]

Annotation GO probe affy Annotation GO probe affy • 2.5k views

ADD COMMENT • link 16.1 years ago Tim Smith ★ 1.1k

0

Entering edit mode

Tim Smith ★ 1.1k

@tim-smith-1532

Last seen 10.2 years ago

Thanks Jim. That works..! Tim ________________________________ From: James W. MacDonald <jmacdon@med.umich.edu> Cc: bioc <bioconductor@stat.math.ethz.ch> Sent: Friday, October 24, 2008 9:44:04 AM Subject: Re: [BioC] GOstats: Listing genes from hyperGTest Hi Tim, Hmm. That's weird. First off, I think you want to be using the GO2ALLEGS mapping rather than the GO2EG (or revmap(org.Hs.egGO) which is equivalent). The reason for this is that the GO ontology is a directed acyclic graph where all children are subsets of their parents, so by definition all genes that are appended with a child GO term are also appended with the parent term (e.g., multicellular organismal development is a part of the developmental process, so any genes involved with the former are by default involved with the latter as well). So modulo differences in our BioC versions, I get the same results as you: > summary(hyp, categorySize=2000) GOBPID Pvalue OddsRatio ExpCount Count Size 1 GO:0032501 0.0001909763 3.000224 11.303450 23 3420 2 GO:0032502 0.0008306266 2.730994 9.984714 20 3021 3 GO:0007275 0.0035077569 2.576107 7.224954 15 2186 4 GO:0007154 0.0042870538 2.271198 13.058459 22 3951 Term 1 multicellular organismal process 2 developmental process 3 multicellular organismal development 4 cell communication And can get the genes using org.Hs.egGO2ALLEGS: > geneIds[geneIds %in% get("GO:0032501", org.Hs.egGO2ALLEGS)] [1] "1281" "1410" "157506" "1906" "2625" "3043" "3553" "4015" [9] "4885" "4886" "5021" "5156" "5788" "6005" "6507" "673" [17] "6876" "695" "7026" "7070" "7412" "800" "83890" What I don't understand is how multicellular organismal process and developmental process can be missing from the org.Hs.egGO map Both these processes are direct children of biological process, so really they should be there. Maybe Marc Carlson can shed some light on this. Best, Jim Tim Smith wrote: > Thanks Jim. I'm still having problems, i.e., I cannot find which subset of input genes resulted in the significant GO term. I have reproduced the problem that I am having: > > ----------------------------------------------------- > library(org.Hs.eg.db) > library(GOstats) > library(GO.db) > > # Set of genes (Entrez Ids) that I want to investigate > geneIds <- c("10406", "10418", "11082", "1281", "1410", "157506", "167410", "1906" , "2029", "23091", "2625" , "2823" , "2877", "2993", "3039" , "3043", > "3046", "3283", "3553", "4015", "4069", "4258", "4345", "4353", "4885", "4886", "5021", "5055", "5151", "5156", "5320", "5553", > "55885", "56667", "5788" , "5999", "6005", "629", "6507", "653145", "6590", "673", "6876", "695", "7026", "7070", "7103", "7412", > "760", "7738", "800", "828", "83890", "945", "963") > > ### I have reproduced Jim's code for the test > univ <- Lkeys(org.Hs.egGO) > param <- new("GOHyperGParams", geneIds = geneIds, universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") > hyp <- hyperGTest(param) > summary(hyp,categorySize=2000) > ---------------------------------------------------- > > The result I get is : > > GOBPID Pvalue OddsRatio ExpCount Count Size Term > 1 GO:0032501 0.0001568145 3.021968 11.976486 24 3438 multicellular organismal process > 2 GO:0032502 0.0012459810 2.626265 10.297409 20 2956 developmental process > 3 GO:0007275 0.0051457709 2.457897 7.517527 15 2158 multicellular organismal development > 4 GO:0007154 0.0061081457 2.187407 13.418681 22 3852 cell communication > > I now want to know which subset of genes resulted in 'GO:0032501'. The error I get is: > >> geneIds[geneIds %in% get("GO:0032501", revmap(org.Hs.egGO))] > Error in .checkKeys(value, Rkeys(x), x@ifnotfound) : > value for "GO:0032501" not found > >> geneIds[geneIds %in% get("GO:0032502", revmap(org.Hs.egGO))] > Error in .checkKeys(value, Rkeys(x), x@ifnotfound) : > value for "GO:0032502" not found > > [[elided Yahoo spam]] > > thanks a lot. > > Tim > > > > > > Hi Tim, > > Yeah, probeSetSummary() is probably not what you want, if you are not > starting with an Affy chip. There are some gymnastics required to map > things back to the original Affy chip that you won't need to do. In > addition, if you are not using a conditional hypergeometric analysis, it > should be pretty simple to get what you want without even needing to > parse things out of the GOHyperGResult object. An example: > > ## fake up some data > >> geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)] >> univ <- Lkeys(org.Hs.egGO) >> param <- new("GOHyperGParams", geneIds = geneIds, > universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") >> hyp <- hyperGTest(param) >> summary(hyp, categorySize=10) > GOBPID Pvalue OddsRatio ExpCount Count Size > Term > 1 GO:0007338 0.002723500 29.25101 0.07808304 2 54 single > fertilization > 2 GO:0009566 0.002925855 28.16374 0.08097501 2 56 > fertilization > > So we have two terms of interest. Getting the Entrez Gene IDs from the > input set that map to these terms is easy: > >> geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))] > [1] "100131137" "10007" > > Now you might also want to know which 54 Entrez Gene IDs map to that > particular GO term. Since you are not conditioning, this includes that > particular GO term and all its offspring. > >> offspring <- get("GO:0007338", GOBPOFFSPRING) >> egids <- unique(unlist(mget(c("GO:0007338", offspring), > revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE)) >> egids[!is.na(egids)] > [1] "1047" "4179" "4240" "4486" "4809" "5016" > [7] "6674" "7783" "7784" "7802" "7993" "8747" > [13] "8748" "8852" "9082" "10007" "10361" "22917" > [19] "26476" "53340" "57055" "57829" "64100" "93185" > [25] "158062" "442868" "100131137" "49" "410" "2683" > [31] "3010" "4184" "6677" "7142" "7455" "8857" > [37] "11055" "124626" "2054" "2741" "10343" "10566" > [43] "27297" "152015" "3074" "167" "928" "2515" > [49] "5104" "23553" "284359" "164684" "7141" "79400" > > Best, > > Jim > > > Tim Smith wrote: > > Thanks James. If I can tweak that function, I'll get exactly what I want. > I tried what you suggested and got the following error: > > --------------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > ps <- probeSetSummary(GO) > > Error in get(mapName, envir = pkgEnv, inherits = FALSE) : > variable "org.Hs.egENTREZID" was not found > -------------------------------- > > I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input. > > Or am I doing something wrong? > > thanks! > > > > > > ----- Original Message ---- > From: James W. MacDonald <jmacdon@med.umich.edu> > > Cc: bioc <bioconductor@stat.math.ethz.ch> > Sent: Wednesday, October 22, 2008 9:10:39 AM > Subject: Re: [BioC] GOstat: listing genes from hyperGTest > > Hi Tim, > > Does probeSetSummary() do what you want? > > Best, > > Jim > > > > Tim Smith wrote: > > > Hi, > > I > was performing a hyperGTest for genes in homo-sapiens. For a set of > input genes, this function returns some 'significant' GO terms. What I > wanted to now do was to co-relate each significant GO term (returned by > this function) with genes (from my set of input genes) associated with > that GO term. However, I think that I may be using the wrong > package/function to get the releveant set of genes. > > Currently, what I'm doing is finding the significant GO terms by using the following code: > > ----------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > -------------------------- > This > gives me a set of significant GO terms. Now, I would like to find which > subset of genes in 'genes1' is associated with each of the significant > GO term. To do this I map all GO terms to their Entrez IDs using the > 'org.Hs.eg.db' package using the following: > > xx <- as.list(org.Hs.egGO2EG) > > to > get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't > this number small?) that map to at least one Entrez ID. So, from here I > look up which Entrez IDs are associated with my GO term of interest. > > My > problem is that often, the GO term from hyperGTest is not associated > with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described > above ), i.e. the GO term/ID is not in the list obtained from > 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by > hyperGTest, but does not appear to be associated with any Entrez IDs in > the org.Hs.eg.db package. Where could I be going wrong? > > [[elided Yahoo spam]] > > Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... > > My sessionInfo() is: > -------------------------------- > R version 2.7.2 (2008-08-25) > i386-pc-mingw32 > locale: > LC_COLLATE=English_United > States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] grid splines tools stats graphics grDevices utils datasets methods base > other attached packages: > [1] > gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 > gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 > Category_2.6.0 [8] RBGL_1.16.0 annotate_1.18.0 > xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 > GO.db_2.2.0 KEGG.db_2.2.0 [15] org.Hs.eg.db_2.2.0 > AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 > genefilter_1.20.0 survival_2.34-1 affy_1.18.0 [22] > preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 > loaded via a namespace (and not attached): > [1] cluster_1.11.11 MASS_7.2-44 > > --------------------------------- > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662 [[alternative HTML version deleted]]

ADD COMMENT • link 16.1 years ago Tim Smith ★ 1.1k

0

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 1 day ago

United States

Hi Tim, Hmm. That's weird. First off, I think you want to be using the GO2ALLEGS mapping rather than the GO2EG (or revmap(org.Hs.egGO) which is equivalent). The reason for this is that the GO ontology is a directed acyclic graph where all children are subsets of their parents, so by definition all genes that are appended with a child GO term are also appended with the parent term (e.g., multicellular organismal development is a part of the developmental process, so any genes involved with the former are by default involved with the latter as well). So modulo differences in our BioC versions, I get the same results as you: > summary(hyp, categorySize=2000) GOBPID Pvalue OddsRatio ExpCount Count Size 1 GO:0032501 0.0001909763 3.000224 11.303450 23 3420 2 GO:0032502 0.0008306266 2.730994 9.984714 20 3021 3 GO:0007275 0.0035077569 2.576107 7.224954 15 2186 4 GO:0007154 0.0042870538 2.271198 13.058459 22 3951 Term 1 multicellular organismal process 2 developmental process 3 multicellular organismal development 4 cell communication And can get the genes using org.Hs.egGO2ALLEGS: > geneIds[geneIds %in% get("GO:0032501", org.Hs.egGO2ALLEGS)] [1] "1281" "1410" "157506" "1906" "2625" "3043" "3553" "4015" [9] "4885" "4886" "5021" "5156" "5788" "6005" "6507" "673" [17] "6876" "695" "7026" "7070" "7412" "800" "83890" What I don't understand is how multicellular organismal process and developmental process can be missing from the org.Hs.egGO map Both these processes are direct children of biological process, so really they should be there. Maybe Marc Carlson can shed some light on this. Best, Jim Tim Smith wrote: > Thanks Jim. I'm still having problems, i.e., I cannot find which subset of input genes resulted in the significant GO term. I have reproduced the problem that I am having: > > ----------------------------------------------------- > library(org.Hs.eg.db) > library(GOstats) > library(GO.db) > > # Set of genes (Entrez Ids) that I want to investigate > geneIds <- c("10406", "10418", "11082", "1281", "1410", "157506", "167410", "1906" , "2029", "23091", "2625" , "2823" , "2877", "2993", "3039" , "3043", > "3046", "3283", "3553", "4015", "4069", "4258", "4345", "4353", "4885", "4886", "5021", "5055", "5151", "5156", "5320", "5553", > "55885", "56667", "5788" , "5999", "6005", "629", "6507", "653145", "6590", "673", "6876", "695", "7026", "7070", "7103", "7412", > "760", "7738", "800", "828", "83890", "945", "963") > > ### I have reproduced Jim's code for the test > univ <- Lkeys(org.Hs.egGO) > param <- new("GOHyperGParams", geneIds = geneIds, universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") > hyp <- hyperGTest(param) > summary(hyp,categorySize=2000) > ---------------------------------------------------- > > The result I get is : > > GOBPID Pvalue OddsRatio ExpCount Count Size Term > 1 GO:0032501 0.0001568145 3.021968 11.976486 24 3438 multicellular organismal process > 2 GO:0032502 0.0012459810 2.626265 10.297409 20 2956 developmental process > 3 GO:0007275 0.0051457709 2.457897 7.517527 15 2158 multicellular organismal development > 4 GO:0007154 0.0061081457 2.187407 13.418681 22 3852 cell communication > > I now want to know which subset of genes resulted in 'GO:0032501'. The error I get is: > >> geneIds[geneIds %in% get("GO:0032501", revmap(org.Hs.egGO))] > Error in .checkKeys(value, Rkeys(x), x at ifnotfound) : > value for "GO:0032501" not found > >> geneIds[geneIds %in% get("GO:0032502", revmap(org.Hs.egGO))] > Error in .checkKeys(value, Rkeys(x), x at ifnotfound) : > value for "GO:0032502" not found > > [[elided Yahoo spam]] > > thanks a lot. > > Tim > > > > > > Hi Tim, > > Yeah, probeSetSummary() is probably not what you want, if you are not > starting with an Affy chip. There are some gymnastics required to map > things back to the original Affy chip that you won't need to do. In > addition, if you are not using a conditional hypergeometric analysis, it > should be pretty simple to get what you want without even needing to > parse things out of the GOHyperGResult object. An example: > > ## fake up some data > >> geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)] >> univ <- Lkeys(org.Hs.egGO) >> param <- new("GOHyperGParams", geneIds = geneIds, > universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") >> hyp <- hyperGTest(param) >> summary(hyp, categorySize=10) > GOBPID Pvalue OddsRatio ExpCount Count Size > Term > 1 GO:0007338 0.002723500 29.25101 0.07808304 2 54 single > fertilization > 2 GO:0009566 0.002925855 28.16374 0.08097501 2 56 > fertilization > > So we have two terms of interest. Getting the Entrez Gene IDs from the > input set that map to these terms is easy: > >> geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))] > [1] "100131137" "10007" > > Now you might also want to know which 54 Entrez Gene IDs map to that > particular GO term. Since you are not conditioning, this includes that > particular GO term and all its offspring. > >> offspring <- get("GO:0007338", GOBPOFFSPRING) >> egids <- unique(unlist(mget(c("GO:0007338", offspring), > revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE)) >> egids[!is.na(egids)] > [1] "1047" "4179" "4240" "4486" "4809" "5016" > [7] "6674" "7783" "7784" "7802" "7993" "8747" > [13] "8748" "8852" "9082" "10007" "10361" "22917" > [19] "26476" "53340" "57055" "57829" "64100" "93185" > [25] "158062" "442868" "100131137" "49" "410" "2683" > [31] "3010" "4184" "6677" "7142" "7455" "8857" > [37] "11055" "124626" "2054" "2741" "10343" "10566" > [43] "27297" "152015" "3074" "167" "928" "2515" > [49] "5104" "23553" "284359" "164684" "7141" "79400" > > Best, > > Jim > > > Tim Smith wrote: > > Thanks James. If I can tweak that function, I'll get exactly what I want. > I tried what you suggested and got the following error: > > --------------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > ps <- probeSetSummary(GO) > > Error in get(mapName, envir = pkgEnv, inherits = FALSE) : > variable "org.Hs.egENTREZID" was not found > -------------------------------- > > I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input. > > Or am I doing something wrong? > > thanks! > > > > > > ----- Original Message ---- > From: James W. MacDonald <jmacdon at="" med.umich.edu=""> > > Cc: bioc <bioconductor at="" stat.math.ethz.ch=""> > Sent: Wednesday, October 22, 2008 9:10:39 AM > Subject: Re: [BioC] GOstat: listing genes from hyperGTest > > Hi Tim, > > Does probeSetSummary() do what you want? > > Best, > > Jim > > > > Tim Smith wrote: > > > Hi, > > I > was performing a hyperGTest for genes in homo-sapiens. For a set of > input genes, this function returns some 'significant' GO terms. What I > wanted to now do was to co-relate each significant GO term (returned by > this function) with genes (from my set of input genes) associated with > that GO term. However, I think that I may be using the wrong > package/function to get the releveant set of genes. > > Currently, what I'm doing is finding the significant GO terms by using the following code: > > ----------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > -------------------------- > This > gives me a set of significant GO terms. Now, I would like to find which > subset of genes in 'genes1' is associated with each of the significant > GO term. To do this I map all GO terms to their Entrez IDs using the > 'org.Hs.eg.db' package using the following: > > xx <- as.list(org.Hs.egGO2EG) > > to > get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't > this number small?) that map to at least one Entrez ID. So, from here I > look up which Entrez IDs are associated with my GO term of interest. > > My > problem is that often, the GO term from hyperGTest is not associated > with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described > above ), i.e. the GO term/ID is not in the list obtained from > 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by > hyperGTest, but does not appear to be associated with any Entrez IDs in > the org.Hs.eg.db package. Where could I be going wrong? > > [[elided Yahoo spam]] > > Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... > > My sessionInfo() is: > -------------------------------- > R version 2.7.2 (2008-08-25) > i386-pc-mingw32 > locale: > LC_COLLATE=English_United > States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] grid splines tools stats graphics grDevices utils datasets methods base > other attached packages: > [1] > gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 > gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 > Category_2.6.0 [8] RBGL_1.16.0 annotate_1.18.0 > xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 > GO.db_2.2.0 KEGG.db_2.2.0 [15] org.Hs.eg.db_2.2.0 > AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 > genefilter_1.20.0 survival_2.34-1 affy_1.18.0 [22] > preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 > loaded via a namespace (and not attached): > [1] cluster_1.11.11 MASS_7.2-44 > > --------------------------------- > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662

ADD COMMENT • link 16.1 years ago James W. MacDonald 67k

0

Entering edit mode

Hi Jim, Thanks again for swooping in to get these guys answer while those of us on the west coast are still sleeping. We really do appreciate it. To answer your question, the mapping org.Hs.egGO does not contain all relationships between all GO terms and all genes. What it is supposed to contain are the terminal mappings (ie. the most specific "leaves" from the GO DAG) that can be mapped onto human genes. This is the information as it was originally input by curators and stored at NCBI. >From that we can derive that if a terminal (ie. more specific term) is related to a gene, that its less specific parent terms should also be mapped to it, and this information is stored in the org.Hs.egGO2ALLEGS mapping. So as you pointed out you can find these results by doing: get("GO:0032502", org.Hs.egGO2ALLEGS) #and get("GO:0032501", org.Hs.egGO2ALLEGS) I can't tell you why nobody annotating genes in human ever specifies these two terms, I suspect it is because being as general as they are they are not very meaningful. So I am kind of glad that I don't see them used, because it tells me that the annotations are more specific than these two terms. And more specific annotations should be more useful than vague generalized ones. I guess it is also possible that NCBI could do some filtering on their term mappings before they share them with the world, but I don't think so because they do (on rare occasion) have genes from other organisms that are mapped to both of these terms. As for the complete collection of GO DAG terms, the entirety of that information should be found only in the GO.db package since not every piece of GO should necessarily be represented by a gene in humans that is known to be involved in that process/component/function. And when I check, I do see these two terms there so that part is also ok. So to clarify, the GO information in org.Hs.eg.db is only meant to indicate how specific genes map onto the GO ontology, which is very unlikely to correspond to all the GO terms being represented, in contrast the structure of GO and the complete listing of GO terms is found in GO.db. If this is all still confusing, please let me know. The GO stuff is confusing. Marc James W. MacDonald wrote: > Hi Tim, > > Hmm. That's weird. > > First off, I think you want to be using the GO2ALLEGS mapping rather > than the GO2EG (or revmap(org.Hs.egGO) which is equivalent). > > The reason for this is that the GO ontology is a directed acyclic > graph where all children are subsets of their parents, so by > definition all genes that are appended with a child GO term are also > appended with the parent term (e.g., multicellular organismal > development is a part of the developmental process, so any genes > involved with the former are by default involved with the latter as > well). > > So modulo differences in our BioC versions, I get the same results as > you: > > > summary(hyp, categorySize=2000) > GOBPID Pvalue OddsRatio ExpCount Count Size > 1 GO:0032501 0.0001909763 3.000224 11.303450 23 3420 > 2 GO:0032502 0.0008306266 2.730994 9.984714 20 3021 > 3 GO:0007275 0.0035077569 2.576107 7.224954 15 2186 > 4 GO:0007154 0.0042870538 2.271198 13.058459 22 3951 > Term > 1 multicellular organismal process > 2 developmental process > 3 multicellular organismal development > 4 cell communication > > And can get the genes using org.Hs.egGO2ALLEGS: > > > geneIds[geneIds %in% get("GO:0032501", org.Hs.egGO2ALLEGS)] > [1] "1281" "1410" "157506" "1906" "2625" "3043" "3553" "4015" > [9] "4885" "4886" "5021" "5156" "5788" "6005" "6507" "673" > [17] "6876" "695" "7026" "7070" "7412" "800" "83890" > > > What I don't understand is how multicellular organismal process and > developmental process can be missing from the org.Hs.egGO map > > Both these processes are direct children of biological process, so > really they should be there. Maybe Marc Carlson can shed some light on > this. > > > Best, > > Jim > > > > Tim Smith wrote: >> Thanks Jim. I'm still having problems, i.e., I cannot find which >> subset of input genes resulted in the significant GO term. I have >> reproduced the problem that I am having: >> >> ----------------------------------------------------- >> library(org.Hs.eg.db) >> library(GOstats) >> library(GO.db) >> >> # Set of genes (Entrez Ids) that I want to investigate >> geneIds <- c("10406", "10418", "11082", "1281", "1410", >> "157506", "167410", "1906" , "2029", "23091", "2625" , "2823" , >> "2877", "2993", "3039" , "3043", "3046", "3283", "3553", >> "4015", "4069", "4258", "4345", "4353", "4885", "4886", >> "5021", "5055", "5151", "5156", "5320", "5553", "55885", >> "56667", "5788" , "5999", "6005", "629", "6507", "653145", >> "6590", "673", "6876", "695", "7026", "7070", "7103", >> "7412", "760", "7738", "800", "828", "83890", "945", >> "963") >> ### I have reproduced Jim's code for the test >> univ <- Lkeys(org.Hs.egGO) >> param <- new("GOHyperGParams", geneIds = geneIds, >> universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") >> hyp <- hyperGTest(param) >> summary(hyp,categorySize=2000) >> ---------------------------------------------------- >> >> The result I get is : >> >> GOBPID Pvalue OddsRatio ExpCount Count >> Size Term >> 1 GO:0032501 0.0001568145 3.021968 11.976486 24 3438 >> multicellular organismal process >> 2 GO:0032502 0.0012459810 2.626265 10.297409 20 >> 2956 developmental process >> 3 GO:0007275 0.0051457709 2.457897 7.517527 15 2158 >> multicellular organismal development >> 4 GO:0007154 0.0061081457 2.187407 13.418681 22 >> 3852 cell communication >> >> I now want to know which subset of genes resulted in 'GO:0032501'. >> The error I get is: >> >>> geneIds[geneIds %in% get("GO:0032501", revmap(org.Hs.egGO))] >> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) : value for >> "GO:0032501" not found >> >>> geneIds[geneIds %in% get("GO:0032502", revmap(org.Hs.egGO))] >> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) : value for >> "GO:0032502" not found >> >> [[elided Yahoo spam]] >> >> thanks a lot. >> >> Tim >> >> >> >> >> >> Hi Tim, >> Yeah, probeSetSummary() is probably not what you want, if you are not >> starting with an Affy chip. There are some gymnastics required to map >> things back to the original Affy chip that you won't need to do. In >> addition, if you are not using a conditional hypergeometric analysis, >> it should be pretty simple to get what you want without even needing >> to parse things out of the GOHyperGResult object. An example: >> ## fake up some data >>> geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)] univ <- >>> Lkeys(org.Hs.egGO) param <- new("GOHyperGParams", geneIds = geneIds, >> universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") >>> hyp <- hyperGTest(param) summary(hyp, categorySize=10) >> GOBPID Pvalue OddsRatio ExpCount Count Size Term 1 >> GO:0007338 0.002723500 29.25101 0.07808304 2 54 single >> fertilization 2 GO:0009566 0.002925855 28.16374 0.08097501 2 >> 56 fertilization >> So we have two terms of interest. Getting the Entrez Gene IDs from >> the input set that map to these terms is easy: >>> geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))] >> [1] "100131137" "10007" >> Now you might also want to know which 54 Entrez Gene IDs map to that >> particular GO term. Since you are not conditioning, this includes >> that particular GO term and all its offspring. >>> offspring <- get("GO:0007338", GOBPOFFSPRING) egids <- >>> unique(unlist(mget(c("GO:0007338", offspring), >> revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE)) >>> egids[!is.na(egids)] >> [1] "1047" "4179" "4240" "4486" "4809" >> "5016" [7] "6674" "7783" "7784" "7802" >> "7993" "8747" [13] "8748" "8852" "9082" >> "10007" "10361" "22917" [19] "26476" "53340" >> "57055" "57829" "64100" "93185" [25] "158062" >> "442868" "100131137" "49" "410" "2683" [31] >> "3010" "4184" "6677" "7142" "7455" "8857" >> [37] "11055" "124626" "2054" "2741" "10343" >> "10566" [43] "27297" "152015" "3074" "167" >> "928" "2515" [49] "5104" "23553" "284359" >> "164684" "7141" "79400" >> Best, >> Jim >> >> Tim Smith wrote: >> Thanks James. If I can tweak that function, I'll get exactly what I >> want. I tried what you suggested and got the following error: >> --------------------------- ### 'genes1' are the Entrez IDs of my >> genes of interest, and 'allGenes' is the universe of Entrez IDs >> paramsGO <- new("GOHyperGParams", geneIds = genes1, >> universeGeneIds = allGenes, annotation = "org.Hs.eg.db", >> ontology = "BP", pvalueCutoff = 1, conditional = FALSE, >> testDirection = "over") GO <- hyperGTest(paramsGO) ps <- >> probeSetSummary(GO) >> Error in get(mapName, envir = pkgEnv, inherits = FALSE) : variable >> "org.Hs.egENTREZID" was not found -------------------------------- >> I guess the function would return the probe ids if I was using them, >> but I have Entrez IDs as input. >> Or am I doing something wrong? >> thanks! >> >> >> >> >> ----- Original Message ---- From: James W. MacDonald >> <jmacdon at="" med.umich.edu=""> >> Cc: bioc <bioconductor at="" stat.math.ethz.ch=""> Sent: Wednesday, October >> 22, 2008 9:10:39 AM Subject: Re: [BioC] GOstat: listing genes from >> hyperGTest >> Hi Tim, >> Does probeSetSummary() do what you want? >> Best, >> Jim >> >> >> Tim Smith wrote: >> Hi, >> I was performing a hyperGTest for genes in homo-sapiens. For a set of >> input genes, this function returns some 'significant' GO terms. What >> I wanted to now do was to co-relate each significant GO term >> (returned by this function) with genes (from my set of input genes) >> associated with that GO term. However, I think that I may be using >> the wrong package/function to get the releveant set of genes. >> Currently, what I'm doing is finding the significant GO terms by >> using the following code: >> ----------------------- ### 'genes1' are the Entrez IDs of my genes >> of interest, and 'allGenes' is the universe of Entrez IDs paramsGO >> <- new("GOHyperGParams", geneIds = genes1, universeGeneIds >> = allGenes, annotation = "org.Hs.eg.db", ontology = "BP", >> pvalueCutoff = 1, conditional = FALSE, testDirection = "over") >> GO <- hyperGTest(paramsGO) -------------------------- This gives me a >> set of significant GO terms. Now, I would like to find which subset >> of genes in 'genes1' is associated with each of the significant GO >> term. To do this I map all GO terms to their Entrez IDs using the >> 'org.Hs.eg.db' package using the following: >> xx <- as.list(org.Hs.egGO2EG) >> to get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms >> (isn't this number small?) that map to at least one Entrez ID. So, >> from here I look up which Entrez IDs are associated with my GO term >> of interest. >> My problem is that often, the GO term from hyperGTest is not >> associated with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) >> described above ), i.e. the GO term/ID is not in the list obtained >> from 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown >> up by hyperGTest, but does not appear to be associated with any >> Entrez IDs in the org.Hs.eg.db package. Where could I be going wrong? >> [[elided Yahoo spam]] >> Thanks for any comments/suggestions. I realize that I'm probably >> doing something really stupid here.... >> My sessionInfo() is: -------------------------------- R version 2.7.2 >> (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=English_United >> States.1252;LC_CTYPE=English_United >> States.1252;LC_MONETARY=English_United >> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 >> attached base packages: [1] grid splines tools stats >> graphics grDevices utils datasets methods base other >> attached packages: [1] gplots_2.6.0 gmodels_2.14.1 >> gtools_2.4.0 gdata_2.4.1 Rgraphviz_1.18.1 >> GOstats_2.6.0 Category_2.6.0 [8] RBGL_1.16.0 >> annotate_1.18.0 xtable_1.5-2 graph_1.18.0 >> PFAM.db_2.2.0 GO.db_2.2.0 KEGG.db_2.2.0 [15] >> org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 >> DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 >> affy_1.18.0 [22] >> preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 loaded >> via a namespace (and not attached): [1] cluster_1.11.11 MASS_7.2-44 >> --------------------------------- >> >> [[alternative HTML version deleted]] >> _______________________________________________ Bioconductor mailing >> list Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >> archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >

ADD REPLY • link 16.1 years ago Marc Carlson ★ 7.2k

0

Entering edit mode

Tim Smith ★ 1.1k

@tim-smith-1532

Last seen 10.2 years ago

Thanks for the reply Marc. That clears things up. Just one more question - I was trying to do a similar thing with PFAM terms, but couldn't find the appropriate function in org.Hs.eg.db. For example, if I wanted to know the entrez ids associated with 'PF07654' (I got it from a hyperGTest result) and tried: > get("PF07654",revmap(org.Hs.egPFAM)) Error in `direction<-`(`*tmp*`, value = -1L) : changing the direction of an "IpiAnnDbMap" object is not supported Error in get("PF07654", revmap(org.Hs.egPFAM)) : error in evaluating the argument 'pos' in selecting a method for function 'get' thanks again! ________________________________ From: Marc Carlson <mcarlson@fhcrc.org> To: James W. MacDonald <jmacdon@med.umich.edu> Sent: Friday, October 24, 2008 12:52:43 PM Subject: Re: [BioC] GOstats: Listing genes from hyperGTest Hi Jim, Thanks again for swooping in to get these guys answer while those of us on the west coast are still sleeping. We really do appreciate it. To answer your question, the mapping org.Hs.egGO does not contain all relationships between all GO terms and all genes. What it is supposed to contain are the terminal mappings (ie. the most specific "leaves" from the GO DAG) that can be mapped onto human genes. This is the information as it was originally input by curators and stored at NCBI. >From that we can derive that if a terminal (ie. more specific term) is related to a gene, that its less specific parent terms should also be mapped to it, and this information is stored in the org.Hs.egGO2ALLEGS mapping. So as you pointed out you can find these results by doing: get("GO:0032502", org.Hs.egGO2ALLEGS) #and get("GO:0032501", org.Hs.egGO2ALLEGS) I can't tell you why nobody annotating genes in human ever specifies these two terms, I suspect it is because being as general as they are they are not very meaningful. So I am kind of glad that I don't see them used, because it tells me that the annotations are more specific than these two terms. And more specific annotations should be more useful than vague generalized ones. I guess it is also possible that NCBI could do some filtering on their term mappings before they share them with the world, but I don't think so because they do (on rare occasion) have genes from other organisms that are mapped to both of these terms. As for the complete collection of GO DAG terms, the entirety of that information should be found only in the GO.db package since not every piece of GO should necessarily be represented by a gene in humans that is known to be involved in that process/component/function. And when I check, I do see these two terms there so that part is also ok. So to clarify, the GO information in org.Hs.eg.db is only meant to indicate how specific genes map onto the GO ontology, which is very unlikely to correspond to all the GO terms being represented, in contrast the structure of GO and the complete listing of GO terms is found in GO.db. If this is all still confusing, please let me know. The GO stuff is confusing. Marc James W. MacDonald wrote: > Hi Tim, > > Hmm. That's weird. > > First off, I think you want to be using the GO2ALLEGS mapping rather > than the GO2EG (or revmap(org.Hs.egGO) which is equivalent). > > The reason for this is that the GO ontology is a directed acyclic > graph where all children are subsets of their parents, so by > definition all genes that are appended with a child GO term are also > appended with the parent term (e.g., multicellular organismal > development is a part of the developmental process, so any genes > involved with the former are by default involved with the latter as > well). > > So modulo differences in our BioC versions, I get the same results as > you: > > > summary(hyp, categorySize=2000) > GOBPID Pvalue OddsRatio ExpCount Count Size > 1 GO:0032501 0.0001909763 3.000224 11.303450 23 3420 > 2 GO:0032502 0.0008306266 2.730994 9.984714 20 3021 > 3 GO:0007275 0.0035077569 2.576107 7.224954 15 2186 > 4 GO:0007154 0.0042870538 2.271198 13.058459 22 3951 > Term > 1 multicellular organismal process > 2 developmental process > 3 multicellular organismal development > 4 cell communication > > And can get the genes using org.Hs.egGO2ALLEGS: > > > geneIds[geneIds %in% get("GO:0032501", org.Hs.egGO2ALLEGS)] > [1] "1281" "1410" "157506" "1906" "2625" "3043" "3553" "4015" > [9] "4885" "4886" "5021" "5156" "5788" "6005" "6507" "673" > [17] "6876" "695" "7026" "7070" "7412" "800" "83890" > > > What I don't understand is how multicellular organismal process and > developmental process can be missing from the org.Hs.egGO map > > Both these processes are direct children of biological process, so > really they should be there. Maybe Marc Carlson can shed some light on > this. > > > Best, > > Jim > > > > Tim Smith wrote: >> Thanks Jim. I'm still having problems, i.e., I cannot find which >> subset of input genes resulted in the significant GO term. I have >> reproduced the problem that I am having: >> >> ----------------------------------------------------- >> library(org.Hs.eg.db) >> library(GOstats) >> library(GO.db) >> >> # Set of genes (Entrez Ids) that I want to investigate >> geneIds <- c("10406", "10418", "11082", "1281", "1410", >> "157506", "167410", "1906" , "2029", "23091", "2625" , "2823" , >> "2877", "2993", "3039" , "3043", "3046", "3283", "3553", >> "4015", "4069", "4258", "4345", "4353", "4885", "4886", >> "5021", "5055", "5151", "5156", "5320", "5553", "55885", >> "56667", "5788" , "5999", "6005", "629", "6507", "653145", >> "6590", "673", "6876", "695", "7026", "7070", "7103", >> "7412", "760", "7738", "800", "828", "83890", "945", >> "963") >> ### I have reproduced Jim's code for the test >> univ <- Lkeys(org.Hs.egGO) >> param <- new("GOHyperGParams", geneIds = geneIds, >> universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") >> hyp <- hyperGTest(param) >> summary(hyp,categorySize=2000) >> ---------------------------------------------------- >> >> The result I get is : >> >> GOBPID Pvalue OddsRatio ExpCount Count >> Size Term >> 1 GO:0032501 0.0001568145 3.021968 11.976486 24 3438 >> multicellular organismal process >> 2 GO:0032502 0.0012459810 2.626265 10.297409 20 >> 2956 developmental process >> 3 GO:0007275 0.0051457709 2.457897 7.517527 15 2158 >> multicellular organismal development >> 4 GO:0007154 0.0061081457 2.187407 13.418681 22 >> 3852 cell communication >> >> I now want to know which subset of genes resulted in 'GO:0032501'. >> The error I get is: >> >>> geneIds[geneIds %in% get("GO:0032501", revmap(org.Hs.egGO))] >> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) : value for >> "GO:0032501" not found >> >>> geneIds[geneIds %in% get("GO:0032502", revmap(org.Hs.egGO))] >> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) : value for >> "GO:0032502" not found >> >> [[elided Yahoo spam]] >> >> thanks a lot. >> >> Tim >> >> >> >> >> >> Hi Tim, >> Yeah, probeSetSummary() is probably not what you want, if you are not >> starting with an Affy chip. There are some gymnastics required to map >> things back to the original Affy chip that you won't need to do. In >> addition, if you are not using a conditional hypergeometric analysis, >> it should be pretty simple to get what you want without even needing >> to parse things out of the GOHyperGResult object. An example: >> ## fake up some data >>> geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)] univ <- >>> Lkeys(org.Hs.egGO) param <- new("GOHyperGParams", geneIds = geneIds, >> universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") >>> hyp <- hyperGTest(param) summary(hyp, categorySize=10) >> GOBPID Pvalue OddsRatio ExpCount Count Size Term 1 >> GO:0007338 0.002723500 29.25101 0.07808304 2 54 single >> fertilization 2 GO:0009566 0.002925855 28.16374 0.08097501 2 >> 56 fertilization >> So we have two terms of interest. Getting the Entrez Gene IDs from >> the input set that map to these terms is easy: >>> geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))] >> [1] "100131137" "10007" >> Now you might also want to know which 54 Entrez Gene IDs map to that >> particular GO term. Since you are not conditioning, this includes >> that particular GO term and all its offspring. >>> offspring <- get("GO:0007338", GOBPOFFSPRING) egids <- >>> unique(unlist(mget(c("GO:0007338", offspring), >> revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE)) >>> egids[!is.na(egids)] >> [1] "1047" "4179" "4240" "4486" "4809" >> "5016" [7] "6674" "7783" "7784" "7802" >> "7993" "8747" [13] "8748" "8852" "9082" >> "10007" "10361" "22917" [19] "26476" "53340" >> "57055" "57829" "64100" "93185" [25] "158062" >> "442868" "100131137" "49" "410" "2683" [31] >> "3010" "4184" "6677" "7142" "7455" "8857" >> [37] "11055" "124626" "2054" "2741" "10343" >> "10566" [43] "27297" "152015" "3074" "167" >> "928" "2515" [49] "5104" "23553" "284359" >> "164684" "7141" "79400" >> Best, >> Jim >> >> Tim Smith wrote: >> Thanks James. If I can tweak that function, I'll get exactly what I >> want. I tried what you suggested and got the following error: >> --------------------------- ### 'genes1' are the Entrez IDs of my >> genes of interest, and 'allGenes' is the universe of Entrez IDs >> paramsGO <- new("GOHyperGParams", geneIds = genes1, >> universeGeneIds = allGenes, annotation = "org.Hs.eg.db", >> ontology = "BP", pvalueCutoff = 1, conditional = FALSE, >> testDirection = "over") GO <- hyperGTest(paramsGO) ps <- >> probeSetSummary(GO) >> Error in get(mapName, envir = pkgEnv, inherits = FALSE) : variable >> "org.Hs.egENTREZID" was not found -------------------------------- >> I guess the function would return the probe ids if I was using them, >> but I have Entrez IDs as input. >> Or am I doing something wrong? >> thanks! >> >> >> >> >> ----- Original Message ---- From: James W. MacDonald >> <jmacdon@med.umich.edu> >> Cc: bioc <bioconductor@stat.math.ethz.ch> Sent: Wednesday, October >> 22, 2008 9:10:39 AM Subject: Re: [BioC] GOstat: listing genes from >> hyperGTest >> Hi Tim, >> Does probeSetSummary() do what you want? >> Best, >> Jim >> >> >> Tim Smith wrote: >> Hi, >> I was performing a hyperGTest for genes in homo-sapiens. For a set of >> input genes, this function returns some 'significant' GO terms. What >> I wanted to now do was to co-relate each significant GO term >> (returned by this function) with genes (from my set of input genes) >> associated with that GO term. However, I think that I may be using >> the wrong package/function to get the releveant set of genes. >> Currently, what I'm doing is finding the significant GO terms by >> using the following code: >> ----------------------- ### 'genes1' are the Entrez IDs of my genes >> of interest, and 'allGenes' is the universe of Entrez IDs paramsGO >> <- new("GOHyperGParams", geneIds = genes1, universeGeneIds >> = allGenes, annotation = "org.Hs.eg.db", ontology = "BP", >> pvalueCutoff = 1, conditional = FALSE, testDirection = "over") >> GO <- hyperGTest(paramsGO) -------------------------- This gives me a >> set of significant GO terms. Now, I would like to find which subset >> of genes in 'genes1' is associated with each of the significant GO >> term. To do this I map all GO terms to their Entrez IDs using the >> 'org.Hs.eg.db' package using the following: >> xx <- as.list(org.Hs.egGO2EG) >> to get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms >> (isn't this number small?) that map to at least one Entrez ID. So, >> from here I look up which Entrez IDs are associated with my GO term >> of interest. >> My problem is that often, the GO term from hyperGTest is not >> associated with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) >> described above ), i.e. the GO term/ID is not in the list obtained >> from 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown >> up by hyperGTest, but does not appear to be associated with any >> Entrez IDs in the org.Hs.eg.db package. Where could I be going wrong? >> [[elided Yahoo spam]] >> Thanks for any comments/suggestions. I realize that I'm probably >> doing something really stupid here.... >> My sessionInfo() is: -------------------------------- R version 2.7.2 >> (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=English_United >> States.1252;LC_CTYPE=English_United >> States.1252;LC_MONETARY=English_United >> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 >> attached base packages: [1] grid splines tools stats >> graphics grDevices utils datasets methods base other >> attached packages: [1] gplots_2.6.0 gmodels_2.14.1 >> gtools_2.4.0 gdata_2.4.1 Rgraphviz_1.18.1 >> GOstats_2.6.0 Category_2.6.0 [8] RBGL_1.16.0 >> annotate_1.18.0 xtable_1.5-2 graph_1.18.0 >> PFAM.db_2.2.0 GO.db_2.2.0 KEGG.db_2.2.0 [15] >> org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 >> DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 >> affy_1.18.0 [22] >> preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 loaded >> via a namespace (and not attached): [1] cluster_1.11.11 MASS_7.2-44 >> --------------------------------- >> >> [[alternative HTML version deleted]] >> _______________________________________________ Bioconductor mailing >> list Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >> archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > [[alternative HTML version deleted]]

ADD COMMENT • link 16.1 years ago Tim Smith ★ 1.1k

0

Entering edit mode

Hi Tim, That is a harder case, because that map contains more than one kind of data and is therefore not really reversible, and since it never came up before, there is no reverse map present. I could add one but then you would have to wait (and waiting is NOT fun). So I think you might want to do this one using some more technical methods. #You can compose a SQL statement that can get what you want from the DB: #SELECT gene_id,pfam_id FROM genes, pfam where genes._id = pfam._id and pfam_id NOT NULL; #So now all we have to do is to use that to get the data out: library(org.Hs.eg.db) data = dbGetQuery(org.Hs.eg_dbconn(), "SELECT gene_id,pfam_id FROM genes, pfam where genes._id = pfam._id and pfam_id NOT NULL;") #Then we can take the massive redundancy out of that list. (which was there because the table had other data that we are now ignoring): cleanData = unique(data) #inspect the data to make sure things look ok: head(cleanData, n=40) #And then get what you want out of it etc.: cleanData[cleanData[,2]=="PF07654",] Hope this helps, Marc Tim Smith wrote: > Thanks for the reply Marc. That clears things up. Just one more > question - I was trying to do a similar thing with PFAM terms, but > couldn't find the appropriate function in org.Hs.eg.db. For example, > if I wanted to know the entrez ids associated with 'PF07654' (I got it > from a hyperGTest result) and tried: > > > get("PF07654",revmap(org.Hs.egPFAM)) > Error in `direction<-`(`*tmp*`, value = -1L) : > changing the direction of an "IpiAnnDbMap" object is not supported > Error in get("PF07654", revmap(org.Hs.egPFAM)) : > error in evaluating the argument 'pos' in selecting a method for > function 'get' > > > thanks again! > > -------------------------------------------------------------------- ---- > *From:* Marc Carlson <mcarlson at="" fhcrc.org=""> > *To:* James W. MacDonald <jmacdon at="" med.umich.edu=""> > *Cc:* Tim Smith <tim_smith_666 at="" yahoo.com="">; bioc > <bioconductor at="" stat.math.ethz.ch=""> > *Sent:* Friday, October 24, 2008 12:52:43 PM > *Subject:* Re: [BioC] GOstats: Listing genes from hyperGTest > > Hi Jim, > > Thanks again for swooping in to get these guys answer while those of us > on the west coast are still sleeping. We really do appreciate it. To > answer your question, the mapping org.Hs.egGO does not contain all > relationships between all GO terms and all genes. What it is supposed > to contain are the terminal mappings (ie. the most specific "leaves" > from the GO DAG) that can be mapped onto human genes. This is the > information as it was originally input by curators and stored at NCBI. > From that we can derive that if a terminal (ie. more specific term) is > related to a gene, that its less specific parent terms should also be > mapped to it, and this information is stored in the org.Hs.egGO2ALLEGS > mapping. So as you pointed out you can find these results by doing: > > get("GO:0032502", org.Hs.egGO2ALLEGS) > #and > get("GO:0032501", org.Hs.egGO2ALLEGS) > > I can't tell you why nobody annotating genes in human ever specifies > these two terms, I suspect it is because being as general as they are > they are not very meaningful. So I am kind of glad that I don't see > them used, because it tells me that the annotations are more specific > than these two terms. And more specific annotations should be more > useful than vague generalized ones. I guess it is also possible that > NCBI could do some filtering on their term mappings before they share > them with the world, but I don't think so because they do (on rare > occasion) have genes from other organisms that are mapped to both of > these terms. > > As for the complete collection of GO DAG terms, the entirety of that > information should be found only in the GO.db package since not every > piece of GO should necessarily be represented by a gene in humans that > is known to be involved in that process/component/function. And when I > check, I do see these two terms there so that part is also ok. So to > clarify, the GO information in org.Hs.eg <http: org.hs.eg="">.db is only > meant to indicate > how specific genes map onto the GO ontology, which is very unlikely to > correspond to all the GO terms being represented, in contrast the > structure of GO and the complete listing of GO terms is found in GO.db. > > If this is all still confusing, please let me know. The GO stuff is > confusing. > > > Marc > > > James W. MacDonald wrote: > > Hi Tim, > > > > Hmm. That's weird. > > > > First off, I think you want to be using the GO2ALLEGS mapping rather > > than the GO2EG (or revmap(org.Hs.egGO) which is equivalent). > > > > The reason for this is that the GO ontology is a directed acyclic > > graph where all children are subsets of their parents, so by > > definition all genes that are appended with a child GO term are also > > appended with the parent term (e.g., multicellular organismal > > development is a part of the developmental process, so any genes > > involved with the former are by default involved with the latter as > > well). > > > > So modulo differences in our BioC versions, I get the same results as > > you: > > > > > summary(hyp, categorySize=2000) > > GOBPID Pvalue OddsRatio ExpCount Count Size > > 1 GO:0032501 0.0001909763 3.000224 11.303450 23 3420 > > 2 GO:0032502 0.0008306266 2.730994 9.984714 20 3021 > > 3 GO:0007275 0.0035077569 2.576107 7.224954 15 2186 > > 4 GO:0007154 0.0042870538 2.271198 13.058459 22 3951 > > Term > > 1 multicellular organismal process > > 2 developmental process > > 3 multicellular organismal development > > 4 cell communication > > > > And can get the genes using org.Hs.egGO2ALLEGS: > > > > > geneIds[geneIds %in% get("GO:0032501", org.Hs.egGO2ALLEGS)] > > [1] "1281" "1410" "157506" "1906" "2625" "3043" "3553" "4015" > > [9] "4885" "4886" "5021" "5156" "5788" "6005" "6507" "673" > > [17] "6876" "695" "7026" "7070" "7412" "800" "83890" > > > > > > What I don't understand is how multicellular organismal process and > > developmental process can be missing from the org.Hs.egGO map > > > > Both these processes are direct children of biological process, so > > really they should be there. Maybe Marc Carlson can shed some light on > > this. > > > > > > Best, > > > > Jim > > > > > > > > Tim Smith wrote: > >> Thanks Jim. I'm still having problems, i.e., I cannot find which > >> subset of input genes resulted in the significant GO term. I have > >> reproduced the problem that I am having: > >> > >> ----------------------------------------------------- > >> library(org.Hs.eg.db) > >> library(GOstats) > >> library(GO.db) > >> > >> # Set of genes (Entrez Ids) that I want to investigate > >> geneIds <- c("10406", "10418", "11082", "1281", "1410", > >> "157506", "167410", "1906" , "2029", "23091", "2625" , "2823" , > >> "2877", "2993", "3039" , "3043", "3046", "3283", "3553", > >> "4015", "4069", "4258", "4345", "4353", "4885", "4886", > >> "5021", "5055", "5151", "5156", "5320", "5553", "55885", > >> "56667", "5788" , "5999", "6005", "629", "6507", "653145", > >> "6590", "673", "6876", "695", "7026", "7070", "7103", > >> "7412", "760", "7738", "800", "828", "83890", "945", > >> "963") > >> ### I have reproduced Jim's code for the test > >> univ <- Lkeys(org.Hs.egGO) > >> param <- new("GOHyperGParams", geneIds = geneIds, > >> universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") > >> hyp <- hyperGTest(param) > >> summary(hyp,categorySize=2000) > >> ---------------------------------------------------- > >> > >> The result I get is : > >> > >> GOBPID Pvalue OddsRatio ExpCount Count > >> Size Term > >> 1 GO:0032501 0.0001568145 3.021968 11.976486 24 3438 > >> multicellular organismal process > >> 2 GO:0032502 0.0012459810 2.626265 10.297409 20 > >> 2956 developmental process > >> 3 GO:0007275 0.0051457709 2.457897 7.517527 15 2158 > >> multicellular organismal development > >> 4 GO:0007154 0.0061081457 2.187407 13.418681 22 > >> 3852 cell communication > >> > >> I now want to know which subset of genes resulted in 'GO:0032501'. > >> The error I get is: > >> > >>> geneIds[geneIds %in% get("GO:0032501", revmap(org.Hs.egGO))] > >> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) : value for > >> "GO:0032501" not found > >> > >>> geneIds[geneIds %in% get("GO:0032502", revmap(org.Hs.egGO))] > >> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) : value for > >> "GO:0032502" not found > >> > >> [[elided Yahoo spam]] > >> > >> thanks a lot. > >> > >> Tim > >> > >> > >> > >> > >> > >> Hi Tim, > >> Yeah, probeSetSummary() is probably not what you want, if you are not > >> starting with an Affy chip. There are some gymnastics required to map > >> things back to the original Affy chip that you won't need to do. In > >> addition, if you are not using a conditional hypergeometric analysis, > >> it should be pretty simple to get what you want without even needing > >> to parse things out of the GOHyperGResult object. An example: > >> ## fake up some data > >>> geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)] univ <- > >>> Lkeys(org.Hs.egGO) param <- new("GOHyperGParams", geneIds = geneIds, > >> universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") > >>> hyp <- hyperGTest(param) summary(hyp, categorySize=10) > >> GOBPID Pvalue OddsRatio ExpCount Count Size Term 1 > >> GO:0007338 0.002723500 29.25101 0.07808304 2 54 single > >> fertilization 2 GO:0009566 0.002925855 28.16374 0.08097501 2 > >> 56 fertilization > >> So we have two terms of interest. Getting the Entrez Gene IDs from > >> the input set that map to these terms is easy: > >>> geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))] > >> [1] "100131137" "10007" > >> Now you might also want to know which 54 Entrez Gene IDs map to that > >> particular GO term. Since you are not conditioning, this includes > >> that particular GO term and all its offspring. > >>> offspring <- get("GO:0007338", GOBPOFFSPRING) egids <- > >>> unique(unlist(mget(c("GO:0007338", offspring), > >> revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE)) > >>> egids[!is.na <http: is.na="">(egids)] > >> [1] "1047" "4179" "4240" "4486" "4809" > >> "5016" [7] "6674" "7783" "7784" "7802" > >> "7993" "8747" [13] "8748" "8852" "9082" > >> "10007" "10361" "22917" [19] "26476" "53340" > >> "57055" "57829" "64100" "93185" [25] "158062" > >> "442868" "100131137" "49" "410" "2683" [31] > >> "3010" "4184" "6677" "7142" "7455" "8857" > >> [37] "11055" "124626" "2054" "2741" "10343" > >> "10566" [43] "27297" "152015" "3074" "167" > >> "928" "2515" [49] "5104" "23553" "284359" > >> "164684" "7141" "79400" > >> Best, > >> Jim > >> > >> Tim Smith wrote: > >> Thanks James. If I can tweak that function, I'll get exactly what I > >> want. I tried what you suggested and got the following error: > >> --------------------------- ### 'genes1' are the Entrez IDs of my > >> genes of interest, and 'allGenes' is the universe of Entrez IDs > >> paramsGO <- new("GOHyperGParams", geneIds = genes1, > >> universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > >> ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > >> testDirection = "over") GO <- hyperGTest(paramsGO) ps <- > >> probeSetSummary(GO) > >> Error in get(mapName, envir = pkgEnv, inherits = FALSE) : variable > >> "org.Hs.egENTREZID" was not found -------------------------------- > >> I guess the function would return the probe ids if I was using them, > >> but I have Entrez IDs as input. > >> Or am I doing something wrong? > >> thanks! > >> > >> > >> > >> > >> ----- Original Message ---- From: James W. MacDonald > >> <jmacdon at="" med.umich.edu="" <mailto:jmacdon="" at="" med.umich.edu="">> > >> Cc: bioc <bioconductor at="" stat.math.ethz.ch=""> <mailto:bioconductor at="" stat.math.ethz.ch="">> Sent: Wednesday, October > >> 22, 2008 9:10:39 AM Subject: Re: [BioC] GOstat: listing genes from > >> hyperGTest > >> Hi Tim, > >> Does probeSetSummary() do what you want? > >> Best, > >> Jim > >> > >> > >> Tim Smith wrote: > >> Hi, > >> I was performing a hyperGTest for genes in homo-sapiens. For a set of > >> input genes, this function returns some 'significant' GO terms. What > >> I wanted to now do was to co-relate each significant GO term > >> (returned by this function) with genes (from my set of input genes) > >> associated with that GO term. However, I think that I may be using > >> the wrong package/function to get the releveant set of genes. > >> Currently, what I'm doing is finding the significant GO terms by > >> using the following code: > >> ----------------------- ### 'genes1' are the Entrez IDs of my genes > >> of interest, and 'allGenes' is the universe of Entrez IDs paramsGO > >> <- new("GOHyperGParams", geneIds = genes1, universeGeneIds > >> = allGenes, annotation = "org.Hs.eg.db", ontology = "BP", > >> pvalueCutoff = 1, conditional = FALSE, testDirection = "over") > >> GO <- hyperGTest(paramsGO) -------------------------- This gives me a > >> set of significant GO terms. Now, I would like to find which subset > >> of genes in 'genes1' is associated with each of the significant GO > >> term. To do this I map all GO terms to their Entrez IDs using the > >> 'org.Hs.eg.db' package using the following: > >> xx <- as.list(org.Hs.egGO2EG) > >> to get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms > >> (isn't this number small?) that map to at least one Entrez ID. So, > >> from here I look up which Entrez IDs are associated with my GO term > >> of interest. > >> My problem is that often, the GO term from hyperGTest is not > >> associated with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) > >> described above ), i.e. the GO term/ID is not in the list obtained > >> from 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown > >> up by hyperGTest, but does not appear to be associated with any > >> Entrez IDs in the org.Hs.eg.db package. Where could I be going wrong? > >> [[elided Yahoo spam]] > >> Thanks for any comments/suggestions. I realize that I'm probably > >> doing something really stupid here.... > >> My sessionInfo() is: -------------------------------- R version 2.7.2 > >> (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=English_United > >> States.1252;LC_CTYPE=English_United > >> States.1252;LC_MONETARY=English_United > >> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > >> attached base packages: [1] grid splines tools stats > >> graphics grDevices utils datasets methods base other > >> attached packages: [1] gplots_2.6.0 gmodels_2.14.1 > >> gtools_2.4.0 gdata_2.4.1 Rgraphviz_1.18.1 > >> GOstats_2.6.0 Category_2.6.0 [8] RBGL_1.16.0 > >> annotate_1.18.0 xtable_1.5-2 graph_1.18.0 > >> PFAM.db_2.2.0 GO.db_2.2.0 KEGG.db_2.2.0 [15] > >> org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 > >> DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 > >> affy_1.18.0 [22] > >> preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 loaded > >> via a namespace (and not attached): [1] cluster_1.11.11 MASS_7.2-44 > >> --------------------------------- > >> > >> [[alternative HTML version deleted]] > >> _______________________________________________ Bioconductor mailing > >> list Bioconductor at stat.math.ethz.ch > <mailto:bioconductor at="" stat.math.ethz.ch=""> > >> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the > >> archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > >

ADD REPLY • link 16.1 years ago Marc Carlson ★ 7.2k

Login before adding your answer.