GOstats::geneIdUniverse() and its relation to organism annotation db's
1
0
Entering edit mode
@steve-lianoglou-2771
Last seen 21 months ago
United States
Hi, I've been trying to do some non-run-of-the-mill gene ontology analysis and have been messing around with the innards of the results we get from GOstats::hyperGTest. As a result, I've found some peculiarities in the GO annotations I'm retrieving from the hyperGTest and how they relate to the GO annotations in my organism's (saccharomyces cerevisiae) annotation database. In particular: for some enriched GO term X, I'm finding genes listed in the geneIdUniverse of X that do not have X as a member of their organism.db.GO map, and I'm not sure how that could be. I'm even setting condition=FALSE to my hyperGTest to make it as "vanilla" as possible (but as far as I understand the conditional test, this wouldn't effect what I'm seeing anyway since genes are removed from GO terms, and not added). I have a specific example below. You'll find that for the first significant GOID found (GO:0007059), gene "YKL049C" is listed as a member of this GOID's geneIdUniverse as it is returned from the hyperGTest, but GO:0007059 isn't listed in my annotation db GO mapping (org.Sc.sgdGO) for this gene, nor is "YKL049C" found in its GO2ORF map. === Example === library(GOstats) library(org.Sc.sgd.db) library(annotate) orfs <- c( "YGR084C", "YDR409W", "YFR017C", "YDR342C", "YBR152W", "YIL014W", "YGR134W", "YDL109C", "YDL234C", "YDR524C", "YBR002C", "YGR063C", "YDR034C-A", "YIR011C", "YHR175W", "YHR109W", "YGR207C", "YCR023C", "YER039C-A", "YGL064C", "YGR249W", "YHR172W", "YGL226W", "YAL064W", "YDR309C", "YDR473C", "YAL058W", "YHL030W", "YKL101W", "YJL122W", "YBR121C", "YIL018W", "YDR443C", "YBR069C", "YBR011C", "YHR009C", "YGL166W", "YIL103W", "YDR206W", "YBL031W", "YBR162W-A", "YFL018C", "YBR214W", "YIL076W", "YBR291C", "YKL049C", "YIL149C", "YDR034W-B", "YDR395W") orf2go <- getAnnMap('GO', 'org.Sc.sgd') go2orf <- getAnnMap('GO2ORF', 'org.Sc.sgd') all.orfs <- keys(orf2go) ## Do GOstats test to get significant GO ontologies params <- new("GOHyperGParams", geneIds=orfs, universeGeneIds=all.orfs, annotation='org.Sc.sgd', ontology='BP', pvalueCutoff=0.05, conditional=FALSE, testDirection='over') GO <- hyperGTest(params) ## Look at the genes annotated in the universe for each GOID and compare ## with annotations for each gene in org.Sc.sgdGO universes <- geneIdUniverse(GO) uid <- names(universes)[1] ## GO:0007059 uid.members <- universes[[uid]] "YKL049C" %in% uid.members ## TRUE ## Is the gene listed in GOID's (uid) universe annotated with that GOID? ## Given previous result, this *should* be TRUE go.ids <- orf2go[["YKL049C"]] uid %in% names(go.ids) ## FALSE ## Does that GOID have this gene as its member? ## This also should be TRUE check.orfs <- go2orf[[uid]] "YKL049C" %in% check.orfs ## FALSE ====================================== Since `"YKL049C" %in% uid.members` is TRUE, I would expect the go term `uid` (GO:0007059) to be in both `names(go.ids)` and `check.orfs`, but it's not. Can someone shed some light on this for me? sessionInfo() R version 2.11.1 (2010-05-31) i386-apple-darwin9.8.0 locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GO.db_2.4.1 annotate_1.26.1 org.Sc.sgd.db_2.4.1 GOstats_2.14.0 [5] RSQLite_0.9-2 DBI_0.2-5 graph_1.26.0 Category_2.14.0 [9] AnnotationDbi_1.10.2 Biobase_2.8.0 devtools_0.1 stringr_0.4 [13] roxygen_0.1-2 profr_0.1.1 digest_0.4.2 testthat_0.3 loaded via a namespace (and not attached): [1] GSEABase_1.10.0 RBGL_1.24.0 XML_3.1-1 evaluate_0.3 genefilter_1.30.0 [6] plyr_1.1 splines_2.11.1 survival_2.35-8 tools_2.11.1 xtable_1.5-6 Thanks, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Annotation GO Cancer GOstats Annotation GO Cancer GOstats • 1.2k views
ADD COMMENT
0
Entering edit mode
@robert-m-flight-4158
Last seen 3 months ago
United States
Hi Steve, I wonder if you are running into the problem that generally when you query for the GO terms you get only those GO terms that the ORF is directly annotated with (this would correspond to the actual annotations on the GO website, for instance), however when calculating enrichment, not only the directly annotated terms but also the parent terms (less specific) are included. I'm thinking of the difference between "org.Sc.sgdGO", which returns "org.Sc.sgdGO is an R object that provides mappings between ORF identifiers and the GO identifiers that they are directly associated with. This mapping and its reverse mapping do NOT associate the child terms from the GO ontology with the gene. Only the directly evidenced terms are represented here.", and a mapping that returns also the other GO terms in the directed acyclic graph (up the tree essentially). This would be the reverse of "org.Sc.sgdGO2ALLORFS". For example, any gene directly annotated to "GO:0007155 : cell adhesion", will also by virtue of the DAG be indirectly annotated with "GO:0022610 : biological adhesion", whether or not it is directly annotated with that or not. When GO enrichment using GOHyperGTest is calculated, all of the indirectly annotated terms are considered as well. (see this link: http://amigo.geneontology.org/cgi-bin/amigo/browse.cgi?action=plus_nod e&target=GO:0022610&open_1=GO:0008150,all,amigo1283864733) This seems to be the cause of your disparity below. You might want to read a bit more about the GO and how enrichment calculations are done. A good paper describing the nuts and bolts of most packages is Gavin Sherlocks paper on GO::TermFinder (http://www.ncbi.nlm.nih.gov/pubmed/15297299), which is the basis for most other enrichment tools, including GOStats. If I do: crud <- mget("GO:0007059", envir=org.Sc.sgdGO2ALLORFS) crud2 <- unlist(crud) grep("YKL049C", crud2) I get 135 as the result. Cheers, -Robert Robert M. Flight, Ph.D. Bioinformatics and Biomedical Computing Laboratory University of Louisville Louisville, KY PH 502-852-0467 EM robert.flight at louisville.edu EM rflight79 at gmail.com Williams and Holland's Law: ? ? ?? If enough data is collected, anything may be proven by statistical methods. On Mon, Sep 6, 2010 at 05:21, Steve Lianoglou <mailinglist.honeypot at="" gmail.com=""> wrote: > Hi, > > I've been trying to do some non-run-of-the-mill gene ontology analysis > and have been messing around with the innards of the results we get > from GOstats::hyperGTest. > > As a result, I've found some peculiarities in the GO annotations I'm > retrieving from the hyperGTest and how they relate to the GO > annotations in my organism's (saccharomyces cerevisiae) annotation > database. > > In particular: for some enriched GO term X, I'm finding genes listed > in the geneIdUniverse of X that do not have X as a member of their > organism.db.GO map, and I'm not sure how that could be. I'm even > setting condition=FALSE to my hyperGTest to make it as "vanilla" as > possible (but as far as I understand the conditional test, this > wouldn't effect what I'm seeing anyway since genes are removed from GO > terms, and not added). > > I have a specific example below. You'll find that for the first > significant GOID found (GO:0007059), gene "YKL049C" is listed as a > member of this GOID's geneIdUniverse as it is returned from the > hyperGTest, but GO:0007059 isn't listed in my annotation db GO mapping > (org.Sc.sgdGO) for this gene, nor is "YKL049C" found in its GO2ORF > map. > > === Example === > > library(GOstats) > library(org.Sc.sgd.db) > library(annotate) > > orfs <- c( > ?"YGR084C", "YDR409W", ? "YFR017C", ? "YDR342C", ? "YBR152W", > ?"YIL014W", "YGR134W", ? "YDL109C", ? "YDL234C", ? "YDR524C", > ?"YBR002C", "YGR063C", ? "YDR034C-A", "YIR011C", > ?"YHR175W", "YHR109W", ? "YGR207C", ? "YCR023C", ? "YER039C-A", > ?"YGL064C", "YGR249W", ? "YHR172W", ? "YGL226W", ? "YAL064W", > ?"YDR309C", "YDR473C", ? "YAL058W", ? "YHL030W", ? "YKL101W", > ?"YJL122W", "YBR121C", ? "YIL018W", ? "YDR443C", ? "YBR069C", > ?"YBR011C", "YHR009C", ? "YGL166W", ? "YIL103W", ? "YDR206W", > ?"YBL031W", "YBR162W-A", "YFL018C", ? "YBR214W", ? "YIL076W", > ?"YBR291C", "YKL049C", ? "YIL149C", ? "YDR034W-B", "YDR395W") > > orf2go <- getAnnMap('GO', 'org.Sc.sgd') > go2orf <- getAnnMap('GO2ORF', 'org.Sc.sgd') > all.orfs <- keys(orf2go) > > ## Do GOstats test to get significant GO ontologies > params <- new("GOHyperGParams", geneIds=orfs, > ? ? ? ? ? ? ?universeGeneIds=all.orfs, annotation='org.Sc.sgd', > ? ? ? ? ? ? ?ontology='BP', pvalueCutoff=0.05, > ? ? ? ? ? ? ?conditional=FALSE, testDirection='over') > GO <- hyperGTest(params) > > ## Look at the genes annotated in the universe for each GOID and compare > ## with annotations for each gene in org.Sc.sgdGO > universes <- geneIdUniverse(GO) > uid <- names(universes)[1] ## GO:0007059 > uid.members <- universes[[uid]] > "YKL049C" %in% uid.members ## TRUE > > ## Is the gene listed in GOID's (uid) universe annotated with that GOID? > ## Given previous result, this *should* be TRUE > go.ids <- orf2go[["YKL049C"]] > uid %in% names(go.ids) ## FALSE > > > ## Does that GOID have this gene as its member? > ## This also should be TRUE > check.orfs <- go2orf[[uid]] > "YKL049C" %in% check.orfs ## FALSE > > ====================================== > > Since `"YKL049C" %in% uid.members` is TRUE, I would expect the go term > `uid` (GO:0007059) to be in both `names(go.ids)` and `check.orfs`, but > it's not. > > Can someone shed some light on this for me? > > sessionInfo() > R version 2.11.1 (2010-05-31) > i386-apple-darwin9.8.0 > > locale: > [1] C > > attached base packages: > [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base > > other attached packages: > ?[1] GO.db_2.4.1 ? ? ? ? ?annotate_1.26.1 ? ? ?org.Sc.sgd.db_2.4.1 > GOstats_2.14.0 > ?[5] RSQLite_0.9-2 ? ? ? ?DBI_0.2-5 ? ? ? ? ? ?graph_1.26.0 > Category_2.14.0 > ?[9] AnnotationDbi_1.10.2 Biobase_2.8.0 ? ? ? ?devtools_0.1 > stringr_0.4 > [13] roxygen_0.1-2 ? ? ? ?profr_0.1.1 ? ? ? ? ?digest_0.4.2 > testthat_0.3 > > loaded via a namespace (and not attached): > ?[1] GSEABase_1.10.0 ? RBGL_1.24.0 ? ? ? XML_3.1-1 > evaluate_0.3 ? ? ?genefilter_1.30.0 > ?[6] plyr_1.1 ? ? ? ? ?splines_2.11.1 ? ?survival_2.35-8 > tools_2.11.1 ? ? ?xtable_1.5-6 > > > Thanks, > -steve > > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > ?| Memorial Sloan-Kettering Cancer Center > ?| Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT
0
Entering edit mode
Hi Robert, In my hunt to find "the bug" I didn't stumble upon the difference between org.Sc.sgdGO and org.Sc.sgdGO2ALLORFS (I don't think I realized the latter one even existed). I'm guessing the problem I thought I saw was due to my oversight and is exactly as you say. Thanks for the thorough explanation and references. Cheers, -steve On Tue, Sep 7, 2010 at 9:16 AM, Robert M. Flight <rflight79 at="" gmail.com=""> wrote: > Hi Steve, > > I wonder if you are running into the problem that generally when you > query for the GO terms you get only those GO terms that the ORF is > directly annotated with (this would correspond to the actual > annotations on the GO website, for instance), however when calculating > enrichment, not only the directly annotated terms but also the parent > terms (less specific) are included. > > I'm thinking of the difference between "org.Sc.sgdGO", which returns > "org.Sc.sgdGO is an R object that provides mappings between ORF > identifiers and the GO identifiers that they are directly associated > with. This mapping and its reverse mapping do NOT associate the child > terms from the GO ontology with the gene. Only the directly evidenced > terms are represented here.", and a mapping that returns also the > other GO terms in the directed acyclic graph (up the tree > essentially). This would be the reverse of "org.Sc.sgdGO2ALLORFS". > > For example, any gene directly annotated to "GO:0007155 : cell > adhesion", will also by virtue of the DAG be indirectly annotated with > "GO:0022610 : biological adhesion", whether or not it is directly > annotated with that or not. When GO enrichment using GOHyperGTest is > calculated, all of the indirectly annotated terms are considered as > well. (see this link: > http://amigo.geneontology.org/cgi-bin/amigo/browse.cgi?action=plus_n ode&target=GO:0022610&open_1=GO:0008150,all,amigo1283864733) > > This seems to be the cause of your disparity below. You might want to > read a bit more about the GO and how enrichment calculations are done. > A good paper describing the nuts and bolts of most packages is Gavin > Sherlocks paper on GO::TermFinder > (http://www.ncbi.nlm.nih.gov/pubmed/15297299), which is the basis for > most other enrichment tools, including GOStats. > > If I do: > > crud <- mget("GO:0007059", envir=org.Sc.sgdGO2ALLORFS) > crud2 <- unlist(crud) > grep("YKL049C", crud2) > > I get 135 as the result. > > Cheers, > > -Robert > > Robert M. Flight, Ph.D. > Bioinformatics and Biomedical Computing Laboratory > University of Louisville > Louisville, KY > > PH 502-852-0467 > EM robert.flight at louisville.edu > EM rflight79 at gmail.com > > Williams and Holland's Law: > ? ? ?? If enough data is collected, anything may be proven by > statistical methods. > > > > On Mon, Sep 6, 2010 at 05:21, Steve Lianoglou > <mailinglist.honeypot at="" gmail.com=""> wrote: >> Hi, >> >> I've been trying to do some non-run-of-the-mill gene ontology analysis >> and have been messing around with the innards of the results we get >> from GOstats::hyperGTest. >> >> As a result, I've found some peculiarities in the GO annotations I'm >> retrieving from the hyperGTest and how they relate to the GO >> annotations in my organism's (saccharomyces cerevisiae) annotation >> database. >> >> In particular: for some enriched GO term X, I'm finding genes listed >> in the geneIdUniverse of X that do not have X as a member of their >> organism.db.GO map, and I'm not sure how that could be. I'm even >> setting condition=FALSE to my hyperGTest to make it as "vanilla" as >> possible (but as far as I understand the conditional test, this >> wouldn't effect what I'm seeing anyway since genes are removed from GO >> terms, and not added). >> >> I have a specific example below. You'll find that for the first >> significant GOID found (GO:0007059), gene "YKL049C" is listed as a >> member of this GOID's geneIdUniverse as it is returned from the >> hyperGTest, but GO:0007059 isn't listed in my annotation db GO mapping >> (org.Sc.sgdGO) for this gene, nor is "YKL049C" found in its GO2ORF >> map. >> >> === Example === >> >> library(GOstats) >> library(org.Sc.sgd.db) >> library(annotate) >> >> orfs <- c( >> ?"YGR084C", "YDR409W", ? "YFR017C", ? "YDR342C", ? "YBR152W", >> ?"YIL014W", "YGR134W", ? "YDL109C", ? "YDL234C", ? "YDR524C", >> ?"YBR002C", "YGR063C", ? "YDR034C-A", "YIR011C", >> ?"YHR175W", "YHR109W", ? "YGR207C", ? "YCR023C", ? "YER039C-A", >> ?"YGL064C", "YGR249W", ? "YHR172W", ? "YGL226W", ? "YAL064W", >> ?"YDR309C", "YDR473C", ? "YAL058W", ? "YHL030W", ? "YKL101W", >> ?"YJL122W", "YBR121C", ? "YIL018W", ? "YDR443C", ? "YBR069C", >> ?"YBR011C", "YHR009C", ? "YGL166W", ? "YIL103W", ? "YDR206W", >> ?"YBL031W", "YBR162W-A", "YFL018C", ? "YBR214W", ? "YIL076W", >> ?"YBR291C", "YKL049C", ? "YIL149C", ? "YDR034W-B", "YDR395W") >> >> orf2go <- getAnnMap('GO', 'org.Sc.sgd') >> go2orf <- getAnnMap('GO2ORF', 'org.Sc.sgd') >> all.orfs <- keys(orf2go) >> >> ## Do GOstats test to get significant GO ontologies >> params <- new("GOHyperGParams", geneIds=orfs, >> ? ? ? ? ? ? ?universeGeneIds=all.orfs, annotation='org.Sc.sgd', >> ? ? ? ? ? ? ?ontology='BP', pvalueCutoff=0.05, >> ? ? ? ? ? ? ?conditional=FALSE, testDirection='over') >> GO <- hyperGTest(params) >> >> ## Look at the genes annotated in the universe for each GOID and compare >> ## with annotations for each gene in org.Sc.sgdGO >> universes <- geneIdUniverse(GO) >> uid <- names(universes)[1] ## GO:0007059 >> uid.members <- universes[[uid]] >> "YKL049C" %in% uid.members ## TRUE >> >> ## Is the gene listed in GOID's (uid) universe annotated with that GOID? >> ## Given previous result, this *should* be TRUE >> go.ids <- orf2go[["YKL049C"]] >> uid %in% names(go.ids) ## FALSE >> >> >> ## Does that GOID have this gene as its member? >> ## This also should be TRUE >> check.orfs <- go2orf[[uid]] >> "YKL049C" %in% check.orfs ## FALSE >> >> ====================================== >> >> Since `"YKL049C" %in% uid.members` is TRUE, I would expect the go term >> `uid` (GO:0007059) to be in both `names(go.ids)` and `check.orfs`, but >> it's not. >> >> Can someone shed some light on this for me? >> >> sessionInfo() >> R version 2.11.1 (2010-05-31) >> i386-apple-darwin9.8.0 >> >> locale: >> [1] C >> >> attached base packages: >> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base >> >> other attached packages: >> ?[1] GO.db_2.4.1 ? ? ? ? ?annotate_1.26.1 ? ? ?org.Sc.sgd.db_2.4.1 >> GOstats_2.14.0 >> ?[5] RSQLite_0.9-2 ? ? ? ?DBI_0.2-5 ? ? ? ? ? ?graph_1.26.0 >> Category_2.14.0 >> ?[9] AnnotationDbi_1.10.2 Biobase_2.8.0 ? ? ? ?devtools_0.1 >> stringr_0.4 >> [13] roxygen_0.1-2 ? ? ? ?profr_0.1.1 ? ? ? ? ?digest_0.4.2 >> testthat_0.3 >> >> loaded via a namespace (and not attached): >> ?[1] GSEABase_1.10.0 ? RBGL_1.24.0 ? ? ? XML_3.1-1 >> evaluate_0.3 ? ? ?genefilter_1.30.0 >> ?[6] plyr_1.1 ? ? ? ? ?splines_2.11.1 ? ?survival_2.35-8 >> tools_2.11.1 ? ? ?xtable_1.5-6 >> >> >> Thanks, >> -steve >> >> >> -- >> Steve Lianoglou >> Graduate Student: Computational Systems Biology >> ?| Memorial Sloan-Kettering Cancer Center >> ?| Weill Medical College of Cornell University >> Contact Info: http://cbio.mskcc.org/~lianos/contact >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLY
0
Entering edit mode
Hi Steve, I can verify that GOstats uses "GO2ALL" mappings to do its calculations. Speeding this sort of comparison up is actually a lot of why these mappings are actually around. Marc On 09/07/2010 11:15 AM, Steve Lianoglou wrote: > Hi Robert, > > In my hunt to find "the bug" I didn't stumble upon the difference > between org.Sc.sgdGO and org.Sc.sgdGO2ALLORFS (I don't think I > realized the latter one even existed). > > I'm guessing the problem I thought I saw was due to my oversight and > is exactly as you say. Thanks for the thorough explanation and > references. > > Cheers, > -steve > > On Tue, Sep 7, 2010 at 9:16 AM, Robert M. Flight <rflight79 at="" gmail.com=""> wrote: > >> Hi Steve, >> >> I wonder if you are running into the problem that generally when you >> query for the GO terms you get only those GO terms that the ORF is >> directly annotated with (this would correspond to the actual >> annotations on the GO website, for instance), however when calculating >> enrichment, not only the directly annotated terms but also the parent >> terms (less specific) are included. >> >> I'm thinking of the difference between "org.Sc.sgdGO", which returns >> "org.Sc.sgdGO is an R object that provides mappings between ORF >> identifiers and the GO identifiers that they are directly associated >> with. This mapping and its reverse mapping do NOT associate the child >> terms from the GO ontology with the gene. Only the directly evidenced >> terms are represented here.", and a mapping that returns also the >> other GO terms in the directed acyclic graph (up the tree >> essentially). This would be the reverse of "org.Sc.sgdGO2ALLORFS". >> >> For example, any gene directly annotated to "GO:0007155 : cell >> adhesion", will also by virtue of the DAG be indirectly annotated with >> "GO:0022610 : biological adhesion", whether or not it is directly >> annotated with that or not. When GO enrichment using GOHyperGTest is >> calculated, all of the indirectly annotated terms are considered as >> well. (see this link: >> http://amigo.geneontology.org/cgi-bin/amigo/browse.cgi?action=plus_ node&target=GO:0022610&open_1=GO:0008150,all,amigo1283864733) >> >> This seems to be the cause of your disparity below. You might want to >> read a bit more about the GO and how enrichment calculations are done. >> A good paper describing the nuts and bolts of most packages is Gavin >> Sherlocks paper on GO::TermFinder >> (http://www.ncbi.nlm.nih.gov/pubmed/15297299), which is the basis for >> most other enrichment tools, including GOStats. >> >> If I do: >> >> crud <- mget("GO:0007059", envir=org.Sc.sgdGO2ALLORFS) >> crud2 <- unlist(crud) >> grep("YKL049C", crud2) >> >> I get 135 as the result. >> >> Cheers, >> >> -Robert >> >> Robert M. Flight, Ph.D. >> Bioinformatics and Biomedical Computing Laboratory >> University of Louisville >> Louisville, KY >> >> PH 502-852-0467 >> EM robert.flight at louisville.edu >> EM rflight79 at gmail.com >> >> Williams and Holland's Law: >> If enough data is collected, anything may be proven by >> statistical methods. >> >> >> >> On Mon, Sep 6, 2010 at 05:21, Steve Lianoglou >> <mailinglist.honeypot at="" gmail.com=""> wrote: >> >>> Hi, >>> >>> I've been trying to do some non-run-of-the-mill gene ontology analysis >>> and have been messing around with the innards of the results we get >>> from GOstats::hyperGTest. >>> >>> As a result, I've found some peculiarities in the GO annotations I'm >>> retrieving from the hyperGTest and how they relate to the GO >>> annotations in my organism's (saccharomyces cerevisiae) annotation >>> database. >>> >>> In particular: for some enriched GO term X, I'm finding genes listed >>> in the geneIdUniverse of X that do not have X as a member of their >>> organism.db.GO map, and I'm not sure how that could be. I'm even >>> setting condition=FALSE to my hyperGTest to make it as "vanilla" as >>> possible (but as far as I understand the conditional test, this >>> wouldn't effect what I'm seeing anyway since genes are removed from GO >>> terms, and not added). >>> >>> I have a specific example below. You'll find that for the first >>> significant GOID found (GO:0007059), gene "YKL049C" is listed as a >>> member of this GOID's geneIdUniverse as it is returned from the >>> hyperGTest, but GO:0007059 isn't listed in my annotation db GO mapping >>> (org.Sc.sgdGO) for this gene, nor is "YKL049C" found in its GO2ORF >>> map. >>> >>> === Example === >>> >>> library(GOstats) >>> library(org.Sc.sgd.db) >>> library(annotate) >>> >>> orfs <- c( >>> "YGR084C", "YDR409W", "YFR017C", "YDR342C", "YBR152W", >>> "YIL014W", "YGR134W", "YDL109C", "YDL234C", "YDR524C", >>> "YBR002C", "YGR063C", "YDR034C-A", "YIR011C", >>> "YHR175W", "YHR109W", "YGR207C", "YCR023C", "YER039C-A", >>> "YGL064C", "YGR249W", "YHR172W", "YGL226W", "YAL064W", >>> "YDR309C", "YDR473C", "YAL058W", "YHL030W", "YKL101W", >>> "YJL122W", "YBR121C", "YIL018W", "YDR443C", "YBR069C", >>> "YBR011C", "YHR009C", "YGL166W", "YIL103W", "YDR206W", >>> "YBL031W", "YBR162W-A", "YFL018C", "YBR214W", "YIL076W", >>> "YBR291C", "YKL049C", "YIL149C", "YDR034W-B", "YDR395W") >>> >>> orf2go <- getAnnMap('GO', 'org.Sc.sgd') >>> go2orf <- getAnnMap('GO2ORF', 'org.Sc.sgd') >>> all.orfs <- keys(orf2go) >>> >>> ## Do GOstats test to get significant GO ontologies >>> params <- new("GOHyperGParams", geneIds=orfs, >>> universeGeneIds=all.orfs, annotation='org.Sc.sgd', >>> ontology='BP', pvalueCutoff=0.05, >>> conditional=FALSE, testDirection='over') >>> GO <- hyperGTest(params) >>> >>> ## Look at the genes annotated in the universe for each GOID and compare >>> ## with annotations for each gene in org.Sc.sgdGO >>> universes <- geneIdUniverse(GO) >>> uid <- names(universes)[1] ## GO:0007059 >>> uid.members <- universes[[uid]] >>> "YKL049C" %in% uid.members ## TRUE >>> >>> ## Is the gene listed in GOID's (uid) universe annotated with that GOID? >>> ## Given previous result, this *should* be TRUE >>> go.ids <- orf2go[["YKL049C"]] >>> uid %in% names(go.ids) ## FALSE >>> >>> >>> ## Does that GOID have this gene as its member? >>> ## This also should be TRUE >>> check.orfs <- go2orf[[uid]] >>> "YKL049C" %in% check.orfs ## FALSE >>> >>> ====================================== >>> >>> Since `"YKL049C" %in% uid.members` is TRUE, I would expect the go term >>> `uid` (GO:0007059) to be in both `names(go.ids)` and `check.orfs`, but >>> it's not. >>> >>> Can someone shed some light on this for me? >>> >>> sessionInfo() >>> R version 2.11.1 (2010-05-31) >>> i386-apple-darwin9.8.0 >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] GO.db_2.4.1 annotate_1.26.1 org.Sc.sgd.db_2.4.1 >>> GOstats_2.14.0 >>> [5] RSQLite_0.9-2 DBI_0.2-5 graph_1.26.0 >>> Category_2.14.0 >>> [9] AnnotationDbi_1.10.2 Biobase_2.8.0 devtools_0.1 >>> stringr_0.4 >>> [13] roxygen_0.1-2 profr_0.1.1 digest_0.4.2 >>> testthat_0.3 >>> >>> loaded via a namespace (and not attached): >>> [1] GSEABase_1.10.0 RBGL_1.24.0 XML_3.1-1 >>> evaluate_0.3 genefilter_1.30.0 >>> [6] plyr_1.1 splines_2.11.1 survival_2.35-8 >>> tools_2.11.1 xtable_1.5-6 >>> >>> >>> Thanks, >>> -steve >>> >>> >>> -- >>> Steve Lianoglou >>> Graduate Student: Computational Systems Biology >>> | Memorial Sloan-Kettering Cancer Center >>> | Weill Medical College of Cornell University >>> Contact Info: http://cbio.mskcc.org/~lianos/contact >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >> > > >
ADD REPLY
0
Entering edit mode
Hi Marc, On Wed, Sep 8, 2010 at 1:36 PM, Marc Carlson <mcarlson at="" fhcrc.org=""> wrote: > Hi Steve, > > I can verify that GOstats uses "GO2ALL" mappings to do its > calculations. ?Speeding this sort of comparison up is actually a lot of > why these mappings are actually around. Thanks for the clarification and reassurance, it's much appreciated. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLY

Login before adding your answer.

Traffic: 500 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6