GOstats - internal filtering?

Andrew Jaffe ▴ 120

@andrew-jaffe-4820

Last seen 10.6 years ago

Hopefully I can get a quick answer to this question about GOstats.

I'm trying to calculate enrichment for every GO category using the GOstats package. I would assume that setting the p-value cutoff = 1 with conditional=FALSE would give me an enrichment odds ratio/p-value for every GO category in, say, the BP ontology. However, this does not seem to be the case, as the number of categories returned seems to be a function of the geneIds supplied:

> params = new("GOHyperGParams", geneIds = y$ENTREZID[y$p < 0.001],
+     universeGeneIds = y$ENTREZID,
+     annotation = "hgu133plus2.db",
+     ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
+     testDirection = "over")
> ht = hyperGTest(params)
> nrow(summary(ht))
[1] 6080

> params2 = new("GOHyperGParams", geneIds = y$ENTREZID[y$p < 0.01],
+     universeGeneIds = y$ENTREZID,
+     annotation = "hgu133plus2.db",
+     ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
+     testDirection = "over")
> ht2 = hyperGTest(params2)
> nrow(summary(ht2))
[1] 7856

Does the hyperGTest function drop GO categories without any genes in them prior to returning the results table? Or is something else going on?
Thanks,
Andrew

> sessionInfo()
R version 2.15.0 Patched (2012-04-20 r59123)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
 [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
 [5] LC_MONETARY=en_US.iso885915    LC_MESSAGES=en_US.iso885915
 [7] LC_PAPER=C                     LC_NAME=C
 [9] LC_ADDRESS=C                   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
 [1] GO.db_2.7.1          sva_3.2.0            mgcv_1.7-13
 [4] corpcor_1.6.2        hgu133plus2.db_2.7.1 genefilter_1.38.0
 [7] RColorBrewer_1.0-5   GOstats_2.22.0       Category_2.22.0
[10] org.Hs.eg.db_2.7.1   RSQLite_0.11.1       DBI_0.2-5
[13] funxBox_0.1          digest_0.5.2         multtest_2.12.0
[16] GSEABase_1.18.0      graph_1.34.0         annotate_1.34.0
[19] AnnotationDbi_1.18.0 limma_3.12.0         Biobase_2.16.0
[22] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
 [1] grid_2.15.0      IRanges_1.14.2   lattice_0.20-6   MASS_7.3-17
 [5] Matrix_1.0-6     nlme_3.1-103     RBGL_1.32.0      splines_2.15.0
 [9] stats4_2.15.0    survival_2.36-12 tools_2.15.0     XML_3.9-4
[13] xtable_1.7-0

Annotation GO hgu133plus2 GOstats Category • 1.6k views

updated 13.0 years ago by James W. MacDonald 68k • written 13.0 years ago by Andrew Jaffe ▴ 120

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Hi Andrew,

On 4/20/2012 10:50 PM, Andrew Jaffe wrote:
> Does the hyperGTest function drop GO categories without any genes in them
> prior to returning the results table? Or is something else going on?

Technically, yes. The only GO terms that are tested are those that arise from mapping your Entrez Gene IDs to GO terms.

Best,
Jim

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
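To illustrate the point in the answer above, here is a minimal base-R sketch of why the number of tested terms depends on the selected genes even when the universe is fixed. It does not call GOstats or reproduce its internals. The gene IDs (g1 to g5), the category labels (catA to catE), and the helper `tested_categories` are hypothetical names invented for this illustration.

```r
# Toy annotation: hypothetical gene IDs mapped to hypothetical category labels.
# In a real analysis this mapping would come from an annotation package.
gene2cat <- list(
  g1 = c("catA", "catB"),
  g2 = c("catA"),
  g3 = c("catC"),
  g4 = c("catD"),
  g5 = c("catB", "catE")
)

# Categories reachable from the selected genes. Under the behaviour described
# in the answer, only these would appear in the results table.
tested_categories <- function(selected, mapping) {
  unique(unlist(mapping[selected], use.names = FALSE))
}

# Same universe (g1..g5) in both cases; only the selection cutoff differs.
strict  <- tested_categories(c("g1"), gene2cat)              # stringent cutoff
relaxed <- tested_categories(c("g1", "g3", "g5"), gene2cat)  # looser cutoff
all_categories <- unique(unlist(gene2cat, use.names = FALSE))

length(strict)          # categories touched by the stringent selection
length(relaxed)         # categories touched by the looser selection
length(all_categories)  # categories present in the universe
```

A looser cutoff selects more genes, which reach more categories, so the table grows. That would be consistent with the 6080 versus 7856 rows reported in the question.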

13.0 years ago James W. MacDonald 68k

On Mon, Apr 23, 2012 at 10:42 AM, James W. MacDonald <jmacdon@uw.edu> wrote:
> The only GO terms that are tested are those that arise from mapping your Entrez Gene IDs to GO terms.

Just to be clear, you mean by mapping the *significant*/selected Entrez Gene IDs (say, from a differential expression analysis) to GO terms? Because the universe of possible Entrez Gene IDs on the microarray was identical in both of my examples.

Thanks,
Andrew

13.0 years ago Andrew Jaffe ▴ 120

Hi Andrew,

On 4/23/2012 10:56 AM, Andrew Jaffe wrote:
> > The only GO terms that are tested are those that arise from mapping
> > your Entrez Gene IDs to GO terms.
>
> Just to be clear, you mean by mapping the *significant*/selected Entrez
> Gene IDs (say by a differential expression analysis) to GO terms? Because
> the universe/possible Entrez Gene IDs on the microarray was identical in
> both examples below.

Right. When I said *your* Entrez Gene IDs I meant the set of significant Entrez Gene IDs from your experiment. Technically speaking, one could always test GO terms that are not represented by any of the significant genes in an experiment, but a use case for that doesn't readily come to mind.

Best,
Jim
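As a rough check on the last point in the reply above, the sketch below computes an over-representation p-value from the hypergeometric upper tail using base R's `phyper`. This is the standard textbook calculation, not a copy of what GOstats does internally. The helper name `over_rep_p` and all the counts are made up for illustration.

```r
# Over-representation p-value for a single category: P(X >= hits), where X is
# the number of selected genes that fall in the category.
#   hits          selected genes annotated to the category
#   cat_size      universe genes annotated to the category
#   universe_size total genes in the universe
#   n_selected    total selected genes
over_rep_p <- function(hits, cat_size, universe_size, n_selected) {
  phyper(hits - 1, cat_size, universe_size - cat_size, n_selected,
         lower.tail = FALSE)
}

# Hypothetical counts: a 40-gene category in a 10000-gene universe,
# with 200 selected genes.
p_zero <- over_rep_p(hits = 0, cat_size = 40, universe_size = 10000,
                     n_selected = 200)
p_some <- over_rep_p(hits = 5, cat_size = 40, universe_size = 10000,
                     n_selected = 200)

p_zero  # category with no selected genes
p_some  # category with five selected genes
```

With zero selected genes in a category, the upper tail covers every possible outcome, so the over-representation p-value is 1. For testDirection = "over", the omitted categories would therefore never come out as enriched.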

13.0 years ago James W. MacDonald 68k
