Find enriched GO terms, given a list of GO terms of interest and background GO terms

Guest User ★ 13k

I have annotated some genomes with my own protein function annotation pipeline. Each genome will have a list of genes and their corresponding GO terms. In my annotation, the GO terms are on all kinds of different levels because of the hierarchical structure of GO itself.

So my input will be all the GO terms for one (or a few) genome of interest, and a bigger set of background GO terms. I would like to figure out which GO terms are enriched in the interesting genomes and what they do.

It'll be great if I could also specify on what level the GO terms are summarized.

Can topGO handle this kind of problem? If not, any suggestions?

Thanks a lot!

-JJ

-- output of sessionInfo():

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] AnnotationForge_1.4.4 org.Hs.eg.db_2.10.1 GOFunction_1.10.0 Rgraphviz_2.6.0
[5] ALL_1.4.16 topGO_2.14.0 SparseM_1.03 GO.db_2.10.1
[9] RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0
[13] BiocGenerics_0.8.0 graph_1.40.1 BiocInstaller_1.12.0

loaded via a namespace (and not attached):
[1] IRanges_1.20.6 lattice_0.20-24 stats4_3.0.2 tools_3.0.2

--
Sent via the guest posting facility at bioconductor.org.

Annotation GO topGO genomes • 2.5k views

Posted 11.3 years ago by Guest User ★ 13k

Hi JJ,

yes, you should be able to do that with topGO. Please read through Section 4.3 in the package vignette on how to use custom annotations:

http://www.bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf

You can't specify a level and choose to test only the GO terms on that level. That wouldn't be very useful, and it wouldn't really take advantage of the GO hierarchy.

Hope this helps.

Regards,
Adrian Alexa

On Wed, Jan 15, 2014 at 10:40 PM, JJ [guest] <guest@bioconductor.org> wrote:
> Can topGO handle this kind of problem? If not, any suggestions?
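
For readers following the custom-annotation route mentioned in the reply above, here is a minimal sketch of what it might look like in R. It is not taken from the thread: the gene IDs (geneA to geneF), the toy gene-to-GO mapping, and the choice of the "classic" algorithm with the Fisher statistic are all assumptions made for illustration. In a real analysis the gene universe would be the background genes plus the genes of the genome of interest, and the mapping would come from your own pipeline (for example, read from a file with readMappings()).

```r
library(topGO)

# Hypothetical gene-to-GO mapping from a custom annotation pipeline (toy data).
# Terms may sit at any depth; topGO propagates annotations up the GO graph.
geneID2GO <- list(
  geneA = c("GO:0006412"),
  geneB = c("GO:0006412", "GO:0006355"),
  geneC = c("GO:0006355"),
  geneD = c("GO:0006468"),
  geneE = c("GO:0006810"),
  geneF = c("GO:0006412")
)

# Gene universe (background) and the subset of interest (assumed names).
allGenes    <- names(geneID2GO)
interesting <- c("geneA", "geneB", "geneF")

# topGO expects a named factor with levels 0/1 marking the interesting genes.
geneList <- factor(as.integer(allGenes %in% interesting), levels = c(0, 1))
names(geneList) <- allGenes

# Build the topGOdata object from the custom mapping.
GOdata <- new("topGOdata",
              ontology = "BP",
              allGenes = geneList,
              annot    = annFUN.gene2GO,
              gene2GO  = geneID2GO)

# Classic Fisher's exact test per GO term, then a summary table.
resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
GenTable(GOdata, classicFisher = resultFisher, topNodes = 5)
```

With only six genes the p-values are meaningless; the point is the shape of the inputs. Other algorithms (such as "elim" or "weight01") can be passed to runTest() in the same way if you want the test to account for the GO hierarchy.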

Reply posted 11.3 years ago by Adrian Alexa ▴ 400

Thank you Adrian, following the instructions, I got it to work!

Another question: can you provide a rough calculation of the memory usage of a topGOdata object, given the number of GO terms and genes?

Thanks,
-JJ

--
JJ Chai
National Center for Computational Sciences
Computer Science and Mathematics Div
Oak Ridge National Laboratory

On Thu, Jan 16, 2014 at 11:41 AM, Adrian Alexa <adrian.alexa@gmail.com> wrote:
> yes, you should be able to do that with topGO. Please read through Section
> 4.3 in the package vignette on how to use custom annotations

Reply posted 11.3 years ago by JJ Chai ▴ 10
