topGO analysis returns different gene counts for category than given for non-model organism
1
0
Entering edit mode
@diana-toups-dugas-6125
Last seen 10.3 years ago
Hi- I'm trying to analyze some Sorghum NGS transcriptome data using topGO. The analysis runs using my custom geneID2GO object and the annFUN.gene2GO function. When I run GenTable, however, the results are strange. For example, for the GO:0006970 category, the geneID2GO mappings I used only lists 48 genes annotated to this category, but the GenTable function claims that 335 genes are annotated and 270 of them are significant. If I dig deeper and see what those 335 genes are they do indeed appear to be sorghum gene IDs; they are all unique and contain my original 48. I assume that the topGO package is pulling GO-to-gene mappings from somewhere to find these additional genes, but I cannot find documentation for it anywhere. Does anyone know why/how topGO is locating more genes than I give per GO category and from where they are getting pulled from? If my GO annotations are not complete I would like to know, as I thought they were. Thanks for the help! Sincerely, -Diana Former USDA researcher Current USDA consultant [[alternative HTML version deleted]]
GO Category topGO GO Category topGO • 1.6k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi Diana, I'm not an expert at topGO but maybe I can help track this down. It sounds like for the same input, GenTable is returning many more genes than annFUN.gene2GO. Looking at the annFUN.gene2GO function it looks like several 'filters' are applied. Genes with no annotation are dropped and only one-to-one mappings are kept (subsetting on 'goodGO' below). I've taken the annFUN.gene2GO code from the package so we can see the comments. annFUN.gene2GO <- function(whichOnto, feasibleGenes = NULL, gene2GO) { ## GO terms annotated to the specified ontology ontoGO <- get(paste("GO", whichOnto, "Term", sep = "")) ## Restrict the mappings to the feasibleGenes set if(!is.null(feasibleGenes)) gene2GO <- gene2GO[intersect(names(gene2GO), feasibleGenes)] ## Throw-up the genes which have no annotation if(anyis.na(gene2GO))) gene2GO <- gene2GO[!is.na(gene2GO)] gene2GO <- gene2GO[sapply(gene2GO, length) > 0] ## Get all the GO and keep a one-to-one mapping with the genes allGO <- unlist(gene2GO, use.names = FALSE) geneID <- rep(names(gene2GO), sapply(gene2GO, length)) goodGO <- allGO %in% ls(ontoGO) return(split(geneID[goodGO], allGO[goodGO])) } The GeneTable code has no such filtering. The function is long so I won't post the full code here. You can see it with, > selectMethod("GenTable", "topGOdata") Method Definition: function (object, ...) { .local <- function (object, ..., orderBy = 1, ranksOf = 2, topNodes = 10, numChar = 40, format.FUN = format.pval, decreasing = FALSE, useLevels = FALSE) { ... ... } If this is not helpful, please provide a reproducible example and I will look into it further. Valerie On 08/29/2013 12:27 PM, Diana Toups Dugas wrote: > Hi- > I'm trying to analyze some Sorghum NGS transcriptome data using topGO. The > analysis runs using my custom geneID2GO object and the annFUN.gene2GO > function. When I run GenTable, however, the results are strange. For > example, for the GO:0006970 category, the geneID2GO mappings I used only > lists 48 genes annotated to this category, but the GenTable function claims > that 335 genes are annotated and 270 of them are significant. If I dig > deeper and see what those 335 genes are they do indeed appear to be sorghum > gene IDs; they are all unique and contain my original 48. > > I assume that the topGO package is pulling GO-to-gene mappings from > somewhere to find these additional genes, but I cannot find documentation > for it anywhere. Does anyone know why/how topGO is locating more genes > than I give per GO category and from where they are getting pulled from? > If my GO annotations are not complete I would like to know, as I thought > they were. > > Thanks for the help! > Sincerely, > -Diana > Former USDA researcher > Current USDA consultant > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT

Login before adding your answer.

Traffic: 336 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6