topGO analysis returns different gene counts for category than given for non-model organism

0

Entering edit mode

Diana Toups Dugas ▴ 20

@diana-toups-dugas-6125

Last seen 10.5 years ago

Hi- I'm trying to analyze some Sorghum NGS transcriptome data using topGO. The analysis runs using my custom geneID2GO object and the annFUN.gene2GO function. When I run GenTable, however, the results are strange. For example, for the GO:0006970 category, the geneID2GO mappings I used only lists 48 genes annotated to this category, but the GenTable function claims that 335 genes are annotated and 270 of them are significant. If I dig deeper and see what those 335 genes are they do indeed appear to be sorghum gene IDs; they are all unique and contain my original 48. I assume that the topGO package is pulling GO-to-gene mappings from somewhere to find these additional genes, but I cannot find documentation for it anywhere. Does anyone know why/how topGO is locating more genes than I give per GO category and from where they are getting pulled from? If my GO annotations are not complete I would like to know, as I thought they were. Thanks for the help! Sincerely, -Diana Former USDA researcher Current USDA consultant [[alternative HTML version deleted]]

GO Category topGO GO Category topGO • 1.6k views

ADD COMMENT • link updated 11.5 years ago by Valerie Obenchain ★ 6.8k • written 11.5 years ago by Diana Toups Dugas ▴ 20

0

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 3.1 years ago

United States

Hi Diana, I'm not an expert at topGO but maybe I can help track this down. It sounds like for the same input, GenTable is returning many more genes than annFUN.gene2GO. Looking at the annFUN.gene2GO function it looks like several 'filters' are applied. Genes with no annotation are dropped and only one-to-one mappings are kept (subsetting on 'goodGO' below). I've taken the annFUN.gene2GO code from the package so we can see the comments. annFUN.gene2GO <- function(whichOnto, feasibleGenes = NULL, gene2GO) { ## GO terms annotated to the specified ontology ontoGO <- get(paste("GO", whichOnto, "Term", sep = "")) ## Restrict the mappings to the feasibleGenes set if(!is.null(feasibleGenes)) gene2GO <- gene2GO[intersect(names(gene2GO), feasibleGenes)] ## Throw-up the genes which have no annotation if(anyis.na(gene2GO))) gene2GO <- gene2GO[!is.na(gene2GO)] gene2GO <- gene2GO[sapply(gene2GO, length) > 0] ## Get all the GO and keep a one-to-one mapping with the genes allGO <- unlist(gene2GO, use.names = FALSE) geneID <- rep(names(gene2GO), sapply(gene2GO, length)) goodGO <- allGO %in% ls(ontoGO) return(split(geneID[goodGO], allGO[goodGO])) } The GeneTable code has no such filtering. The function is long so I won't post the full code here. You can see it with, > selectMethod("GenTable", "topGOdata") Method Definition: function (object, ...) { .local <- function (object, ..., orderBy = 1, ranksOf = 2, topNodes = 10, numChar = 40, format.FUN = format.pval, decreasing = FALSE, useLevels = FALSE) { ... ... } If this is not helpful, please provide a reproducible example and I will look into it further. Valerie On 08/29/2013 12:27 PM, Diana Toups Dugas wrote: > Hi- > I'm trying to analyze some Sorghum NGS transcriptome data using topGO. The > analysis runs using my custom geneID2GO object and the annFUN.gene2GO > function. When I run GenTable, however, the results are strange. For > example, for the GO:0006970 category, the geneID2GO mappings I used only > lists 48 genes annotated to this category, but the GenTable function claims > that 335 genes are annotated and 270 of them are significant. If I dig > deeper and see what those 335 genes are they do indeed appear to be sorghum > gene IDs; they are all unique and contain my original 48. > > I assume that the topGO package is pulling GO-to-gene mappings from > somewhere to find these additional genes, but I cannot find documentation > for it anywhere. Does anyone know why/how topGO is locating more genes > than I give per GO category and from where they are getting pulled from? > If my GO annotations are not complete I would like to know, as I thought > they were. > > Thanks for the help! > Sincerely, > -Diana > Former USDA researcher > Current USDA consultant > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 11.5 years ago Valerie Obenchain ★ 6.8k

Login before adding your answer.