I’am running topGO, to perform GO analysis in a RNA-seq data set. I have an small data set of 104 significant genes that I called “sigG”.
Firstly, I used genefilter to find genes that have similar level of expression than my “sigG”. For each sigGene I got 10 genes which make a background set of 1040 genes (backG).
I run topGO but I found results that make suspect that something is going wrong. See the head of these table:
Using 10 background genes per sig gene:
1-For example in my last row I see 127 sig genes mapping to GO:0016740, but I only use a set of 104 genes.
2-When I include more backG (50 instead of 10 per sigG) I notice that the number of annotated genes for each GO term changed (increased for most of them)
Finally, if I took my sigGenes and I analyzed them with a variety of online GO tools I always get nice enrichment in very specific terms, independently of the tool. However not of those terms appear in this topGO analysis. I cannot get an explanation for these results. I will appreciate if someone can give me a hint of what is going one. Below you can find the R script used for this analysis
The code I used to generate these tables is:
# resSxT is a result DESeq2 object # > class(resSxT) # [1] "DESeqResults" #attr(,"package") #[1] "DESeq2" #get the list of genes considered to be significant for the interaction test sigGenesSxT <-rownames(subset(resSxT, pvalue < 0.01)) #get average expressions overallBaseMeanSxT <- as.matrix(resSxT[, "baseMean", drop = F]) backG_SxT <- genefinder(overallBaseMeanSxT, sigGenesSxT, 50, method = "manhattan") # get identified similar genes backG_SxTwithSIG <- rownames(overallBaseMeanSxT)[as.vector(sapply(backG_SxT, function(x)x$indices))] ## remove Diff Expressed genes from background backG_SxT <- setdiff(backG_SxTwithSIG, sigGenesSxT) ##RUNNING topGO on resSxT onts = c( "MF", "BP", "CC" ) geneIDs = rownames(overallBaseMeanSxT) inUniverse = geneIDs %in% c(sigGenesSxT, backG_SxT) inSelection = geneIDs %in% backG_SxT algSxT <- factor( as.integer( inSelection[inUniverse] ) ) names(algSxT) <- geneIDs[inUniverse] tab = as.list(onts) names(tab) = onts ### test all three top level ontologies for(i in 1:3){ ## prepare data tgdSxT <- new( "topGOdata", ontology=onts[i], allGenes = algSxT, nodeSize=5, annot=annFUN.org, mapping="org.Hs.eg.db", ID = "SYMBOL" ) ## run tests resultTopGO_SxT.elim <- runTest(tgdSxT, algorithm = "elim", statistic = "Fisher" ) resultTopGO_SxT.classic <- runTest(tgdSxT, algorithm = "classic", statistic = "Fisher" ) ## look at results tab[[i]] <- GenTable( tgdSxT, Fisher.elim = resultTopGO_SxT.elim, Fisher.classic = resultTopGO_SxT.classic, orderBy = "Fisher.classic" , topNodes = 200) } topGOResultsSxT <- as.data.frame(do.call(rbind,tab)) write.csv(topGOResultsSxT, file = "SxT_topGOResults.csv")