Question

How to retrieve the IDs of topGO significant genes after an enrichment test?

David ROUX ▴ 20

@david-roux-11055

France (Avignon University)

Hello, I ran a topGO enrichment analysis on my RNA-seq data. The output looks like this:

> allRes
        GO.ID                                        Term Annotated Significant Expected classicFisher
1  GO:0043531                                 ADP binding       406          21     4.05       6.5e-10
2  GO:0016760 cellulose synthase (UDP-forming) activit...        29           3     0.29        0.0029
3  GO:0015035 protein disulfide oxidoreductase activit...        71           4     0.71        0.0055
4  GO:0004089              carbonate dehydratase activity        17           2     0.17        0.0122
5  GO:0016765 transferase activity, transferring alkyl...        59           3     0.59        0.0210

Example: the first line shows 406 annotated and 21 significant genes.

According to the vignette, the topGO “sigGenes()” function appears to retrieve only the annotated genes. Here it fetches the IDs of the 406 annotated genes.

The vignette then proposes to use the “printGenes()” function, but “only when the chip used has an annotation package available in Bioconductor”.

Here we are working on Prunus persica, for which no annotation package is available. So, how can we get the IDs of the 21 significant genes in my example?

Many thanks in advance.

score 1 · Answer 1 · 2019-01-25

I am answering my own question, in case it helps someone later. :-)

I found the solution in other threads (https://support.bioconductor.org/p/65856/ and https://www.biostars.org/p/239032/ ).

A simple way is to reuse the “genesOfInterest” list created earlier in the topGO pipeline, i.e. here:

# Read the differentially expressed genes; the first column holds the gene IDs
geneListTemp <- read.csv("Diff_Express_Genes_liste.csv", header = TRUE)
genesOfInterest <- as.character(geneListTemp[, 1])

Later, according to the topGO vignette, we do:

topGO_results <- GenTable(myGOdata, etc… )

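And finally, with the following statement, we can produce the list of significant gene IDs for each significant GO node reported by GenTable():

# lapply() always returns a list (one element per GO term), whereas sapply()
# may simplify the result to a matrix when all terms have the same number of hits
topGO_results$genes <- lapply(topGO_results$GO.ID, function(x) {
  genes <- genesInTerm(myGOdata, x)
  # keep only the annotated genes that are also in the list of interest
  genes[[1]][genes[[1]] %in% genesOfInterest]
})
View(topGO_results)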
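
To see what this filtering step does without a topGOdata object at hand, here is a minimal sketch in base R. The list `term_genes` only stands in for what genesInTerm() returns, and every gene ID and gene list in it is made up for illustration.

```r
# Hypothetical stand-in for genesInTerm(myGOdata): GO ID -> annotated gene IDs
term_genes <- list(
  "GO:0043531" = c("geneA", "geneB", "geneC", "geneD"),
  "GO:0016760" = c("geneB", "geneE")
)

# Hypothetical list of differentially expressed genes
genes_of_interest <- c("geneB", "geneD", "geneF")

# For each term, keep only the annotated genes that are also genes of interest
sig_by_term <- lapply(term_genes, function(ids) ids[ids %in% genes_of_interest])

print(sig_by_term)
```

The number of IDs kept for a term should match the "Significant" column of the GenTable() output, provided the same gene list was used to build the topGOdata object.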

And these last lines will produce a nice-looking CSV table!

# Collapse each term's gene IDs into a single comma-separated string.
# (Cleaning up with gsub("[c()]", "", ...) would also delete every letter "c" inside the IDs.)
topGO_results$genes <- vapply(topGO_results$genes, paste, character(1), collapse = ", ")
write.csv(topGO_results, file = "out.csv")
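
One point worth checking when flattening the gene column: a regular-expression class such as [c()] matches each of the characters c, ( and ) individually, so stripping the deparsed "c(...)" wrapper with gsub() also removes the letter c from inside gene IDs. A small sketch with made-up IDs, comparing that approach with collapsing each element through paste():

```r
# Made-up list column, shaped like the per-term gene lists
genes <- list(c("ctg001", "gene2"), "ctg003")

# Deparse-and-strip approach: the class [c()] also removes the "c" in "ctg001"
stripped <- gsub("[\"]", "", gsub("[c()]", "", as.character(genes)))

# Collapsing each element directly leaves the IDs untouched
collapsed <- vapply(genes, paste, character(1), collapse = ", ")

print(stripped)
print(collapsed)
```

With IDs that contain no "c" or parentheses the two approaches give the same strings, so whether this matters depends on the naming scheme of the genome annotation.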

Best.