Question

How to get significant genes annotated to go term using TopGO?

0

Entering edit mode

Lucía ▴ 30

@16997962

Last seen 2.8 years ago

Canada

Hi,

I have my TopGO results table, and I want to pull out the genes that are annotated to a specific GO term, I only want the significant genes, not all the genes annotated to the term. This is all my code to get the GO enrichment analysis results.

library(topGO)

go.mappings <- readMappings(file = "ALL_go_terms")

all_genes <- names(go.mappings) 

#AGA
aga_genes <- read.csv("treat_vs_control_aga.csv", header = TRUE)

aga_genes <- aga_genes$X

aga_genes <- as.character(aga_genes)

ngeneList_aga <- factor(as.integer(all_genes %in% aga_genes))

names(ngeneList_aga) <- all_genes

#BP
BP_GOdata <- new("topGOdata",  
              description = "GO analysis AGA", 
              ontology = 'BP', 
              allGenes = ngeneList_aga,
              annot = annFUN.gene2GO,
              gene2GO = go.mappings)

BP_resultFisher <- runTest(BP_GOdata, algorithm="classic", statistic="fisher", scoreOrder="increasing")

BP_tab <- GenTable(BP_GOdata, raw.p.value = BP_resultFisher, topNodes = length(BP_resultFisher@score),
                numChar = 120)

The output of my table looks like this:

GO.ID                    Term                            Annotated       Significant         Expected           raw.p.value

GO:0005976         polysaccharide metabolic process       232                18               3.95              6.7e-08

Again, to clarify, I only want to get the genes under 'Significant', not all the annotated genes (I've seen some online answers for that). Additionally, I would prefer to be able to pull them out using the GO term, rather than GO ID. Thanks

topGO • 4.2k views

ADD COMMENT • link updated 2.8 years ago by Lluís Revilla Sancho ▴ 760 • written 2.8 years ago by Lucía ▴ 30

score 0 · Answer 1 · 2022-07-08

0

Entering edit mode

Lluís Revilla Sancho ▴ 760

@lluis-revilla-sancho

Last seen 5 weeks ago

European Union

The significant genes count are those that you marked as so in ngeneList_aga . You just need to subset the list to those genes that are annotated to the GO ID 005976, for example on go.mappings.

ADD COMMENT • link 2.8 years ago Lluís Revilla Sancho ▴ 760

0

Entering edit mode

Sorry, I've re-read this comment but don't really follow. Yes, I know which are my significant genes, I want to extract them from the results table. Those genes don't really have anything to do with go.mappings though, go.mappings is the association of each gene with a GO ID. Could you please elaborate on what you mean?

ADD REPLY • link 2.8 years ago Lucía ▴ 30

0

Entering edit mode

You know which genes are significant, and you have a list of gene names, where the names of the list are the GO terms. What you want to do is to get the genes that are annotated to say GO:0005976 that are also significant. You know which genes are significant, and you know which genes are annotated to that GO term, so as Lluis said, it's just a simple subsetting operation. There are any number of ways to do that, using say subset or %in% or match and I am sure all of that has been recapitulated in the tidyverse as well, if that's your jam.

But this is just a basic manipulation in R that you have to be able to do if you expect to do much at all with R. While R is free to download and to use, it's not actually free at all! It costs your time and effort to figure out how to get things done. You might want to read An Introduction to R, which covers the basics. You can also just Google things like 'subset vector R' or whatever. I have been using R for over 20 years now, and I probably make multiple Google queries like that every day.

ADD REPLY • link 2.8 years ago James W. MacDonald 68k

0

Entering edit mode

My main point is that you don't need the result table of top.GO to do that. You could do something like this.

threshold <- 0.05 # threshold for alpha = 5%
significant_genes <- names(ngeneList_aga)[ngeneList_aga < threshold]
GO_gene_significant <- go.mappings[go.mappings$gene_id %in% significant_genes, ]

The resulting output would be the significant id of your genes and the GO ID they have. If you wanted those only for the GO ID significant you could further subset that GO_gene_significant using the BP_tab table like this:

GO_gene_significant[GOID %in% BP_tab$GOID, ]

ADD REPLY • link 2.8 years ago Lluís Revilla Sancho ▴ 760