Hi,
I have my TopGO results table, and I want to pull out the genes that are annotated to a specific GO term, I only want the significant genes, not all the genes annotated to the term. This is all my code to get the GO enrichment analysis results.
library(topGO)
go.mappings <- readMappings(file = "ALL_go_terms")
all_genes <- names(go.mappings)
#AGA
aga_genes <- read.csv("treat_vs_control_aga.csv", header = TRUE)
aga_genes <- aga_genes$X
aga_genes <- as.character(aga_genes)
ngeneList_aga <- factor(as.integer(all_genes %in% aga_genes))
names(ngeneList_aga) <- all_genes
#BP
BP_GOdata <- new("topGOdata",
description = "GO analysis AGA",
ontology = 'BP',
allGenes = ngeneList_aga,
annot = annFUN.gene2GO,
gene2GO = go.mappings)
BP_resultFisher <- runTest(BP_GOdata, algorithm="classic", statistic="fisher", scoreOrder="increasing")
BP_tab <- GenTable(BP_GOdata, raw.p.value = BP_resultFisher, topNodes = length(BP_resultFisher@score),
numChar = 120)
The output of my table looks like this:
GO.ID Term Annotated Significant Expected raw.p.value
GO:0005976 polysaccharide metabolic process 232 18 3.95 6.7e-08
Again, to clarify, I only want to get the genes under 'Significant', not all the annotated genes (I've seen some online answers for that). Additionally, I would prefer to be able to pull them out using the GO term, rather than GO ID. Thanks
Sorry, I've re-read this comment but don't really follow. Yes, I know which are my significant genes, I want to extract them from the results table. Those genes don't really have anything to do with go.mappings though, go.mappings is the association of each gene with a GO ID. Could you please elaborate on what you mean?
You know which genes are significant, and you have a list of gene names, where the names of the list are the GO terms. What you want to do is to get the genes that are annotated to say GO:0005976 that are also significant. You know which genes are significant, and you know which genes are annotated to that GO term, so as Lluis said, it's just a simple subsetting operation. There are any number of ways to do that, using say
subset
or%in%
ormatch
and I am sure all of that has been recapitulated in the tidyverse as well, if that's your jam.But this is just a basic manipulation in R that you have to be able to do if you expect to do much at all with R. While R is free to download and to use, it's not actually free at all! It costs your time and effort to figure out how to get things done. You might want to read An Introduction to R, which covers the basics. You can also just Google things like 'subset vector R' or whatever. I have been using R for over 20 years now, and I probably make multiple Google queries like that every day.
My main point is that you don't need the result table of top.GO to do that. You could do something like this.
The resulting output would be the significant id of your genes and the GO ID they have. If you wanted those only for the GO ID significant you could further subset that
GO_gene_significant
using theBP_tab
table like this: