Hello,
I'm using GOstats (version 2.42.0) to perform GO enrichment analysis in a non-model organism.
The input data was obtained from the DE analysis of RNA-Seq data. For the analysis I used an own
gene universe, as well as a sub-list of this genes to run the hypergeometric test. The function
hyperGtest() returned 25 enriched GO-terms. Looking closer to the results I realised that several
GO-terms were not present in my gene universe. How can that happen? Does anyone have an
explanation for that?
Best regards,
Mihaela
You will have to provide a self-contained example to show the problem for anybody to help.
Sorry for the late response. Here is an example for one of the enriched GO-terms which are not in my universe list:
The Go-term GO:0050136 has a count of 3 genes, namely 808221, 808224, and 808225 (entrezIds). In my universe the GO-terms corresponding to this genes are: GO:0016021, GO:0005747, GO:0008137 (for 808221); GO:0016021, GO:0005747, GO:0008137, GO:0042773 (for 808224); and GO:0016021, GO:0005747, GO:0008137, GO:0006120 (for 808225). The GO-term "GO:0050136" is not present at all in my entire universe.
Just to be sure that those enriched GO-terms which are present in my universe are also those matching the counted genes, I run a test for the GO-term GO:0008137 and everything was fine.
To run the GO-term I have used the following code:
library(AnnotationDbi)
library(biomaRt)
library(GOstats)
library(plyr)
library("GSEABase")
library(AnnotationHub)
sigDE_file <- "list_sigDE_genes.csv"
sig.set <- read.table(file = sigDE_file, header = TRUE,sep = "\t", stringsAsFactors = FALSE, as.is = TRUE, na.strings = c(NA, "NA", " NA"))
sig.set <- sig.set[!is.na(sig.set$entrez),]
sig.set$entrez <- as.character(sig.set$entrez)
sig.ids <- sig.set$entrez
univ_file <- "list_allDE_genes.csv"
univ <- read.table(file = univ_file, header = TRUE,quote = "", sep = "\t", stringsAsFactors = FALSE, as.is = TRUE, na.strings = c(NA, "NA"," NA"))
univ <- univ[complete.cases(univ$entrez), ]
univ <- univ[!duplicated(univ$entrez), ]
ah <- AnnotationHub()
orgs <- subset(ah,ah$rdataclass == "OrgDb")
query(orgs, "Oryctolagus.cuniculus")
rabbit <- orgs[["AH55793"]]
go_ids <- AnnotationDbi::select(rabbit, keys = as.character(univ$entrez), keytype = "ENTREZID", columns = c("GO","EVIDENCE","ENTREZID","SYMBOL"))
go_ids <- go_ids[,c(2,3,1,4)]
go_ids <- go_ids[complete.cases(go_ids$GO),]
universe <- unique(go_ids$ENTREZID)
my.frame <- data.frame(cbind(GO=go_ids$GO,EVIDENCE=go_ids$EVIDENCE,ENTREZID=go_ids$ENTREZID))
goFrame <- GOFrame(my.frame, organism = "Oryctolagus.cuniculus")
goAllFrame <- GOAllFrame(goFrame)
gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
params <- GSEAGOHyperGParams(name="My Custom GSEA based annot Params", geneSetCollection = gsc, geneIds = sig.ids , universeGeneIds = universe, ontology = "MF", pvalueCutoff = 0.05, conditional = FALSE, testDirection = "over")
mf_over <- hyperGTest(params)
mf_over_results <- summary(mf_over)
all <-geneIdUniverse(mf_over)[["GO:0050136"]]
"808221" "808222" "808223" "808224" "808225" "808226" "808230" "100341941" "100344233""100353024" "100358684"
x <-geneIds(mf_over)[geneIds(mf_over) %in% all]
"808221" "808224" "808225"
I can provide the gene list and the universe list, too. I'm just not sure how to attach it.