Question

GO-terms delivered by GOstats hyperGTest() do not exist in my universe

0

Entering edit mode

martis.mihaela • 0

@martismihaela-14359

Last seen 7.4 years ago

Hello,

I'm using GOstats (version 2.42.0) to perform GO enrichment analysis in a non-model organism.

The input data was obtained from the DE analysis of RNA-Seq data. For the analysis I used an own

gene universe, as well as a sub-list of this genes to run the hypergeometric test. The function

hyperGtest() returned 25 enriched GO-terms. Looking closer to the results I realised that several

GO-terms were not present in my gene universe. How can that happen? Does anyone have an

explanation for that?

Best regards,

Mihaela

GOstats • 2.0k views

ADD COMMENT • link 7.4 years ago martis.mihaela • 0

0

Entering edit mode

You will have to provide a self-contained example to show the problem for anybody to help.

ADD REPLY • link 7.4 years ago James W. MacDonald 68k

0

Entering edit mode

Sorry for the late response. Here is an example for one of the enriched GO-terms which are not in my universe list:

The Go-term GO:0050136 has a count of 3 genes, namely 808221, 808224, and 808225 (entrezIds). In my universe the GO-terms corresponding to this genes are: GO:0016021, GO:0005747, GO:0008137 (for 808221); GO:0016021, GO:0005747, GO:0008137, GO:0042773 (for 808224); and GO:0016021, GO:0005747, GO:0008137, GO:0006120 (for 808225). The GO-term "GO:0050136" is not present at all in my entire universe.

Just to be sure that those enriched GO-terms which are present in my universe are also those matching the counted genes, I run a test for the GO-term GO:0008137 and everything was fine.

To run the GO-term I have used the following code:

library(AnnotationDbi)
library(biomaRt)
library(GOstats)
library(plyr)
library("GSEABase")
library(AnnotationHub)

sigDE_file     <- "list_sigDE_genes.csv"
sig.set        <- read.table(file = sigDE_file, header = TRUE,sep = "\t", stringsAsFactors = FALSE, as.is = TRUE, na.strings = c(NA, "NA", " NA"))
sig.set        <- sig.set[!is.na(sig.set$entrez),]
sig.set$entrez <- as.character(sig.set$entrez)
sig.ids        <- sig.set$entrez

univ_file      <- "list_allDE_genes.csv"
univ           <- read.table(file = univ_file, header = TRUE,quote = "", sep = "\t", stringsAsFactors = FALSE, as.is = TRUE, na.strings = c(NA, "NA"," NA"))
univ <- univ[complete.cases(univ$entrez), ]
univ <- univ[!duplicated(univ$entrez), ]

ah        <- AnnotationHub()
orgs      <- subset(ah,ah$rdataclass == "OrgDb")
query(orgs, "Oryctolagus.cuniculus")
rabbit    <- orgs[["AH55793"]]
go_ids    <- AnnotationDbi::select(rabbit, keys = as.character(univ$entrez), keytype = "ENTREZID", columns = c("GO","EVIDENCE","ENTREZID","SYMBOL"))

go_ids <- go_ids[,c(2,3,1,4)]
go_ids <- go_ids[complete.cases(go_ids$GO),]

universe <- unique(go_ids$ENTREZID)
my.frame <- data.frame(cbind(GO=go_ids$GO,EVIDENCE=go_ids$EVIDENCE,ENTREZID=go_ids$ENTREZID))
goFrame <- GOFrame(my.frame, organism = "Oryctolagus.cuniculus")
goAllFrame <- GOAllFrame(goFrame)
gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())

params <- GSEAGOHyperGParams(name="My Custom GSEA based annot Params", geneSetCollection = gsc, geneIds = sig.ids , universeGeneIds = universe, ontology = "MF", pvalueCutoff = 0.05, conditional = FALSE, testDirection = "over")
mf_over <- hyperGTest(params)
mf_over_results <- summary(mf_over)

all <-geneIdUniverse(mf_over)[["GO:0050136"]]
"808221" "808222" "808223" "808224" "808225" "808226" "808230" "100341941" "100344233""100353024" "100358684"
x <-geneIds(mf_over)[geneIds(mf_over) %in% all]
"808221" "808224" "808225"

I can provide the gene list and the universe list, too. I'm just not sure how to attach it.

ADD REPLY • link 7.4 years ago martis.mihaela • 0

score 0 · Answer 1 · 2017-11-10

0

Entering edit mode

Jenny Drnevich ★ 2.0k

@jenny-drnevich-2812

Last seen 4 months ago

United States

The reason you get GO terms in the output that did not seem to be in the input universe is because the input universe only contains specific annotated GO terms. The part of your code:

goAllFrame <- GOAllFrame(goFrame)

actually pulls in all parental terms of the specific annotated GO terms. Remember GO is hierarchical in nature and any gene that gets specifically annotated with one GO term will automatically also get all the parental terms. In your example, GO:0005747 is a child term of GO:0050136 . See http://amigo.geneontology.org/amigo/term/GO:0005747 and click on the "Inferred Tree View"

ADD COMMENT • link 7.4 years ago Jenny Drnevich ★ 2.0k

0

Entering edit mode

Thank you very much for the answer. That explains everything. Is there a way to pool only those annotated in my own universe? /Mihaela On 10 Nov 2017 6:16 pm, "Jenny Drnevich [bioc]" <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Jenny Drnevich <https: support.bioconductor.org="" u="" 2812=""/> wrote Comment: > GO-terms delivered by GOstats hyperGTest() do not exist in my universe > <https: support.bioconductor.org="" p="" 102812="" #102858="">: > > The reason you get GO terms in the output that did not seem to be in the > input universe is because the input universe only contains specific > annotated GO terms. The part of your code: > > goAllFrame <- GOAllFrame(goFrame) > > actually pulls in all parental terms of the specific annotated GO terms. > Remember GO is hierarchical in nature and any gene that gets specifically > annotated with one GO term will automatically also get all the parental > terms. In your example, *GO:0005747 *is a child term of *GO:0050136* . > See http://amigo.geneontology.org/amigo/term/GO:0005747 and click on the > "Inferred Tree View" > ------------------------------ > > Post tags: GOstats > > You may reply via email or visit https://support.bioconductor. > org/p/102812/#102858 >

ADD REPLY • link 7.4 years ago martis.mihaela • 0

0

Entering edit mode

I wouldn't necessarily want to remove all the parental terms because they could find higher level terms that have several different child terms. However, to reduce the redundancy in your testing, do GSEAGOHyperGParams(conditional = TRUE) . This will test GO:0005747 first, and if it is significant then it will remove those 3 genes from the counts for all it's parental terms.

ADD REPLY • link 7.4 years ago Jenny Drnevich ★ 2.0k

0

Entering edit mode

Thanks On 10 Nov 2017 6:28 pm, "Jenny Drnevich [bioc]" <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Jenny Drnevich <https: support.bioconductor.org="" u="" 2812=""/> wrote Comment: > GO-terms delivered by GOstats hyperGTest() do not exist in my universe > <https: support.bioconductor.org="" p="" 102812="" #102860="">: > > I wouldn't necessarily want to remove all the parental terms because they > could find higher level terms that have several different child terms. > However, to reduce the redundancy in your testing, do GSEAGOHyperGParams(conditional > = TRUE) . This will test GO:0005747 first, and if it is significant then > it will remove those 3 genes from the counts for all it's parental terms. > ------------------------------ > > Post tags: GOstats > > You may reply via email or visit https://support.bioconductor. > org/p/102812/#102860 >

ADD REPLY • link 7.4 years ago martis.mihaela • 0