Hi
I'm using GOstats, and I've created the gene set collection (the universe) for Saccharomyces cerevisiae:
frame <- toTable(org.Sc.sgdGO)
goframeData <- data.frame(frame$go_id, frame$Evidence, frame$systematic_name)
goFrame <- GOFrame(goframeData, organism = "Saccharomyces cerevisiae")
goAllFrame <- GOAllFrame(goFrame)
Sc.gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
setwd("/path/")
save(Sc.gsc, file = "Sc.gsc.Rda")
Now, if I'm not wrong, I must select the universe of genes present in my experiment. I have a character vector with the genes that are present in my experiment and have a value (that is, the value is not NA). The problem is that I haven't been able to properly filter the universe down to the genes in my experiment (variable filtered_universe).
universe = Lkeys(org.Sc.sgdGO)
genes = universe[1:500]
params <- GSEAGOHyperGParams(name = "My Custom GSEA based annot Params",
                             geneSetCollection = Sc.gsc,
                             geneIds = filtered_universe,
                             universeGeneIds = universe,
                             ontology = "MF",
                             pvalueCutoff = 0.05,
                             conditional = FALSE,
                             testDirection = "over")
I've tried the following, but I'm getting an error on the last line, and I'm not really sure this is how it should be done. gs0 is a gene set from the original universe, and fn0 is the list of genes from my experiment.
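subsettingGeneSet <- function(gs0, fn0) {
  # Keep only the gene IDs of this set that also appear in fn0
  geneIds(gs0) <- geneIds(gs0)[is.element(geneIds(gs0), fn0)]
  # Return the modified gene set; without this line the function returns
  # the value of the assignment (a character vector), not a GeneSet
  gs0
}
gsc2 <- lapply(Sc.gsc, subsettingGeneSet, fn0 = alk_name)
gsc2 <- GeneSetCollection(gsc2)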
My understanding is that your genes of interest (geneIds, the ones you analyzed in your experiment that are statistically significant) should match the universe of genes (universeGeneIds) so that the statistics work properly.
In this case the universe is the whole set of genes from yeast, instead of the set of genes that are present in your experiment. Is this right?
No. The "universe" you use in GO enrichment analyses is the set of genes your assay could have potentially measured.
Imagine you ran an experiment that used a targeted assay which (for whatever reason) only measured expression/whatever of the kinases in a given organism. If you set the universe to include all the possible genes from your target organism, then no matter what actually happens in the experiment, your results would always return significant hits for things related to kinase activity.
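To make the kinase example concrete, here is a rough base-R sketch. All the numbers are made up for illustration (6000 genes in the organism, 120 of them annotated as kinases, 30 significant hits), and phyper is used here as a stand-in for a one-sided hypergeometric over-representation test; it is not the GOstats code itself.

```r
# Hypothetical numbers, chosen only to illustrate the effect of the universe
genome_size   <- 6000  # all genes in the organism (assumed)
n_kinases     <- 120   # genes annotated with "kinase activity" (assumed)
n_significant <- 30    # significant genes from the targeted assay (assumed)

# The assay only measures kinases, so every significant gene is a kinase
hits_in_term <- n_significant

# Wrong universe: the whole genome
p_genome <- phyper(hits_in_term - 1,
                   m = n_kinases,                 # annotated genes in the universe
                   n = genome_size - n_kinases,   # non-annotated genes in the universe
                   k = n_significant,             # number of genes drawn
                   lower.tail = FALSE)

# Right universe: only the genes the assay could have measured (the kinases)
p_assay <- phyper(hits_in_term - 1,
                  m = n_kinases,
                  n = 0,                          # nothing but kinases was measurable
                  k = n_significant,
                  lower.tail = FALSE)

print(c(whole_genome = p_genome, assay_universe = p_assay))
```

With the whole genome as the universe, the p-value for "kinase activity" comes out vanishingly small even though the assay could not have returned anything else. With the assay's own universe it is 1, i.e. no enrichment.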