Entering edit mode
Christine Gläßer
▴
60
@christine-glaer-6291
Last seen 10.2 years ago
Dear all,
I'm analyzing a set of genes for GO enrichment, given a customized
annotation. In that annotation, a specific GO term occurs only once;
in
my set of genes, the gene attached to it is also represented.
I would have expected to find the numbers -- genes having a specific
GO
term in my list being "1" and genes having the term in total, in the
annotation, being "1" as well -- in "count" and "size", however, 13
and
62 genes are denoted. Please have a look at my script and the session
info:
SessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=de_DE.utf8 LC_NUMERIC=C
[3] LC_TIME=de_DE.utf8 LC_COLLATE=de_DE.utf8
[5] LC_MONETARY=de_DE.utf8 LC_MESSAGES=de_DE.utf8
[7] LC_PAPER=de_DE.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GO.db_2.9.0 GSEABase_1.22.0 annotate_1.38.0
[4] GOstats_2.26.0 RSQLite_0.11.4 DBI_0.2-7
[7] graph_1.38.3 Category_2.26.0 AnnotationDbi_1.22.6
[10] Biobase_2.20.1 BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] AnnotationForge_1.2.2 genefilter_1.42.0 IRanges_1.18.4
[4] RBGL_1.36.2 splines_3.0.2 stats4_3.0.2
[7] survival_2.37-7 tcltk_3.0.2 tools_3.0.2
[10] XML_3.98-1.1 xtable_1.7-1
## Script
library(GOstats)
library(GSEABase)
library(GO.db)
ANNO_FILE <- "genes_anno_forgostats.txt"
UNIVERSE_FILE <-"universe.txt"
ORG <- "MY_Organism"
GENE_FILE<-"targets.txt"
OUT_FILE<-"results.txt"
goframeData <- read.table(ANNO_FILE,header=T)
universe <- scan(file=UNIVERSE_FILE,what="character")
goFrame = GOFrame(goframeData, organism = ORG)
goAllFrame = GOAllFrame(goFrame)
gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
genes <- scan(file=GENE_FILE,what="character")
NAMESPACE="MF"
x = length(NAMESPACE)
params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc,
geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE,
pvalueCutoff = 0.05, conditional=TRUE, testDirection="over")
OverMF <- hyperGTest(params)
NAMESPACE="BP"
x = length(NAMESPACE)
params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc,
geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE,
pvalueCutoff = 0.05, conditional=TRUE, testDirection="over")
OverBP <- hyperGTest(params)
NAMESPACE="CC"
x = length(NAMESPACE)
params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc,
geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE,
pvalueCutoff = 0.05, conditional=TRUE, testDirection="over")
OverCC <- hyperGTest(params)
write.table(summary(OverMF),file=OUTFILE,quote=F,sep='\t', append=T)
write.table(summary(OverBP),file=OUTFILE,quote=F,sep='\t', append=T)
write.table(summary(OverCC),file=OUTFILE,quote=F,sep='\t', append=T)
If I run GOStats unconditionally, other GO terms being parent terms to
the one I'm looking at also pop up. The numbers (count, size) are the
same for all terms; my assumption is that genes with child terms
belonging to a specific parent term are summed up in count/size, but
I'm
absolutely not sure if this is the case... Could you please help me
figuring out what is denoted by the numbers in "counts" and "size", if
not the absolute numbers like I would have expected?
Kind regards, and many thanks in advance,
Christine Gläßer
[[alternative HTML version deleted]]