Entering edit mode
Hi,
I've been trying to do some non-run-of-the-mill gene ontology analysis
and have been messing around with the innards of the results we get
from GOstats::hyperGTest.
As a result, I've found some peculiarities in the GO annotations I'm
retrieving from the hyperGTest and how they relate to the GO
annotations in my organism's (saccharomyces cerevisiae) annotation
database.
In particular: for some enriched GO term X, I'm finding genes listed
in the geneIdUniverse of X that do not have X as a member of their
organism.db.GO map, and I'm not sure how that could be. I'm even
setting condition=FALSE to my hyperGTest to make it as "vanilla" as
possible (but as far as I understand the conditional test, this
wouldn't effect what I'm seeing anyway since genes are removed from GO
terms, and not added).
I have a specific example below. You'll find that for the first
significant GOID found (GO:0007059), gene "YKL049C" is listed as a
member of this GOID's geneIdUniverse as it is returned from the
hyperGTest, but GO:0007059 isn't listed in my annotation db GO mapping
(org.Sc.sgdGO) for this gene, nor is "YKL049C" found in its GO2ORF
map.
=== Example ===
library(GOstats)
library(org.Sc.sgd.db)
library(annotate)
orfs <- c(
"YGR084C", "YDR409W", "YFR017C", "YDR342C", "YBR152W",
"YIL014W", "YGR134W", "YDL109C", "YDL234C", "YDR524C",
"YBR002C", "YGR063C", "YDR034C-A", "YIR011C",
"YHR175W", "YHR109W", "YGR207C", "YCR023C", "YER039C-A",
"YGL064C", "YGR249W", "YHR172W", "YGL226W", "YAL064W",
"YDR309C", "YDR473C", "YAL058W", "YHL030W", "YKL101W",
"YJL122W", "YBR121C", "YIL018W", "YDR443C", "YBR069C",
"YBR011C", "YHR009C", "YGL166W", "YIL103W", "YDR206W",
"YBL031W", "YBR162W-A", "YFL018C", "YBR214W", "YIL076W",
"YBR291C", "YKL049C", "YIL149C", "YDR034W-B", "YDR395W")
orf2go <- getAnnMap('GO', 'org.Sc.sgd')
go2orf <- getAnnMap('GO2ORF', 'org.Sc.sgd')
all.orfs <- keys(orf2go)
## Do GOstats test to get significant GO ontologies
params <- new("GOHyperGParams", geneIds=orfs,
universeGeneIds=all.orfs, annotation='org.Sc.sgd',
ontology='BP', pvalueCutoff=0.05,
conditional=FALSE, testDirection='over')
GO <- hyperGTest(params)
## Look at the genes annotated in the universe for each GOID and
compare
## with annotations for each gene in org.Sc.sgdGO
universes <- geneIdUniverse(GO)
uid <- names(universes)[1] ## GO:0007059
uid.members <- universes[[uid]]
"YKL049C" %in% uid.members ## TRUE
## Is the gene listed in GOID's (uid) universe annotated with that
GOID?
## Given previous result, this *should* be TRUE
go.ids <- orf2go[["YKL049C"]]
uid %in% names(go.ids) ## FALSE
## Does that GOID have this gene as its member?
## This also should be TRUE
check.orfs <- go2orf[[uid]]
"YKL049C" %in% check.orfs ## FALSE
======================================
Since `"YKL049C" %in% uid.members` is TRUE, I would expect the go term
`uid` (GO:0007059) to be in both `names(go.ids)` and `check.orfs`, but
it's not.
Can someone shed some light on this for me?
sessionInfo()
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GO.db_2.4.1 annotate_1.26.1 org.Sc.sgd.db_2.4.1
GOstats_2.14.0
[5] RSQLite_0.9-2 DBI_0.2-5 graph_1.26.0
Category_2.14.0
[9] AnnotationDbi_1.10.2 Biobase_2.8.0 devtools_0.1
stringr_0.4
[13] roxygen_0.1-2 profr_0.1.1 digest_0.4.2
testthat_0.3
loaded via a namespace (and not attached):
[1] GSEABase_1.10.0 RBGL_1.24.0 XML_3.1-1
evaluate_0.3 genefilter_1.30.0
[6] plyr_1.1 splines_2.11.1 survival_2.35-8
tools_2.11.1 xtable_1.5-6
Thanks,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
?| Memorial Sloan-Kettering Cancer Center
?| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact