Entering edit mode
Tim Smith
★
1.1k
@tim-smith-1532
Last seen 10.2 years ago
Thanks Jim. I'm still having problems, i.e., I cannot find which
subset of input genes resulted in the significant GO term. I have
reproduced the problem that I am having:
-----------------------------------------------------
library(org.Hs.eg.db)
library(GOstats)
library(GO.db)
# Set of genes (Entrez Ids) that I want to investigate
geneIds <- c("10406", "10418", "11082", "1281", "1410",
"157506", "167410", "1906" , "2029", "23091", "2625" , "2823" ,
"2877", "2993", "3039" , "3043",
"3046", "3283", "3553", "4015", "4069", "4258", "4345",
"4353", "4885", "4886", "5021", "5055", "5151", "5156",
"5320", "5553",
"55885", "56667", "5788" , "5999", "6005", "629", "6507",
"653145", "6590", "673", "6876", "695", "7026", "7070",
"7103", "7412",
"760", "7738", "800", "828", "83890", "945", "963")
### I have reproduced Jim's code for the test
univ <- Lkeys(org.Hs.egGO)
param <- new("GOHyperGParams", geneIds = geneIds,
universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP")
hyp <- hyperGTest(param)
summary(hyp,categorySize=2000)
----------------------------------------------------
The result I get is :
GOBPID Pvalue OddsRatio ExpCount Count Size
Term
1 GO:0032501 0.0001568145 3.021968 11.976486 24 3438
multicellular organismal process
2 GO:0032502 0.0012459810 2.626265 10.297409 20 2956
developmental process
3 GO:0007275 0.0051457709 2.457897 7.517527 15 2158 multicellular
organismal development
4 GO:0007154 0.0061081457 2.187407 13.418681 22 3852
cell communication
I now want to know which subset of genes resulted in 'GO:0032501'.
The error I get is:
> geneIds[geneIds %in% get("GO:0032501", revmap(org.Hs.egGO))]
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
value for "GO:0032501" not found
> geneIds[geneIds %in% get("GO:0032502", revmap(org.Hs.egGO))]
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
value for "GO:0032502" not found
[[elided Yahoo spam]]
thanks a lot.
Tim
Hi Tim,
Yeah, probeSetSummary() is probably not what you want, if you are not
starting with an Affy chip. There are some gymnastics required to map
things back to the original Affy chip that you won't need to do. In
addition, if you are not using a conditional hypergeometric analysis,
it
should be pretty simple to get what you want without even needing to
parse things out of the GOHyperGResult object. An example:
## fake up some data
> geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)]
> univ <- Lkeys(org.Hs.egGO)
> param <- new("GOHyperGParams", geneIds = geneIds,
universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP")
> hyp <- hyperGTest(param)
> summary(hyp, categorySize=10)
GOBPID Pvalue OddsRatio ExpCount Count Size
Term
1 GO:0007338 0.002723500 29.25101 0.07808304 2 54 single
fertilization
2 GO:0009566 0.002925855 28.16374 0.08097501 2 56
fertilization
So we have two terms of interest. Getting the Entrez Gene IDs from the
input set that map to these terms is easy:
> geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))]
[1] "100131137" "10007"
Now you might also want to know which 54 Entrez Gene IDs map to that
particular GO term. Since you are not conditioning, this includes that
particular GO term and all its offspring.
> offspring <- get("GO:0007338", GOBPOFFSPRING)
> egids <- unique(unlist(mget(c("GO:0007338", offspring),
revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE))
> egids[!is.na(egids)]
[1] "1047" "4179" "4240" "4486" "4809"
"5016"
[7] "6674" "7783" "7784" "7802" "7993"
"8747"
[13] "8748" "8852" "9082" "10007" "10361"
"22917"
[19] "26476" "53340" "57055" "57829" "64100"
"93185"
[25] "158062" "442868" "100131137" "49" "410"
"2683"
[31] "3010" "4184" "6677" "7142" "7455"
"8857"
[37] "11055" "124626" "2054" "2741" "10343"
"10566"
[43] "27297" "152015" "3074" "167" "928"
"2515"
[49] "5104" "23553" "284359" "164684" "7141"
"79400"
Best,
Jim
Tim Smith wrote:
Thanks James. If I can tweak that function, I'll get exactly what I
want.
I tried what you suggested and got the following error:
---------------------------
### 'genes1' are the Entrez IDs of my genes of interest, and
'allGenes' is the universe of Entrez IDs
paramsGO <- new("GOHyperGParams", geneIds = genes1,
universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
testDirection = "over")
GO <- hyperGTest(paramsGO)
ps <- probeSetSummary(GO)
Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
variable "org.Hs.egENTREZID" was not found
--------------------------------
I guess the function would return the probe ids if I was using them,
but I have Entrez IDs as input.
Or am I doing something wrong?
thanks!
----- Original Message ----
From: James W. MacDonald <jmacdon@med.umich.edu>
Cc: bioc <bioconductor@stat.math.ethz.ch>
Sent: Wednesday, October 22, 2008 9:10:39 AM
Subject: Re: [BioC] GOstat: listing genes from hyperGTest
Hi Tim,
Does probeSetSummary() do what you want?
Best,
Jim
Tim Smith wrote:
Hi,
I
was performing a hyperGTest for genes in homo-sapiens. For a set of
input genes, this function returns some 'significant' GO terms. What I
wanted to now do was to co-relate each significant GO term (returned
by
this function) with genes (from my set of input genes) associated with
that GO term. However, I think that I may be using the wrong
package/function to get the releveant set of genes.
Currently, what I'm doing is finding the significant GO terms by using
the following code:
-----------------------
### 'genes1' are the Entrez IDs of my genes of interest, and
'allGenes' is the universe of Entrez IDs
paramsGO <- new("GOHyperGParams", geneIds = genes1,
universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
testDirection = "over")
GO <- hyperGTest(paramsGO)
--------------------------
This
gives me a set of significant GO terms. Now, I would like to find
which
subset of genes in 'genes1' is associated with each of the significant
GO term. To do this I map all GO terms to their Entrez IDs using the
'org.Hs.eg.db' package using the following:
xx <- as.list(org.Hs.egGO2EG)
to
get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
this number small?) that map to at least one Entrez ID. So, from here
I
look up which Entrez IDs are associated with my GO term of interest.
My
problem is that often, the GO term from hyperGTest is not associated
with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
above ), i.e. the GO term/ID is not in the list obtained from
'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by
hyperGTest, but does not appear to be associated with any Entrez IDs
in
the org.Hs.eg.db package. Where could I be going wrong?
[[elided Yahoo spam]]
Thanks for any comments/suggestions. I realize that I'm probably doing
something really stupid here....
My sessionInfo() is:
--------------------------------
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United
States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] grid splines tools stats graphics grDevices utils
datasets methods base
other attached packages:
[1]
gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0
gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0
Category_2.6.0 [8] RBGL_1.16.0 annotate_1.18.0
xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0
GO.db_2.2.0 KEGG.db_2.2.0 [15] org.Hs.eg.db_2.2.0
AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4
genefilter_1.20.0 survival_2.34-1 affy_1.18.0 [22]
preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0
loaded via a namespace (and not attached):
[1] cluster_1.11.11 MASS_7.2-44
---------------------------------
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
[[alternative HTML version deleted]]