Hi,
Paul Evans wrote:
> Hi Robert,
>
> Two questions.
>
> First, does that mean that I will be able to use the org.XX packages and
> KEGG if I download the GOstats package from the devel download page in
> bioconductor (instead of the release version)? Alternatively, is there
Yes, but you need to use the release candidate for R 2.7.0. In about two
weeks R 2.7.0 will be released, so you may want to wait, and shortly
after that BioC 2.2 will come out, and at that time all of this will
work, in the "new" release branches.
> any way I can get the hyperG test to function with a set of Entrez IDs
> only (for example, if I get data from SMD I will not have the chip
> details but only entrez ids).
Yes, of course, but then you are not using KEGG or GO or any of those
things for your gene sets, unless you do the mapping for them. You
should be able to use Entrez IDs from the SMD together with the
org.Sc.sgd.db, by simply restricting attention to those that are
contained in the org.Sc.sgd.db package.
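
As a rough sketch of what I mean by restricting attention (base R only, so
it runs without any annotation package installed): the ID vectors below are
made up for illustration, and restrictToAnnotated is a hypothetical helper
name, not something from GOstats. In a real session the `annotated` vector
would come from something like mappedkeys() on the relevant map in the
annotation package.

```r
## Hypothetical IDs from an external source such as SMD (made-up values,
## including one duplicate and one ID the annotation does not know).
smdIds <- c("850000", "850001", "850002", "999999", "850001")

## Hypothetical set of IDs known to the annotation package. In practice
## this would come from e.g. mappedkeys() on an annotation map.
annotated <- c("850000", "850001", "850002", "850003")

## Keep only the unique IDs that the annotation package knows about.
restrictToAnnotated <- function(ids, annotated) {
    intersect(unique(as.character(ids)), annotated)
}

geneIds <- restrictToAnnotated(smdIds, annotated)
universe <- annotated  # one possible choice of gene universe
print(geneIds)
```

The resulting geneIds and universe are the kind of vectors you would then
hand to the geneIds and universeGeneIds arguments of the parameter object.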
>
> Second, I tried the same test with several affy and agilent arrays. For
> the code given below the 'hgug4110b' package returned the same error. I
> am reproducing the code below:
>
> ---------------------------------------------------------------------------
> ### TEST HYPERGTEST FOR AFFY/AGILENT CHIPS##
> rm(list = ls())
> library("hgug4110b")
> library("KEGG.db")
> library("GOstats")
>
> chips <- c("hgug4110b")
> pvalue <- 1
> for(i in 1:length(chips)){
> y <- get(paste(chips[i],"ENTREZID",sep=''))
> print(chips[i])
> xx <- as.list(y)
> # Remove probe identifiers that do not map to any ENTREZID
> xx <- xx[!is.na(xx)]
> if(length(xx) > 0){
> # The ENTREZIDs for the first two elements of XX
> xx[1:2]
> # Get the first one
> xx[[1]]
> }
> allGenes <- unique(unlist(xx))
> geneUniverse <- allGenes[1:7000]
> set.seed(37688)
> ## Create random cluster of 13 genes
> geneCluster <- sample(1:7000,13,replace=F)
> geneCluster <- unique(unlist(geneUniverse[geneCluster]))
> print(geneCluster)
> paramsGO <- new("GOHyperGParams", geneIds = geneCluster,
> universeGeneIds = geneUniverse, annotation = chips[i],
> ontology = "BP",
> pvalueCutoff = pvalue, conditional = FALSE,
> testDirection = "over")
>
> paramsKEGG <- new("KEGGHyperGParams", geneIds = geneCluster,
> universeGeneIds = geneUniverse, annotation = chips[i],
> pvalueCutoff = pvalue, testDirection = "over")
> #tryCatch(hgOverGO <- hyperGTest(paramsGO),
> #         error = function(e) {print('error GO')})
> tryCatch(hgOverKEGG <- hyperGTest(paramsKEGG),
>          error = function(e) {print('error KEGG')})
> }
>
> -------------------------------------
> The output I get is:
>
> [1] "hgug4110b"
> [1] "4644" "55630" "BX647822" "9933" "79016" "5774"
> "7274" "6331" "51249" "55515" "AK096394" "28299" "AF116641"
> [1] "error KEGG"
>
> i.e. for this chip I get the same error ("Error in numW - numWdrawn :
> non-numeric argument to binary operator"). Am I doing something wrong?
No, you are not doing anything wrong; there is a bug. You will need to
either wait for the next release (about 3 weeks), or use the devel
versions of everything.
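
In the meantime, the workaround from earlier in this thread (wrap the call
in tryCatch and treat a failure as a p-value of 1) might look roughly like
the sketch below. The helper name safeTest and the two stand-in test
functions are made up so that the snippet runs without any annotation
packages, and the stand-ins return a bare p-value rather than the result
object that hyperGTest gives back. With GOstats loaded you would pass
something like function() hyperGTest(paramsKEGG) instead and choose a
fallback that suits your downstream code.

```r
## safeTest is a hypothetical helper, not part of GOstats or Category.
## It calls testFun() and returns its value, or `fallback` on error.
safeTest <- function(testFun, fallback = NULL) {
    tryCatch(testFun(),
             error = function(e) {
                 message("test failed: ", conditionMessage(e))
                 fallback
             })
}

## Stand-in that fails with the same kind of error as reported above
## ("non-numeric argument to binary operator").
failingTest <- function() "a" - 1

## Stand-in for a test that succeeds and yields some p-value.
workingTest <- function() 0.03

pFail <- safeTest(failingTest, fallback = 1)
pOk <- safeTest(workingTest, fallback = 1)
print(c(pFail, pOk))
```

Assigning the value returned by tryCatch, rather than assigning inside the
call, means the result variable is always defined afterwards, even when the
test fails.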
best wishes
Robert
>
> regards.
>
> Hi Paul,
> Thanks for the report. Please, if you use sample also set a seed,
> otherwise your example is not reproducible.
>
> The short answer is that you cannot use KEGG with the org.XX packages
> in release. Based on your report I have modified the Category package
> (which is doing most of the work), so that this now should work in the
> devel branch, and that change should propagate in the next day or so to
> the web (version 2.5.9).
>
> best wishes
> Robert
>
>
> Paul Evans wrote:
>
>> > Thanks Robert. I tried the KEGG.db package and tried the
>> > KEGGHyperGParams again. The code I used is:
>> >
>> > -----------------------------------------------------------------------------
>> >
>> > ############ TEST hyperGTest for HOMO SAPIENS ######
>> > library("KEGG.db")
>> > library("GOstats")
>> > library("org.Hs.eg.db")
>> >
>> > x <- org.Hs.egACCNUM
>> > # Get the entrez gene identifiers that are mapped to an ACCNUM
>> > mapped_genes <- mappedkeys(x)
>> > geneUniverse <- mapped_genes[1:1200]
>> >
>> >
>> > ## Create random cluster of 13 genes
>> > geneCluster <- sample(1:1200,13,replace=F)
>> > geneCluster <- unique(unlist(geneUniverse[geneCluster]))
>> >
>> > print(geneCluster)
>> >
>> > paramsGO <- new("GOHyperGParams", geneIds = geneCluster,
>> > universeGeneIds = geneUniverse, annotation = "org.Hs.eg.db",
>> > ontology = "BP",
>> > pvalueCutoff = 1, conditional = FALSE, testDirection = "over")
>> >
>> >
>> > paramsKEGG <- new("KEGGHyperGParams", geneIds = geneCluster,
>> > universeGeneIds = geneUniverse, annotation = "org.Hs.eg.db",
>> > pvalueCutoff = 1, testDirection = "over")
>> >
>> >
>> > tryCatch(hgOverGO <- hyperGTest(paramsGO),error = function(e)
>> > {print('error GO')})
>> > tryCatch(hgOverKEGG <- hyperGTest(paramsKEGG),error = function(e)
>> > {print('error KEGG')})
>> >
>> > -----------------------------------------------------------------------------
>> >
>> >
>> >
>> > The output/error I got now is:
>> >
>> >
>> >
>> > [1] "901" "599" "435" "100" "1525" "25" "204" "1159" "865"
>> > "1195" "1629" "912" "998"
>> >
>> > Error in get(paste(lib, name, sep = "")) :
>> > no function to return from, jumping to top level
>> > [1] "error KEGG"
>> >
>> >
>> >
>> > My sessionInfo() is:
>> >
>> >
>> >
>> > > sessionInfo()
>> > R version 2.6.2 (2008-02-08)
>> > i386-pc-mingw32
>> >
>> > locale:
>> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> > States.1252;LC_MONETARY=English_United
>> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>> >
>> > attached base packages:
>> > [1] splines tools stats graphics grDevices utils
>> > datasets methods base
>> >
>> > other attached packages:
>> > [1] org.Hs.eg.db_2.0.2 GOstats_2.4.0 Category_2.4.0
>> > genefilter_1.16.0 survival_2.34 RBGL_1.14.0
>> > annotate_1.16.1
>> > [8] xtable_1.5-2 GO.db_2.0.2 graph_1.16.1
>> > KEGG.db_2.0.2 AnnotationDbi_1.0.6 RSQLite_0.6-8
>> > DBI_0.2-4
>> > [15] Biobase_1.16.3
>> >
>> > loaded via a namespace (and not attached):
>> > [1] cluster_1.11.10
>> > >
>> >
>> >
>> >
>> > My apologies if I have missed something elementary!
>> >
>> >
>> >
>> > thanks!
>> >
>> >
>> >
>> >
>> >
>> > ----- Original Message ----
>> > From: Robert Gentleman <rgentlem at fhcrc.org>
>> > To: Paul Evans <p.evans48 at yahoo.com>
>> > Cc: Bioconductor at stat.math.ethz.ch
>> > Sent: Monday, March 31, 2008 3:45:11 PM
>> > Subject: Re: [BioC] GOstats - hyperGTest using "KEGGHyperGParams"
>> >
>> > Hi Paul,
>> > Thanks for the bug report, it seems that there is an issue when all
>> > values are zero, which shows up intermittently. You can solve it by
>> > using try or tryCatch around the call to hyperGTest. You can simply use
>> > a p-value of 1, which is what it will be.
>> >
>> > You should not be loading the GO package for this (KEGG if anything, and
>> > even then, please use KEGG.db, not KEGG).
>> >
>> > I will fix the bug, but given how close the release is I won't back
>> > port it, and it will only be available in the devel branch (soon to be
>> > the release branch),
>> >
>> > best wishes
>> > Robert
>> >
>> > Paul Evans wrote:
>> > > Hi all,
>> > >
>> > > I was trying to understand the hyperGTest for KEGG, and used the
>> > > following code:
>> > >
>> > >
>> > > ------------------------------------------------------------------------
>> > > ## TEST HYPERGTEST FOR KEGG
>> > >
>> > > library("YEAST")
>> > > library("GOstats")
>> > > library("GO")
>> > >
>> > > # Convert to a list
>> > > xx <- as.list(YEASTGENENAME)
>> > > # Remove probes that do not map to any GENENAME
>> > > xx <- xx[!is.na(xx)]
>> > > if(length(xx) > 0){
>> > > # Gets the gene names for the first five probe identifiers
>> > > xx[1:5]
>> > > # Get the first one
>> > > xx[[1]]
>> > > }
>> > >
>> > > ## Create gene universe
>> > > allGenes <- names(xx)
>> > > print(length(allGenes))
>> > > geneUniverse <- allGenes[1:800]
>> > > for(i in 1:20){
>> > > ## Create random cluster of 13 genes
>> > > geneCluster <- sample(1:800,13,replace=F)
>> > > geneCluster <- geneUniverse[geneCluster]
>> > > print(i)
>> > > print(geneCluster)
>> > > params <- new("KEGGHyperGParams", geneIds = geneCluster,
>> > > universeGeneIds = geneUniverse, annotation = "YEAST",
>> > > pvalueCutoff = 0.1, testDirection = "over")
>> > > hgOver <- hyperGTest(params)
>> > > dfrm <- summary(hgOver)
>> > > #print(dfrm)
>> > > }
>> > >
>> > >
>> > > ------------------------------------------------------------------------
>> > >
>> > > The output/error that I got is:
>> > >
>> > > [1] 1
>> > > [1] "YKR067W" "MOF9" "YDR518W" "YPR074C" "YCL011C" "YCR069W"
>> > > "YDL104C" "YGR136W" "YAR003W" "YFR013W" "YOR116C" "YDR507C" "YGR167W"
>> > > [1] 2
>> > > [1] "YJR112W" "CEN8" "YPL005W" "YHR081W" "YLR323C" "YBR131W"
>> > > "YLR347C" "YHR098C" "YOR107W" "YCL027W" "YNR012W" "CRL16" "YLR329W"
>> > > [1] 3
>> > > [1] "YNL327W" "YEL056W" "YNL321W" "YDL111C" "YMR284W" "YLR338W"
>> > > "YPL008W" "CRL17" "YEL065W" "YFR027W" "YMR269W" "YPL019C" "YML038C"
>> > > Error in numW - numWdrawn : non-numeric argument to binary operator
>> > >
>> > >
>> > >
>> > > My sessionInfo():
>> > >
>> > >> sessionInfo()
>> > > R version 2.6.2 (2008-02-08)
>> > > i386-pc-mingw32
>> > > locale:
>> > > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> > > States.1252;LC_MONETARY=English_United
>> > > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>> > > attached base packages:
>> > > [1] splines tools stats graphics grDevices utils datasets
>> > > methods base
>> > > other attached packages:
>> > > [1] KEGG_2.0.1 GOstats_2.4.0 Category_2.4.0
>> > > genefilter_1.16.0 survival_2.34 RBGL_1.14.0 GO.db_2.0.2
>> > > [8] graph_1.16.1 goTools_1.10.0 annotate_1.16.1
>> > > xtable_1.5-2 AnnotationDbi_1.0.6 RSQLite_0.6-8 DBI_0.2-4
>> > > [15] Biobase_1.16.3 GO_2.0.1 hu6800_2.0.1
>> > > hgu95a_2.0.1 hgu95av2_2.0.1 hgu133plus2_2.0.1 hgu133b_2.0.1
>> > > [22] hgu133a_2.0.1 som_0.3-4 YEAST_2.0.1 cluster_1.11.10
>> > >
>> > >
>> > > thanks!
>> > >
>> > >
>> > >
>> > >
>> > > _______________________________________________
>> > > Bioconductor mailing list
>> > > Bioconductor at stat.math.ethz.ch
>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > > Search the archives:
>> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> > >
>> >
>> > --
>> > Robert Gentleman, PhD
>> > Program in Computational Biology
>> > Division of Public Health Sciences
>> > Fred Hutchinson Cancer Research Center
>> > 1100 Fairview Ave. N, M2-B876
>> > PO Box 19024
>> > Seattle, Washington 98109-1024
>> > 206-667-7700
>> > rgentlem at fhcrc.org
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org