Hi Johannes,
You are right that the current Category/GOstats implementations rely
on Bioconductor annotation data packages being available. Taking the
time to generate an annotation data package using AnnBuilder would
have other benefits aside from being able to use the GOstats code, but
I can sympathize with wanting a way to use these tools without going
through that step first.
I'm not opposed to the idea of finding a way to let the GOstats tools
operate without an annotation data package, but at present I won't have
time to implement anything (what is there now suits our needs fairly
well). So patches are welcome. :-)
"Johannes Rainer" <johannes.rainer at tcri.at> writes:
> Thanks for your suggestion. This would be a solution,
> but as far as I understand, the functions from the GOstats and Category
> packages map the submitted IDs to GO terms using the annotation packages
> (e.g. the hgu133plus2 annotation package) each time the hyperGTest
> function is called. Actually, the mapping is performed in the
> getGoToEntrezMap function (Category package), and this function maps
> EntrezGene IDs to GO terms by first mapping Affy IDs to GO terms and then
> Affy IDs to EntrezGene IDs.
Yes, the mapping is recomputed for each call and this could probably
be improved. Indeed, as we transition to SQLite-based annotation data
packages, many of the contortions of the current code can be avoided
entirely. I'm not sure we can avoid computing the mapping for each
call because we need to filter the mapping based on the provided list
of gene IDs.
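A minimal base-R sketch of the per-call filtering step described above, assuming the GO-to-EntrezGene map is held as a named list of character vectors. The helper `filterGoMap` and the toy IDs are hypothetical and are not part of Category or GOstats.

```r
# Toy GO term -> EntrezGene map; the IDs are made up for illustration.
goMap <- list(
  "GO:0000001" = c("10", "20", "30"),
  "GO:0000002" = c("40"),
  "GO:0000003" = c("20", "50")
)

# Hypothetical helper: keep only universe genes in each term and drop
# terms that end up with no genes at all.
filterGoMap <- function(goMap, universe) {
  restricted <- lapply(goMap, function(ids) intersect(ids, universe))
  restricted[lengths(restricted) > 0]
}

universe <- c("10", "20", "50", "60")
filtered <- filterGoMap(goMap, universe)
str(filtered)
```

Because the result depends on the universe passed in, this restriction has to be redone whenever the gene list changes, which is the cost Seth refers to.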
> When I submit the EntrezGene IDs of the selected genes and those of the
> gene universe, I would not need the information from the annotation
> packages that map Affy IDs to EntrezGene IDs and Affy IDs to GO terms.
> The mapping between GO terms and EntrezGene IDs can be performed using
> the GO package, for example:
>
> library(GO)  # the GO package provides the GOALLENTREZID map
> GOLL <- as.list(get("GOALLENTREZID", mode = "environment"))
> GOLL <- GOLL[!is.na(GOLL)]  # drop GO IDs not mapped to any EntrezGene ID
> PresentGO <- sapply(GOLL, function(z) {
>     if (length(z) == 0 || all(is.na(z)))
>         return(FALSE)
>     any(x %in% z)  # x are EntrezGene IDs, either from the
>                    # gene universe or the selected ones
> })
>
> GOLL <- GOLL[PresentGO]
>
> GOLL is then a list of all GO terms for the EntrezGene IDs specified
> with x (containing all ontologies: MF, CC and BP).
Aside:
The GOALLENTREZID map should probably be replaced with organism- and
ontology-specific maps. The current map is huge, and if we were
to use it as you are suggesting, I suspect it would be even slower
than the current map generation to go through it, select the
desired ontology, eliminate GO IDs with no annotations in the
selected gene list, etc.
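A sketch of the ontology-specific idea, assuming a hypothetical lookup `termOntology` from GO ID to ontology code. The map is split once, so an analysis of one ontology only walks that ontology's terms. All IDs are toy values, and this is not how the GO package actually stores its maps.

```r
# Toy map and a hypothetical GO ID -> ontology lookup ("BP", "MF", "CC").
goMap <- list(
  "GO:0000001" = c("10", "20"),
  "GO:0000002" = c("40"),
  "GO:0000003" = c("20", "50")
)
termOntology <- c("GO:0000001" = "BP", "GO:0000002" = "MF",
                  "GO:0000003" = "BP")

# Split the map once into per-ontology sub-maps; an analysis of BP terms
# then only iterates over byOntology[["BP"]].
byOntology <- split(goMap, termOntology[names(goMap)])
names(byOntology)
names(byOntology[["BP"]])
```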
--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Hello Johannes,
You can use Bioconductor's "AnnBuilder" package to produce a custom
annotation package for your own selection of EntrezGeneIDs and then use
GOstats with this custom annotation package. This may be more efficient
than mapping EntrezGeneIDs to GO nodes each time you run GOstats. Having
said that, I have to admit that it took some time to get AnnBuilder
running due to its dependencies.
Best regards,
Joern
Johannes Rainer wrote:
> Dear Seth, dear Bioconductor members,
>
> As far as I understand, you are using the annotation package defined
> with the "annotation" parameter (GOHyperGParams) to map the submitted
> EntrezGeneIDs to the GO terms. This works fine for Affymetrix arrays with
> available annotation packages, but we are, for example, also using Exon
> arrays and are annotating the probes on our own. My suggestion is to also
> support the mapping from EntrezGene IDs to GO terms using the GO package.
> This would allow GO analyses for all microarray platforms, not just
> Affymetrix arrays with available annotation packages.
>
>
> sincerely, jo
>
>
>
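For arrays annotated in-house, the universe and the selected genes can be reduced to distinct EntrezGene IDs before any GO mapping takes place. A sketch, assuming a hypothetical data frame `probeAnno` with one row per probe set; several probes that map to the same gene collapse to a single ID, so genes with many probes are not counted more than once.

```r
# Hypothetical in-house probe annotation (IDs invented for illustration);
# NA marks a probe set without an EntrezGene ID.
probeAnno <- data.frame(
  probe  = c("p1", "p2", "p3", "p4", "p5"),
  entrez = c("10", "10", "20", NA, "50"),
  stringsAsFactors = FALSE
)
selectedProbes <- c("p1", "p2", "p5")

# Gene universe: distinct EntrezGene IDs over all annotated probes.
entrez <- probeAnno$entrez
universe <- unique(entrez[!is.na(entrez)])

# Selected genes: distinct EntrezGene IDs of the selected probes, so that
# p1 and p2 (same gene) count only once.
sel <- entrez[probeAnno$probe %in% selectedProbes]
selected <- unique(sel[!is.na(sel)])
```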
Hi Johannes,
I forgot to mention the possibility of using the humanLLMapping
package as an annotation source. Does this do what you want?
It is simply EntrezGene ID based, but organism specific.
+ seth
Hi Jo,
I looked at all the responses to your GOstats question and I'm
wondering why no one is mentioning using the topGO package. It seems
to do what you want, that is, you define your universe however you
want. You don't have to use affy annotation.
Cheers,
Dick
*******************************************************************************
Richard P. Beyer, Ph.D.   University of Washington
Tel.:(206) 616 7378       Env. & Occ. Health Sci., Box 354695
Fax: (206) 685 4696       4225 Roosevelt Way NE, # 100
                          Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************
------------------------------
Message: 6
Date: Mon, 4 Jun 2007 16:56:42 +0200
From: "Johannes Rainer" <johannes.rainer@tcri.at>
Subject: Re: [BioC] GOstats suggestion
To: "Joern Toedling" <toedling at ebi.ac.uk>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
	<ae8a85340706040756w2c55651i23280947ac908902 at mail.gmail.com>
Content-Type: text/plain
Thanks for your suggestion. This would be a solution,
but as far as I understand, the functions from the GOstats and Category
packages map the submitted IDs to GO terms using the annotation packages
(e.g. the hgu133plus2 annotation package) each time the hyperGTest
function is called. Actually, the mapping is performed in the
getGoToEntrezMap function (Category package), and this function maps
EntrezGene IDs to GO terms by first mapping Affy IDs to GO terms and then
Affy IDs to EntrezGene IDs.
When I submit the EntrezGene IDs of the selected genes and those of the
gene universe, I would not need the information from the annotation
packages that map Affy IDs to EntrezGene IDs and Affy IDs to GO terms.
The mapping between GO terms and EntrezGene IDs can be performed using
the GO package, for example:
library(GO)  # the GO package provides the GOALLENTREZID map
GOLL <- as.list(get("GOALLENTREZID", mode = "environment"))
GOLL <- GOLL[!is.na(GOLL)]  # drop GO IDs not mapped to any EntrezGene ID
PresentGO <- sapply(GOLL, function(z) {
    if (length(z) == 0 || all(is.na(z)))
        return(FALSE)
    any(x %in% z)  # x are EntrezGene IDs, either from the
                   # gene universe or the selected ones
})
GOLL <- GOLL[PresentGO]
GOLL is then a list of all GO terms for the EntrezGene IDs specified
with x (containing all ontologies: MF, CC and BP).
I think using the GO/EntrezGene mapping from the GO package would not
restrict the GO analysis to platforms/microarrays where annotation
packages exist...
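Once the map is keyed by EntrezGene ID, the test itself needs no probe-level annotation. The sketch below computes a one-sided hypergeometric p-value per term with base R's `phyper`; it illustrates the kind of test hyperGTest performs but is not GOstats' implementation. `hyperTestTerms` and the toy data are hypothetical.

```r
# Toy inputs (hypothetical): a GO term -> EntrezGene map already restricted
# to the gene universe, plus the universe and the selected genes.
goMap <- list(
  "GO:0000001" = c("10", "20", "30", "40"),
  "GO:0000003" = c("20", "50")
)
universe <- c("10", "20", "30", "40", "50", "60", "70", "80")
selected <- c("10", "20", "30")

# One-sided test for over-representation of each term:
# P(X >= k) with X ~ Hypergeometric(m annotated, N - m unannotated, n drawn).
hyperTestTerms <- function(goMap, universe, selected) {
  N <- length(universe)
  n <- length(selected)
  t(vapply(goMap, function(ids) {
    m <- length(ids)                # universe genes annotated to the term
    k <- sum(selected %in% ids)     # selected genes annotated to the term
    p <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)
    c(k = k, m = m, p = p)
  }, numeric(3)))
}

res <- hyperTestTerms(goMap, universe, selected)
res
```

For the toy data, all three selected genes fall in the first term's four annotated genes, giving p = 4/56.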
sincerely, jo
--
Johannes Rainer, Msc
Tyrolean Cancer Research Institute
Innrain 66, 6020 Innsbruck, Austria
Tel.: +43 512 570485 33
Email: johannes.rainer at tcri.at
johannes.rainer at tugraz.at