Dick Beyer
Hi Adrian,
Thanks very much for your reply. Your example for building the topGO
object was very helpful.
Another question: Do you have a favorite way to summarize the topGO
output? What I am trying to do is something like CateGOrizer
(http://www.animalgenome.org/bioinfo/tools/catego/), which uses
higher-level GO terms to give a summary overview of the enriched GO
terms.
Thanks very much,
Dick
*******************************************************************************
Richard P. Beyer, Ph.D. University of Washington
Tel.:(206) 616 7378 Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696 4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************
On Wed, 3 Mar 2010, Adrian Alexa wrote:
> Hi Dick,
>
> as Sean already mentioned, org.Mm.egGO2EG contains only the most
> specific GO annotations. topGO doesn't care whether you supply the most
> specific gene-to-GO mappings or the complete mappings, because it
> propagates the annotations up the GO graph itself. You will obtain
> the same result whether you use org.Mm.egGO2EG or
> org.Mm.egGO2ALLEGS. However, due to the redundancies in the
> org.Mm.egGO2ALLEGS mappings, I advise using the most specific
> mappings.
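A toy illustration of this point, as a minimal base-R sketch: the four-term ontology, the gene IDs and the helpers ancestors() and propagate() are all hypothetical, not part of topGO or org.Mm.eg.db. The sketch applies the "true path rule" (a gene annotated to a term is also annotated to all of that term's ancestors), which is what an ALLEGS-style map already contains. Propagating the most specific map, or propagating an already-propagated map, gives the same result, and a term with no genes of its own (T1 here, like GO:0030522 in this thread) only appears after propagation. The toy maps terms to genes, mirroring org.Mm.egGO2EG; topGO's gene2GO input runs the other way (genes to terms).

```r
# Hypothetical toy ontology: each term lists its parent terms (a DAG).
parents <- list(
  T1 = character(0),   # root
  T2 = "T1",
  T3 = "T1",
  T4 = c("T2", "T3")   # T4 has two parents
)

# Most specific annotations only (analogous to a GO2EG-style map):
# T1 and T2 have no genes of their own.
specific <- list(T3 = "g1", T4 = c("g2", "g3"))

# All ancestors of a term (excluding the term itself), breadth-first.
ancestors <- function(term, parents) {
  found <- character(0)
  queue <- parents[[term]]
  while (length(queue) > 0) {
    p <- queue[1]
    queue <- queue[-1]
    if (!(p %in% found)) {
      found <- c(found, p)
      queue <- c(queue, parents[[p]])
    }
  }
  found
}

# True path rule: copy each term's genes to the term and all its ancestors.
propagate <- function(ann, parents) {
  out <- setNames(vector("list", length(parents)), names(parents))
  for (term in names(ann)) {
    for (node in c(term, ancestors(term, parents))) {
      out[[node]] <- union(out[[node]], ann[[term]])
    }
  }
  out <- Filter(length, out)   # drop terms that received no genes
  lapply(out, sort)            # sort gene IDs so results compare cleanly
}

all_ann <- propagate(specific, parents)   # analogous to a GO2ALLEGS-style map
print(all_ann)

"T1" %in% names(specific)   # FALSE: the root has no direct annotations
"T1" %in% names(all_ann)    # TRUE: it inherits g1, g2, g3

# Propagating the already-complete map changes nothing.
identical(propagate(all_ann, parents), all_ann)   # TRUE
```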
>
> Also, since you are using a Bioconductor annotation package, you don't
> need to construct the gene2GO list yourself to provide the annotations.
> The function "annFUN.org" is more convenient to use when building the
> "topGOdata" object. In this case the instantiation of a topGOdata
> object looks like:
>
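> library(topGO)
>
> # geneList: named vector whose names are Entrez Gene IDs; either a 0/1
> # factor of interesting genes, or numeric scores (which also need geneSel)
> GOdata <- new("topGOdata",
>               ontology = "BP",           # Biological Process ontology
>               allGenes = geneList,
>               nodeSize = 5,              # prune terms with < 5 annotated genes
>               annot    = annFUN.org,     # annotations from an org.*.db package
>               mapping  = "org.Mm.eg.db",
>               ID       = "entrez")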
>
> The "mapping" argument tells topGO which annotation package to use, and
> the "ID" argument selects which type of gene identifier the names of
> geneList are.
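The call above assumes geneList already exists. A minimal base-R sketch of one way to build it, similar to the example in the topGO vignette: a 0/1 factor flagging the genes of interest, named by Entrez Gene ID so that it matches ID = "entrez". The allIDs and sigIDs vectors are hypothetical placeholders (a few IDs borrowed from Sean's output further down); substitute your own. With a factor like this no geneSel function is needed, whereas a numeric vector of scores would need one.

```r
# Hypothetical inputs: every Entrez Gene ID in the analysis, and the
# subset considered significant. Replace with your own vectors.
allIDs <- c("11835", "11848", "12034", "13082", "13123", "13983")
sigIDs <- c("11835", "13082")

# Named 0/1 factor: "1" = gene of interest, "0" = background gene.
# Fixing levels keeps both levels even if every gene were significant.
geneList <- factor(as.integer(allIDs %in% sigIDs), levels = c(0, 1))
names(geneList) <- allIDs

table(geneList)

# With topGO and org.Mm.eg.db installed, this geneList can be passed to:
# new("topGOdata", ontology = "BP", allGenes = geneList, nodeSize = 5,
#     annot = annFUN.org, mapping = "org.Mm.eg.db", ID = "entrez")
```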
>
>
> You can also use functions from topGO to access the genes annotated to
> a GO term of interest.
>
> # all the genes annotated to GO:0030522 -- NOT only the most specific ones!
> myGenes <- genesInTerm(GOdata, "GO:0030522")
>
> # the number of annotated genes
> no.myGenes <- countGenesInTerm(GOdata, "GO:0030522")
>
>
> Hope this helps. Let me know if you have additional questions.
>
>
> Regards,
> Adrian
>
> On Wed, Mar 3, 2010 at 7:32 AM, Dick Beyer <dbeyer at u.washington.edu> wrote:
>> Hi Sean,
>>
>> Thanks very much for looking into this. I guess I need to think about this.
>> What is confusing to me is that topGO takes a gene2GO list as input (a list
>> of GO terms for each gene), which I generated from org.Mm.egGO2EG (no
>> GO:0030522, for example). Getting GO IDs out of topGO that are in
>> org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG makes me think I should
>> build my gene2GO input list from org.Mm.egGO2ALLEGS rather than
>> org.Mm.egGO2EG.
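Building a gene2GO list from a GO-to-genes map amounts to inverting a list. A minimal base-R sketch on a made-up map (go2eg, its term names and gene IDs are hypothetical); it has the same shape as as.list(org.Mm.egGO2EG), with GO terms as names and gene IDs as values whose element names are evidence codes. As Adrian explains in his reply, inverting the most specific map is sufficient for topGO.

```r
# Hypothetical GO -> gene map, shaped like as.list(org.Mm.egGO2EG):
# names are terms, values are gene IDs named by evidence code.
go2eg <- list(
  termA = c(IDA = "g1", IMP = "g2"),
  termB = c(IMP = "g2", IGI = "g2"),   # same gene, two evidence codes
  termC = c(ISO = "g3", IDA = "g1")
)

# One row per (term, gene) pair; unname() drops the evidence-code names.
pairs <- data.frame(
  go   = rep(names(go2eg), lengths(go2eg)),
  gene = unname(unlist(go2eg)),
  stringsAsFactors = FALSE
)
pairs <- unique(pairs)   # collapse duplicate pairs from multiple evidence codes

# Gene -> GO list, the shape topGO's gene2GO argument expects.
gene2GO <- split(pairs$go, pairs$gene)
print(gene2GO)
```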
>>
>> I also didn't dig far enough when I checked GO:0030522 at geneontology.org,
>> which showed 34 gene products for Mus musculus. However, had I looked
>> further, I would have seen that GO:0030522 has no genes of its own.
>>
>> Until recently, I used ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz to
>> get Entrez Gene ID/GO ID mappings, but I switched to the Bioconductor
>> org.Mm.eg.db approach as it is much simpler.
>>
>> Thanks for the good education!
>>
>> Cheers,
>> Dick
>>
>> On Tue, 2 Mar 2010, Sean Davis wrote:
>>
>>> On Tue, Mar 2, 2010 at 7:15 PM, Dick Beyer <dbeyer at u.washington.edu> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been running topGO (using mouse Entrez Gene IDs) and found that some
>>>> GO terms that turn up in the topGO analysis are not among the GO terms
>>>> from org.Mm.eg.db.
>>>>
>>>> I'd like to give some example code to reproduce the problem, but my
>>>> topGO code is a lot of lines. The output looks like:
>>>>
>>>> allResults[[1]][[1]][1:2,]
>>>>          GO.ID                                Term Annotated Significant Expected classic    elim weight
>>>> 714 GO:0019222     regulation of metabolic process      2498         143   107.08 0.00010 0.17956 0.9057
>>>> 762 GO:0006807 nitrogen compound metabolic process      3413         186   146.31 0.00011 0.45337 0.9434
>>>>
>>>> So, the topGO output gives a column of GOIDs and such.
>>>>
>>>> Some of the problem GOIDs from topGO are GO:0030522, GO:0051094,
>>>> GO:0031497, GO:0046700.
>>>>
>>>> I can't find these in names(Mm.egGO2EG).
>>>>
>>>> library("org.Mm.eg.db")
>>>> Mm.egGO2EG <- as.list(org.Mm.egGO2EG)
>>>> grep("GO:0030522",names(Mm.egGO2EG))
>>>> integer(0)
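grep() treats its pattern as a regular expression and returns positions, which works for one ID but gets clumsy for several. Exact membership tests with %in% and setdiff() handle a whole vector at once. A base-R sketch with hypothetical stand-in vectors; with org.Mm.eg.db loaded, map_keys would be names(Mm.egGO2EG) from the code above and topgo_ids the GO.ID column of the topGO results.

```r
# Hypothetical stand-ins for the GO IDs reported by topGO and the keys
# of a most-specific GO -> gene map.
topgo_ids <- c("termA", "termB", "termC", "termD")
map_keys  <- c("termA", "termC")

# Exact, vectorized membership: one logical per topGO ID.
found <- topgo_ids %in% map_keys

# IDs reported by topGO that have no direct entry in the map.
missing_ids <- setdiff(topgo_ids, map_keys)
print(missing_ids)
```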
>>>>
>>>> Is it possible that topGO depends on GO.db, and I'm using org.Mm.eg.db?
>>>> When I check GO:0030522 for Mus musculus at geneontology.org,
>>>> GO:0030522 is valid.
>>>>
>>>> I'm puzzled by the mismatch. I want to get the genes for a given GO ID,
>>>> so there is probably a workaround. If anyone has a suggestion or idea,
>>>> I'd be very grateful to know what to try.
>>>>
>>>
>>> Hi, Dick.
>>>
>>> The Gene Ontology, as I'm sure everyone knows, is hierarchical. The
>>> org.Mm.egGO2EG table stores ONLY the most specific terms for each gene.
>>> However, org.Mm.egGO2ALLEGS stores, for each term, the genes annotated
>>> to the term itself AND to its children. Most gene ontology analysis
>>> algorithms use the latter definition; it looks like topGO does also.
>>> In short, try this:
>>>
>>> get('GO:0030522',org.Mm.egGO2ALLEGS)
>>>      IDA      IMP      IDA      IGI      IMP      IGI      IMP      IMP
>>>  "11835"  "11835"  "11848"  "12034"  "12034"  "13082"  "13123"  "13983"
>>>      IMP      ISO      IMP      IDA      IMP      IMP      IMP      ISO
>>>  "14228"  "14599"  "14602"  "14815"  "14815"  "15502"  "16000"  "16000"
>>>      IDA      IDA      IMP      IDA      IGI      IMP      IMP      IDA
>>>  "16601"  "18667"  "18854"  "19213"  "19378"  "19378"  "19411"  "20181"
>>>      IDA      IDA      IMP      IMP      IMP      IPI      IDA      IGI
>>>  "20182"  "20183"  "20779"  "21815"  "21848"  "22215"  "24074"  "27401"
>>>      IMP      ISA      IDA      IDA      IMP      IDA
>>>  "56351"  "56847"  "59035"  "67488" "224903" "232174"
>>>
>>> Hope that helps clear things up.
>>>
>>> Sean
>>>
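The vector returned by get() is named by evidence code, and a gene appears once per evidence code, so the same Entrez ID can repeat (11835, for example, is listed under both IDA and IMP). A base-R sketch that collapses it to distinct genes; the IDs are typed in by hand from Sean's output above, and with the package loaded you would apply unique(unname(...)) directly to the result of get("GO:0030522", org.Mm.egGO2ALLEGS).

```r
# Sean's output for GO:0030522, re-entered by hand: names are evidence
# codes, values are Entrez Gene IDs (duplicate names are allowed).
ids <- c(
  IDA = "11835", IMP = "11835", IDA = "11848", IGI = "12034",
  IMP = "12034", IGI = "13082", IMP = "13123", IMP = "13983",
  IMP = "14228", ISO = "14599", IMP = "14602", IDA = "14815",
  IMP = "14815", IMP = "15502", IMP = "16000", ISO = "16000",
  IDA = "16601", IDA = "18667", IMP = "18854", IDA = "19213",
  IGI = "19378", IMP = "19378", IMP = "19411", IDA = "20181",
  IDA = "20182", IDA = "20183", IMP = "20779", IMP = "21815",
  IMP = "21848", IPI = "22215", IDA = "24074", IGI = "27401",
  IMP = "56351", ISA = "56847", IDA = "59035", IDA = "67488",
  IMP = "224903", IDA = "232174"
)

# Drop the evidence-code names, then keep each gene once.
genes <- unique(unname(ids))

length(ids)     # 38 (gene, evidence code) entries
length(genes)   # 33 distinct genes
```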
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>