Many missing GO terms from GOHyperG call using mgu74av2
Dick Beyer ★ 1.4k
@dick-beyer-26
In the course of using GOHyperG, I checked my results against results from NETAFFX and found large discrepancies in the number of GO terms returned for some probe IDs. Here is an example code snippet that illustrates my point:

```r
library(affy)
library(mgu74av2)
library(GOstats)
library(annaffy)

smallList <- c("160102_at")
myLL <- unlist(mget(smallList, mgu74av2LOCUSID))
sum(duplicated(myLL))
# Flag duplicates on myLL itself so the logical index has the same length as myLL
length(myLLunique <- !duplicated(myLL))
bphyper <- GOHyperG(myLL[myLLunique], lib="mgu74av2", what="BP")

sort(names(bphyper$go2Affy))
## [1] "GO:0006457" "GO:0007582" "GO:0008150" "GO:0008152" "GO:0009987" "GO:0019538"
## [7] "GO:0043170" "GO:0050875"

# From NETAFFX for 160102_at:
# "GO:0006457" "GO:0007582" "GO:0008150" "GO:0008152" "GO:0009987" "GO:0019538"
# "GO:0043170" "GO:0050875"
# "GO:0044238" "GO:0044237" "GO:0044260" "GO:0044260" "GO:0044267"
```

Notice that NETAFFX returns five additional GO IDs that are missing from the GOHyperG results (four distinct terms, since GO:0044260 is listed twice).

Here is my sessionInfo:

```
> sessionInfo()
R version 2.1.0, 2005-04-18, i386-pc-mingw32

attached base packages:
[1] "splines"   "tools"     "methods"   "stats"     "graphics"
[6] "grDevices" "utils"     "datasets"  "base"

other attached packages:
   annaffy       KEGG   hgu95av2    GOstats   multtest genefilter
  "1.0.18"    "1.6.8"    "1.8.4"    "1.1.2"    "1.6.0"    "1.6.1"
  survival     xtable       RBGL   annotate         GO        XML
    "2.17"    "1.2-5"    "1.3.7"   "1.5.15"    "1.6.8"   "0.95-6"
     graph      Ruuid    cluster   mgu74av2       affy reposTools
   "1.5.0"    "1.5.0"    "1.9.8"    "1.8.4"    "1.6.7"   "1.5.19"
   Biobase
  "1.5.12"
```

Is this a problem with the mgu74av2 metadata, or with GOHyperG?

Thanks much,
Dick

Richard P. Beyer, Ph.D.
University of Washington
Env. & Occ. Health Sci., Box 354695
4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
Tel.: (206) 616 7378
Fax: (206) 685 4696
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
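Because the NETAFFX list repeats GO:0044260, comparing the two results as sets gives a cleaner count than comparing list lengths. A minimal base-R sketch that uses only the IDs quoted in the post (no annotation packages are needed; the variable names are illustrative):

```r
# GO BP IDs that GOHyperG returned for 160102_at (copied from the output above)
gohyperg <- c("GO:0006457", "GO:0007582", "GO:0008150", "GO:0008152",
              "GO:0009987", "GO:0019538", "GO:0043170", "GO:0050875")

# GO IDs that NETAFFX lists for the same probe set (copied from the post;
# GO:0044260 appears twice)
netaffx <- c("GO:0006457", "GO:0007582", "GO:0008150", "GO:0008152",
             "GO:0009987", "GO:0019538", "GO:0043170", "GO:0050875",
             "GO:0044238", "GO:0044237", "GO:0044260", "GO:0044260",
             "GO:0044267")

sum(duplicated(netaffx))                    # repeated entries in the NETAFFX list
onlyNetaffx  <- setdiff(netaffx, gohyperg)  # distinct IDs missing from GOHyperG
onlyGOHyperG <- setdiff(gohyperg, netaffx)  # IDs missing from NETAFFX
print(onlyNetaffx)
length(onlyNetaffx)
```

`setdiff()` drops duplicates, so `onlyNetaffx` holds four IDs, while `onlyGOHyperG` is empty: every term GOHyperG returned also appears in the NETAFFX list.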
John Zhang ★ 2.9k
@john-zhang-6
> In the course of using GOHyperG, I checked my results against results from NETAFFX and found large discrepancies in the number of GO terms returned for some probeids.

Unless we build the annotation packages frequently (every week?), there will be discrepancies. You could try building your own annotation packages to minimize them.

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
Hi John,

Thanks for the good suggestion. Is there a way to tell the date of the GO information used in a particular build of an annotation package? I'm asking so I can figure out whether doing my own build at any particular time is warranted.

Thanks very much,
Dick

On Wed, 29 Jun 2005, John Zhang wrote:

> Unless we build the annotation packages frequently (every week?), there will be discrepancies. You may try to build your own annotation packages to minimize the discrepancies.
> Unless we build the annotation packages frequently (every week?), there will be discrepancies. You may try to build your own annotation packages to minimize the discrepancies.

Is there a reason why that wouldn't be a good idea? I personally like having the latest annotations. I can see a problem trying to reproduce a given analysis, but old annotations can be kept around for a while.

How often are the metadata packages being rebuilt right now? What decides when they're rebuilt?

Francois
John Zhang ★ 2.9k
@john-zhang-6
Yes: check the help pages, e.g. ?GO or ?hgu95av2GO ...

> Is there a way to tell the date of the GO information used in a particular build of an annotation package? I'm asking so I can figure out whether doing my own build at any particular time is warranted.
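Besides the help pages, the installed package's DESCRIPTION file can be queried from R. A minimal sketch, assuming a current R installation: `annotationInfo()` is a hypothetical helper (not part of any Bioconductor package) built on `utils::packageDescription()`. Note that the `Built` field records when the package itself was built, which is not necessarily when its GO data were downloaded; per this thread, the data-source details are in the package man pages.

```r
# Hypothetical helper (not a Bioconductor function): report the version and
# build stamp recorded in an installed package's DESCRIPTION file.
annotationInfo <- function(pkg) {
  # packageDescription() returns NA (with a warning) if pkg is not installed
  d <- suppressWarnings(utils::packageDescription(pkg))
  if (!inherits(d, "packageDescription")) {
    return(c(Package = pkg, Version = NA_character_, Built = NA_character_))
  }
  # [[ ]] avoids partial name matching; absent fields become NA
  field <- function(x) if (is.null(x)) NA_character_ else x
  c(Package = pkg, Version = field(d[["Version"]]), Built = field(d[["Built"]]))
}

# Example: the annotation package from this thread (all NA if not installed)
annotationInfo("mgu74av2")
```

In an interactive session, `help("hgu95av2GO")` or `?GO` opens the man pages referred to above, if those packages are installed.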
Seth Falcon ★ 7.4k
@seth-falcon-992
Hi Francois,

On 29 Jun 2005, fpepin at cs.mcgill.ca wrote:

> How often are the metadata packages being rebuilt right now? What decides when they're being rebuilt?

Presently, the metadata packages are rebuilt approximately every six months as part of the six-month release cycle of Bioconductor (and R).

> Is there a reason why that wouldn't be a good idea? I personally like having the latest annotations. I can see a problem trying to reproduce a given analysis, but old annotations can be kept around for a while.

In the past, the packages have been updated more frequently. Clearly, it is nice to have the most up-to-date annotations available. Unfortunately, the structure of the data sources (the inputs to building the annotation packages) changes rapidly. Running a new annotation package build is often a non-trivial task: we have to figure out how and why an XML DTD changed, why a field in a CSV file is missing, and so on. In the past, structural changes in the data have led to matching changes in the annotation packages, and these have caused incompatibilities with existing, released BioC packages.

It is for these reasons that we felt it best to stay with one build of the metadata for an entire release. We *do* plan to update the annotation packages and use the devel arm of the data repository to host updated builds between releases, but our ability to do so is limited by the person-hours we can dedicate to the task.

Best Wishes,

+ seth
John Zhang ★ 2.9k
@john-zhang-6
>> Unless we build the annotation packages frequently (every week?), there will be discrepancies. You may try to build your own annotation packages to minimize the discrepancies.
>
> Is there a reason why that wouldn't be a good idea? I personally like having the latest annotations. I can see a problem trying to reproduce a given analysis, but old annotations can be kept around for a while.

Certainly a good idea if manpower is not limited.

> How often are the metadata packages being rebuilt right now? What decides when they're being rebuilt?

About every 3 months. The man pages for the package and for the individual environments contain the information.