GO question
@gustavo-fernandez-bayon-5300
Last seen 8.9 years ago
Spain
Hi everybody.

A simple question: is there any way I can perform a GO enrichment analysis without using the annotation packages?

The problem is that I am trying to run a series of GO analyses in parallel (with foreach), and I run into problems because every call tries to access the same SQLite database. For now, I have solved it by putting "library(GOstats)" inside the inner foreach, but I was wondering if there is a better way.

Regards,
Gus
Annotation GO • 1.6k views
@cristobal-fresno-rodriguez-3838
Last seen 8.6 years ago
Argentina/Cordoba/Universidad Católica …
Hi Gus,

I have the same problem here, but using parallel. The problem lies in SQLite's thread-safety mode (single-thread, multi-thread or serialized). As far as I know, on Windows the default binary is built in serialized (thread-safe) mode and on Unix it is not, so you have to compile it from source. However, if you parallelize with fork, as the parallel and multicore packages do, it still breaks the database connection. I don't know whether it works with foreach. Maybe you should give it a try.

At present, the workaround I am using is to manually split hyperGTest into two functions: one that accesses the annotation packages and another that runs the actual hypergeometric test. The code first accesses all the annotation packages sequentially to build the GO graphs, and then runs the tests in parallel. This workaround is pretty much what you have been doing so far.

Regards,
Cristobal

(In reply to Gustavo Fernández Bayón, 2012/11/15.)
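A minimal sketch of the split described above, written in Python rather than R and using only the standard library. The table name `go_map`, the toy genes and the helper names are made up for illustration; this is not the GOstats code. The point is only the shape of the workflow: the database is read once, sequentially, into plain data, and the workers then compute on that data without ever holding a database handle. A thread pool stands in here for whatever parallel backend is actually used.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from math import comb


def load_term_genes(conn):
    """Sequential step: read the whole term -> genes table once."""
    term_genes = {}
    for term, gene in conn.execute("SELECT term, gene FROM go_map"):
        term_genes.setdefault(term, set()).add(gene)
    return term_genes


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def enrich(gene_set, term_genes, universe):
    """Parallel step: pure computation on plain data, no database access."""
    genes = set(gene_set) & universe
    return {
        term: hypergeom_upper_tail(
            len(genes & members), len(members), len(genes), len(universe)
        )
        for term, members in term_genes.items()
    }


# Toy stand-in for the annotation database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE go_map (term TEXT, gene TEXT)")
conn.executemany(
    "INSERT INTO go_map VALUES (?, ?)",
    [("GO:A", "g1"), ("GO:A", "g2"), ("GO:A", "g3"), ("GO:B", "g4"), ("GO:B", "g5")],
)
universe = {f"g{i}" for i in range(1, 11)}

term_genes = load_term_genes(conn)  # the database is touched here and nowhere else
conn.close()

gene_sets = [["g1", "g2", "g3"], ["g4", "g9"]]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda gs: enrich(gs, term_genes, universe), gene_sets))
```

Because the connection is closed before the pool starts, no worker can trip over a shared SQLite handle, whichever thread-safety mode the library was built with.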
Hi Cristobal,

Thank you very much for the answer. I'll keep a note of it in case my current workflow stops working. For now, it works just by loading the GOstats library inside the scope of the inner foreach. Maybe I have the multi-threaded build of SQLite, I don't know.

I was wondering why GOstats seems so slow compared with the DAVID web tool. Is it just a matter of hardware (I do not know what is running in DAVID's back end), or are there more efficient implementations? Is topGO a more efficient alternative? I currently have more than 100 groups of genes on which I want to run a GO analysis, which is why I am experimenting with parallel computing.

Thank you again for your answer.

Regards,
Gus

(In reply to Cristobal Fresno Rodríguez, Thursday 15 November 2012, 16:55.)
Hi Gustavo,

I can see that you and I are dealing with the same issues for GO analysis. Indeed, GOstats and DAVID apply very different algorithms. GOstats uses a conditional hypergeometric test (the default option), in which one-tailed p-values are obtained by walking the graph bottom-up (from the leaves to the root), whereas DAVID uses independent EASE scores (a penalized two-tailed Fisher's exact test). DAVID therefore ignores the graph structure and can parallelize all node evaluations at once. Moreover, DAVID's back end runs on dedicated hardware tuned for this kind of analysis, while GOstats has no such dedicated setup. However, you can use RDAVID or some of the other APIs available from the web site http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html, with limitations.

Regards,
Cristobal

(In reply to Gustavo Fernández Bayón, 2012/11/16.)
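To make the difference concrete, here is a small Python sketch (standard library only) that contrasts independent per-term tests with a bottom-up conditional pass. It is a simplified illustration, not the GOstats or DAVID implementation: the two-term graph, the gene names and the rule "drop the genes of significant children from the parent's gene set before testing it" are assumptions chosen to show why the conditional version has an ordering constraint. It also assumes every annotated gene is in the universe.

```python
from math import comb


def upper_tail(k, K, n, N):
    """One-sided hypergeometric p-value P(X >= k)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def independent_tests(term_genes, selected, universe):
    """Each term is tested on its own, so the calls could run in any order."""
    n, N = len(selected), len(universe)
    return {t: upper_tail(len(selected & g), len(g), n, N) for t, g in term_genes.items()}


def conditional_tests(children, term_genes, selected, universe, cutoff=0.05):
    """Test children first; a parent is tested without its significant children's genes."""
    n, N = len(selected), len(universe)
    pvals = {}

    def visit(term):
        if term in pvals:
            return
        removed = set()
        for child in children.get(term, []):
            visit(child)  # the parent has to wait for this result
            if pvals[child] < cutoff:
                removed |= term_genes[child]
        members = term_genes[term] - removed
        pvals[term] = upper_tail(len(selected & members), len(members), n, N)

    for term in term_genes:
        visit(term)
    return pvals


# Toy graph (hypothetical): GO:P is the parent of GO:C and inherits its genes.
children = {"GO:P": ["GO:C"]}
term_genes = {"GO:C": {"g1", "g2", "g3"}, "GO:P": {"g1", "g2", "g3", "g4"}}
universe = {f"g{i}" for i in range(1, 21)}
selected = {"g1", "g2", "g3"}

indep = independent_tests(term_genes, selected, universe)
cond = conditional_tests(children, term_genes, selected, universe)
```

In this toy case both terms look enriched when tested independently, while the conditional pass keeps only the child, since the parent has no hits left once the child's genes are set aside. The dependency of each parent on its children is what prevents testing all nodes at once.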
Hi Cristobal.

That makes sense to me now. Thank you for the explanation. For now, I would like to avoid using RDAVID and stick with BioC, but it's good to know about it.

Thanks again.

Regards,
Gus

(In reply to Cristobal Fresno Rodríguez, Friday 16 November 2012, 14:57.)
I'll chime in with my two cents... I am unfortunately not familiar with GOstats, but another option for calculating GO enrichment is the function GOenrichmentAnalysis in the WGCNA package (which I maintain and which lives on CRAN). The advantage of GOenrichmentAnalysis is that it can take multiple sets of labels (gene sets): it creates the GO gene lists once and then calculates the enrichment of all the given gene sets. Indeed, creating the GO lists is the most time-consuming part.

HTH,
Peter

(In reply to Gustavo Fernández Bayón, Mon, Nov 19, 2012, 2:09 AM.)
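The "build the GO lists once, then score every gene set" idea can be sketched as follows, in Python with the standard library only. This is not the WGCNA code or its interface: the function name `enrich_all_labels`, the label vector and the term table are hypothetical, and the test used is a plain one-sided hypergeometric test.

```python
from collections import defaultdict
from math import comb


def upper_tail(k, K, n, N):
    """One-sided hypergeometric p-value P(X >= k)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def enrich_all_labels(labels, term_genes):
    """labels maps gene -> set label; returns {label: {term: p-value}}."""
    universe = set(labels)
    N = len(universe)

    # Costly step, done once: restrict every term's gene list to the universe.
    terms = {t: genes & universe for t, genes in term_genes.items()}

    # Group the genes by label so that each label becomes one gene set.
    gene_sets = defaultdict(set)
    for gene, label in labels.items():
        gene_sets[label].add(gene)

    # Cheap step, repeated per label: count overlaps and take the tail.
    return {
        label: {
            t: upper_tail(len(genes & members), len(members), len(genes), N)
            for t, members in terms.items()
        }
        for label, genes in gene_sets.items()
    }


# Toy input (hypothetical): eight genes in three labelled sets.
labels = {
    "g1": "blue", "g2": "blue", "g3": "blue",
    "g4": "red", "g5": "red", "g6": "red",
    "g7": "grey", "g8": "grey",
}
term_genes = {"GO:X": {"g1", "g2", "g99"}, "GO:Y": {"g4", "g5", "g6"}}

res = enrich_all_labels(labels, term_genes)
```

With more than 100 gene sets, as in the original question, the term table is prepared a single time instead of once per set, which is where the saving comes from.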