multicore and GRangesList [Resurrected]
Malcolm Cook ★ 1.6k
@malcolm-cook-6293
Last seen 5 months ago
United States
The question of approaches to parallelizing operations on a GRangesList was raised in this thread: http://thread.gmane.org/gmane.science.biology.informatics.conductor/32799

I find the issue still relevant when using the new `parallel` package.

I have adopted the following practice, for which I seek your criticism or accolades. Your choice.

The approach is to use parallel::pvec over the indices of the GRangesList, with a little sugar in the form of...

pvec_along <-function(x,FUN,...) {
  ### PURPOSE: extension to parallel::pvec for non-vectors which is
  ### vectorized over the indices of x.
  ###
  ### Example: pvec_along(myGRangesList,width)
  ###
  ### Requires: `library(functional)` `library(parallel)`
  indices<-seq_along(x)
  FUN<-match.fun(FUN)
  pvec(indices,Compose(Curry(`[`,x),FUN),...)
}

Discuss?

Best,

~ Malcolm Cook
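To make the `Compose(Curry(...))` expression above concrete, here is a minimal sketch of what it reduces to. It does not use the `functional` package: `curry` and `compose` are hypothetical base-R stand-ins written only for this illustration, and a plain named list stands in for a GRangesList so the example runs without Bioconductor.

```r
library(parallel)

# Hypothetical base-R stand-ins for Curry and Compose (assumptions made for
# this sketch; they are not the 'functional' package's implementations).
curry <- function(f, ...) {
  fixed <- list(...)                        # arguments fixed in advance
  function(...) do.call(f, c(fixed, list(...)))
}
compose <- function(f, g) function(...) g(f(...))  # apply f first, then g

# A plain named list stands in for a GRangesList.
x <- list(a = 1:3, b = 4:9, c = 10L)

# "Subset x by index, then apply FUN" written in both styles.
composed <- compose(curry(`[`, x), lengths)
explicit <- function(i) lengths(x[i])

# Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# pvec splits the index vector into chunks and concatenates the results.
res_composed <- pvec(seq_along(x), composed, mc.cores = cores)
res_explicit <- pvec(seq_along(x), explicit, mc.cores = cores)
```

Both calls should return the same named vector of element lengths; the composed form only changes how the per-chunk function is spelled.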
@martin-morgan-1513
Last seen 13 hours ago
United States
On 09/19/2012 09:30 AM, Cook, Malcolm wrote:
> The approach is to use parallel::pvec over the indices of the GRangesList, with a little sugar in the form of...
>
> pvec_along <-function(x,FUN,...) {
>   ### PURPOSE: extension to parallel::pvec for non-vectors which is
>   ### vectorized over the indices of x.
>   ###
>   ### Example: pvec_along(myGRangesList,width)
>   ###
>   ### Requires: `library(functional)` `library(parallel)`
>   indices<-seq_along(x)
>   FUN<-match.fun(FUN)
>   pvec(indices,Compose(Curry(`[`,x),FUN),...)
> }
>
> Discuss?

pvec seems conceptually relevant; the benefits of the functional stuff are not immediately clear. Explain.

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
Hi Martin,

The benefits of the functional stuff are purely stylistic.

And NOT (I have just learned) performance!

Indeed, after running some timing tests, I have rewritten pvec_along without using Compose & Curry, as:

pvec_along <-function(x,FUN,...) {
  ### PURPOSE: extension to parallel::pvec for non-vectors which is
  ### vectorized over the indices of x.
  ###
  ### Example: pvec_along(myGRangesList,width)
  ###          this is functionally equivalent to:
  ###          pvec(seq_along(myGRangesList),function(i) width(myGRangesList[i]))
  ###
  ### Requires: `library(parallel)`
  indices<-seq_along(x)
  FUN<-match.fun(FUN)
  ## FYI: repeated system.times using 11 cores showed 13% worse
  ## performance using `library(functional)` approach written as:
  ## pvec(indices,Compose(Curry(`[`,x),FUN),...)
  pvec(indices,function(indices) FUN(x[indices]),...)
}

Better?

So, my stylistic preferences are admonished. I have been increasingly developing idiomatic use of Compose and Curry. Perhaps I must stop. Or learn, if possible, to avoid the overhead they impose.

Regardless....

In any case, pvec_along is just a simple convenience wrapper to something that could be directly written. But I find it a very useful abstraction.

Do you see better ways of expressing this idiom?

It is arguable that mclapply (and pvec) should 'just work' over GRangesList. After all, lapply does.

But, to remind us:

> parallel::mclapply(myGRangesList,width)
Error in as.list.default(X) :
  no method for coercing this S4 class to a vector

and, of course, pvec only works with vectors:

> pvec(myGRangesList,width)
Error in pvec(myGRangesList, width) : 'v' must be a vector

Do you think mclapply/pvec should work with Lists?

FWIW: one aspect of pvec that I think could be improved is how the results from each core are combined, which is hard-wired to `c` where it could be made an optional parameter (e.g. `GRangesList`).

In the meantime, FWIW, I have written a similar wrapper to mclapply named mclapply_alongRanges.

~Malcolm
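The suggestion that the combining step be a parameter rather than hard-wired to `c` can be sketched as follows. This is only an illustration under stated assumptions: `pvec_along_combine` and its `combine` argument are hypothetical names (neither is part of `parallel`, and this is not the `mclapply_alongRanges` wrapper mentioned above), the chunking is done by hand on top of `mclapply`, and a plain list again stands in for a GRangesList.

```r
library(parallel)

# Hypothetical helper: like pvec_along, but the function that merges the
# per-chunk results is an argument instead of being fixed to c().
pvec_along_combine <- function(x, FUN, ..., combine = c, mc.cores = 2L) {
  FUN <- match.fun(FUN)
  combine <- match.fun(combine)
  n <- length(x)
  if (n == 0L) return(FUN(x, ...))
  n_chunks <- min(as.integer(mc.cores), n)
  # Contiguous, non-empty index chunks (one per worker) so order is preserved.
  chunk_id <- ceiling(seq_len(n) * n_chunks / n)
  chunks <- unname(split(seq_len(n), chunk_id))
  pieces <- mclapply(chunks, function(i) FUN(x[i], ...), mc.cores = n_chunks)
  do.call(combine, pieces)
}

# A plain named list stands in for a GRangesList.
x <- list(a = 1:3, b = 4:9, c = 10L)

# Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Default combiner c(): behaves like pvec_along.
sizes <- pvec_along_combine(x, lengths, mc.cores = cores)

# A different combiner: each chunk yields a data frame, merged with rbind().
tab <- pvec_along_combine(
  x,
  function(chunk) data.frame(name = names(chunk),
                             n = unname(lengths(chunk)),
                             stringsAsFactors = FALSE),
  combine = rbind,
  mc.cores = cores
)
```

With a GRangesList, the same `combine` slot is where a constructor or concatenation method for the result class would go, provided it accepts the per-chunk results as separate arguments.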
Why not make pvec a generic?

--
A model is a lie that helps you see the truth.
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
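One possible reading of the "generic" suggestion, sketched with S4. Everything named here is an assumption for illustration: `pvec2` is a hypothetical generic (the real `parallel::pvec` is left untouched), and `Bag` is a toy S4 container standing in for a List-like class such as GRangesList.

```r
library(parallel)
library(methods)

# Hypothetical generic that dispatches on the class of 'v'.
setGeneric("pvec2", function(v, FUN, ...) standardGeneric("pvec2"),
           signature = "v")

# Default method: ordinary vectors go straight to parallel::pvec.
setMethod("pvec2", "ANY", function(v, FUN, ...) pvec(v, FUN, ...))

# Toy S4 container that is not a vector, with length() and `[` methods.
setClass("Bag", slots = c(items = "list"))
setMethod("length", "Bag", function(x) length(x@items))
setMethod("[", "Bag", function(x, i, j, ..., drop = TRUE) {
  new("Bag", items = x@items[i])
})

# Method for the container: parallelize over indices, as pvec_along does.
setMethod("pvec2", "Bag", function(v, FUN, ...) {
  FUN <- match.fun(FUN)
  pvec(seq_len(length(v)), function(i) FUN(v[i]), ...)
})

# Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

bag <- new("Bag", items = list(a = 1:3, b = 4:9, c = 10L))
res_bag <- pvec2(bag, function(b) lengths(b@items), mc.cores = cores)
res_vec <- pvec2(1:10, sqrt, mc.cores = cores)
```

The design choice is that callers use one function name for both vectors and containers, while each class decides how it is split into chunks.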
Tim,

My understanding of the term "generic" is from CLOS (Common Lisp Object System)... and I've not yet forayed much into OO R other than as a user.... but if I understand how the term "generic" is being used here...

I don't think that pvec itself need be a generic. Rather, it just needs to be written in terms of generics. I may be missing the point.

If its arg, v, implements `[` and `length` and `c`, then it _should_ work. But as written it does not.

Here is a (better?) version that does:

Comments? Improvements? Is it better?

Thx,

~Malcolm

From: Tim Triche, Jr.
Sent: Thursday, September 20, 2012 12:16 PM
Subject: Re: [BioC] multicore and GRangesList [Resurrected]

Why not make pvec a generic?
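The version referred to in the post above did not survive in this copy of the thread, and the following is not a reconstruction of it. It is only a minimal sketch of the stated idea: a pvec-like function that relies on nothing but `length()` and `[` for its input and `c()` for its results. The name `pvec_duck` is hypothetical, and a data frame is used as a convenient non-vector input (its `length()` is the number of columns and `[` selects columns).

```r
library(parallel)

# Hypothetical duck-typed variant of pvec: 'v' only needs length() and `[`,
# and the per-chunk results only need to be combinable with c().
pvec_duck <- function(v, FUN, ..., mc.cores = 2L) {
  FUN <- match.fun(FUN)
  n <- length(v)
  cores <- min(as.integer(mc.cores), n)
  if (cores <= 1L) return(FUN(v, ...))
  # Assign each element to one of 'cores' contiguous chunks.
  chunk_id <- ceiling(seq_len(n) * cores / n)
  pieces <- mclapply(seq_len(cores),
                     function(k) FUN(v[chunk_id == k], ...),
                     mc.cores = cores)
  do.call(c, pieces)
}

# A data frame is not a vector in the is.vector() sense, so pvec rejects it.
df <- data.frame(a = 1:4, b = c(2, 4, 6, 8), c = c(0, 0, 0, 1))
col_means <- function(d) vapply(d, mean, numeric(1))
err <- tryCatch(pvec(df, col_means, mc.cores = 1L), error = function(e) e)

# Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- pvec_duck(df, col_means, mc.cores = cores)
```

Here `err` captures the "'v' must be a vector" failure from `pvec`, while `pvec_duck` processes the same object chunk by chunk.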