'parallel' vs 'multicore'
Tim Triche ★ 4.2k
@tim-triche-3561
I have a lot of methods in methylumi (the revised version) that will happily parallelize themselves for (e.g.) loading hundreds of IDAT files, background correcting and normalizing anything in sight, etc. Sometimes it's easier to parallelize things until I can find time to make them properly efficient (boooo!).

When I compiled HEAD for R-2.14 the other day, after installing it, I typed

    library(parallel)

and all the handy bits of snow and multicore were in there! If I switch to the 'parallel' package, by default, will I now be OK and not screw Windows users? Everything works great on Linux/Unix, and has done so for months, with 'multicore'. It seems like there aren't any substantial differences other than things "just work" for a base installation. Do other package authors anticipate moving over now that this is slated to be in the stable release?

--
If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is.
John von Neumann <http://www-groups.dcs.st-and.ac.uk/~history/biographies/von_neumann.html>
@martin-morgan-1513
On 10/06/2011 10:21 AM, Tim Triche, Jr. wrote:
> If I switch to the 'parallel' package, by default, will I now be OK and not screw Windows users? [...] do other package authors anticipate moving over now that this is slated to be in the stable release?

Yes, you and other developers should switch to parallel; it seems to be the wave of the future.

Likely your DESCRIPTION file should have

    Imports: parallel

and your NAMESPACE

    import(parallel)

Importing all of parallel seems to be the best solution, because the available symbols depend on platform, e.g., mclapply on Linux / Mac but not Windows.

It's still the case that mclapply, for instance, is not supported on Windows, so your code needs to have some conditional evaluation, e.g. a check such as exists("mclapply", "package:parallel").

If memory weren't an issue, then the 'sockets' interface from snow is the most portable.

Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
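For what it's worth, here is a minimal sketch of the kind of conditional evaluation described above. The helper name `portable_lapply` and its `cores` argument are made up for illustration and are not part of 'parallel'. It also adds an explicit OS check, on the assumption that newer R releases define `mclapply` on Windows but only allow it to run with `mc.cores = 1`, so the `exists()` test alone may not be enough there.

```r
library(parallel)

# Hypothetical helper (not part of 'parallel'): apply FUN to each element of X,
# forking where that is available and falling back to socket workers elsewhere.
portable_lapply <- function(X, FUN, ..., cores = 2L) {
  can_fork <- .Platform$OS.type != "windows" &&
    exists("mclapply", where = "package:parallel")
  if (can_fork) {
    # Fork-based workers (the 'multicore' side of 'parallel')
    mclapply(X, FUN, ..., mc.cores = cores)
  } else {
    # Socket-based workers (the 'snow' side of 'parallel')
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, X, FUN, ...)
  }
}

res <- portable_lapply(1:4, function(i) i^2)
```

Inside a package that imports 'parallel' rather than attaching it, the `exists()` check would probably need to look in the namespace instead of the search path; that part is left out of the sketch.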
Out of curiosity, why would memory be less of an issue with SNOW than with mclapply? My intuition is that, as soon as the data in a child process' image diverges from the parent process', the memory usage will get pretty savage either way. At least, that matches what I remember about how fork() works and what I see when I run diverging children. They misbehave on occasion, as children are wont to do.

Anyways, would it be out of the question for 'parallel' to export a dummy function like

    mclapply <- lapply

on Windows? Maybe I'll go post that on r-dev so that Prof. Ripley can bite my head off :-)

For all the shortcomings of foreach() / doMC() and friends, their default (run serially) was/is sensible.
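A rough sketch of what such a serial fallback could look like on the package side, rather than inside 'parallel' itself. The name `my_lapply` is hypothetical, and the shim only swallows `mc.cores`; any other `mc.*` argument would be passed on to `FUN`, so a real version would need to handle those too.

```r
library(parallel)

# Hypothetical shim: choose the implementation once, when the code is loaded.
my_lapply <- if (.Platform$OS.type == "windows") {
  # No fork() on Windows: accept mc.cores for compatibility, then run serially.
  function(X, FUN, ..., mc.cores = 1L) lapply(X, FUN, ...)
} else {
  mclapply
}

# Callers use the same signature on every platform.
out <- my_lapply(c(a = 1, b = 2), function(x) x * 10, mc.cores = 2L)
```

The appeal of this design is the same as the foreach default mentioned above: code that asks for parallelism still runs, just serially, where forking is not available.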
Hi Tim

On 10/06/2011 12:26 PM, Tim Triche, Jr. wrote:
> Out of curiosity, why would memory be less of an issue with SNOW than with mclapply?

I meant to leave the opposite impression: mclapply will generally be better with memory than snow.

> My intuition is that, as soon as the data in a child process' image diverges from the parent process', the memory usage will get pretty savage either way. At least, that matches what I remember about how fork() works and what I see when I run diverging children. They misbehave on occasion, as children are wont to do.

In principle, and as I understand it, fork should be copy-on-change (usually called copy-on-write). Objects shared between processes are not duplicated in memory until modified, so any data that is effectively read-only is handled better by multicore. Also, snow will serialize / unserialize objects to send them to children, and this can be quite slow for large objects. Both snow and multicore rely on serialization for return values, which really encourages keeping the return value significantly reduced: a vector of counts of reads overlapping regions of interest, rather than the reads themselves.

> Anyways -- would it be out of the question for 'parallel' to export a dummy function like
>
>     mclapply <- lapply
>
> on Windows? Maybe I'll go post that on r-dev so that Prof. Ripley can bite my head off :-)

Yes, that's your best bet!

Martin
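To make the point about small return values concrete, here is a toy sketch. The data are simulated integers standing in for read positions, and the region bounds are invented for the example; nothing here comes from methylumi. It uses socket workers so it should run on any platform, and each worker sends back one integer rather than its whole chunk.

```r
library(parallel)

# Made-up stand-in for "reads": four chunks of simulated positions.
set.seed(1)
chunks <- replicate(4, sample.int(1000L, 1e5, replace = TRUE), simplify = FALSE)

# Hypothetical region of interest.
region <- c(start = 100L, end = 200L)

cl <- makeCluster(2L)
# Each worker returns a single count, so very little is serialized on the way back.
counts <- parSapply(cl, chunks, function(pos, region) {
  sum(pos >= region[["start"]] & pos <= region[["end"]])
}, region = region)
stopCluster(cl)

# Compare the size of the summary with the size of the data it summarizes.
object.size(counts)
object.size(chunks)
```

With socket workers the chunks still have to be serialized on the way out, which is the cost described above; with forked workers only the return trip would involve serialization.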