I have a lot of methods in methylumi (the revised version) that will happily
parallelize themselves for (e.g.) loading hundreds of IDAT files, background
correcting and normalizing anything in sight, etc. Sometimes it's easier to
parallelize things until I can find time to make them properly efficient
(boooo!).

When I compiled HEAD for R-2.14 the other day, after installing it, I typed

library(parallel)

and all the handy bits of snow and multicore were in there! If I switch to
the 'parallel' package by default, will I now be OK and not screw Windows
users? Everything works great on Linux/Unix, and has done so for months,
with 'multicore'. It seems like there aren't any substantial differences
other than things "just work" for a base installation -- do other package
authors anticipate moving over now that this is slated to be in the stable
release?
--
If people do not believe that mathematics is simple, it is only because
they do not realize how complicated life is.

John von Neumann
<http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Von_Neumann.html>
On 10/06/2011 10:21 AM, Tim Triche, Jr. wrote:
> I have a lot of methods in methylumi (the revised version) that will
> happily parallelize themselves for (e.g.) loading hundreds of IDAT
> files, background correcting and normalizing anything in sight, etc.
> Sometimes it's easier to parallelize things until I can find time to
> make them properly efficient (boooo!).
>
> When I compiled HEAD for R-2.14 the other day, after installing it, I
> typed
>
> library(parallel)
>
> and all the handy bits of snow and multicore were in there! If I
> switch to the 'parallel' package by default, will I now be OK and not
> screw Windows users? Everything works great on Linux/Unix, and has
> done so for months, with 'multicore'. It seems like there aren't any
> substantial differences other than things "just work" for a base
> installation -- do other package authors anticipate moving over now
> that this is slated to be in the stable release?
Yes, you and other developers should switch to parallel; it seems to be
the wave of the future.
Likely your DESCRIPTION file should have

Imports: parallel

and your NAMESPACE

import(parallel)

Importing all of parallel seems to be the best solution, because the
available symbols depend on platform, e.g., mclapply on Linux / Mac but
not Windows.
It's still the case that mclapply, for instance, is not supported on
Windows, so your code needs some conditional evaluation -- e.g.,
exists("mclapply", "package:parallel").

If memory weren't an issue, then the 'sockets' interface from SNOW would
be the most portable.
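
A minimal sketch of that kind of wrapper might look like the following
(the function name 'papply' and the 'cores' argument are just
placeholders for illustration, not anything parallel itself provides):

## Sketch only: fork where we can, fall back to a socket cluster elsewhere.
papply <- function(X, FUN, ..., cores = 2L) {
    if (.Platform$OS.type != "windows" &&
        exists("mclapply", where = asNamespace("parallel"))) {
        ## fork-based workers: cheap for large read-only inputs
        parallel::mclapply(X, FUN, ..., mc.cores = cores)
    } else {
        ## snow-style socket cluster: portable, including Windows
        cl <- parallel::makePSOCKcluster(cores)
        on.exit(parallel::stopCluster(cl), add = TRUE)
        parallel::parLapply(cl, X, FUN, ...)
    }
}

Callers then see a single interface regardless of platform.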
Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
Out of curiosity, why would memory be less of an issue with SNOW than
with mclapply?

My intuition is that, as soon as the data in a child process' image
diverges from the parent process', the memory usage will get pretty
savage either way. At least, that matches what I remember about how
fork() works and what I see when I run diverging children. They
misbehave on occasion, as children are wont to do.
Anyways -- would it be out of the question for 'parallel' to export a
dummy function like

mclapply <- lapply

on Windows? Maybe I'll go post that on R-devel so that Prof. Ripley can
bite my head off :-)
For all the shortcomings of foreach() / doMC() and friends, their
default (run serially) was/is sensible.
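
What I have in mind is something like the following in my own package,
not in parallel itself -- purely a sketch, with a made-up private name:

## If parallel doesn't provide mclapply on this platform, degrade to the
## same sensible serial default that foreach()/doMC() use: plain lapply,
## quietly ignoring any mc.* arguments a caller might pass.
.mclapply <- if (exists("mclapply", where = asNamespace("parallel")))
    parallel::mclapply
else
    function(X, FUN, ..., mc.cores = 1L) lapply(X, FUN, ...)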
--
If people do not believe that mathematics is simple, it is only because
they do not realize how complicated life is.

John von Neumann
<http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Von_Neumann.html>
Hi Tim
On 10/06/2011 12:26 PM, Tim Triche, Jr. wrote:
> Out of curiosity, why would memory be less of an issue with SNOW than
> with mclapply?
I meant to leave the other impression -- that mclapply will generally be
better with memory than snow.
> My intuition is that, as soon as the data in a child process' image
> diverges from the parent process', the memory usage will get pretty
> savage either way. At least, that matches what I remember about how
> fork() works and what I see when I run diverging children. They
> misbehave on occasion, as children are wont to do.
In principle, and as I understand it, fork should be copy-on-write:
objects shared between processes are not duplicated in memory until
modified, so any data that is effectively read-only is handled better by
multicore. Also, snow will serialize / unserialize objects to send them
to the children, and this can be quite slow for large objects; both snow
and multicore rely on serialization for return values, which really
encourages keeping the return value small -- a vector of counts of reads
overlapping regions of interest, say, rather than the reads themselves.
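
To make that concrete (the objects here are invented for the example):
with forked workers every child sees 'big' without copying it, and only
a single number comes back from each; with a snow-style cluster, 'big'
would first have to be shipped to each worker, e.g. with clusterExport().

library(parallel)

big <- matrix(rnorm(1e6), ncol = 100)                 # large, read-only
groups <- split(seq_len(ncol(big)), rep(1:4, each = 25))

## Unix-alikes only: children share 'big' via copy-on-write and each
## returns just one count, so almost nothing is serialized back.
counts <- mclapply(groups, function(j) sum(big[, j] > 2), mc.cores = 2)
unlist(counts)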
> Anyways -- would it be out of the question for 'parallel' to export a
> dummy function like
>
> mclapply <- lapply
>
> on Windows? Maybe I'll go post that on R-devel so that Prof. Ripley
> can bite my head off :-)
Yes, that's your best bet!
Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793