We often execute nested operations in parallel: for example, first by
sample, then by chromosome. A fixed allocation of resources to each
level will often result in waste. For example, if one sample finishes
quickly, its CPUs are not available to help the other samples along.
Perhaps the most expedient solution is to expand.grid() the hierarchy
and create one job for every combination, i.e., flatten the hierarchy.
A better solution might be a pool of resources (cores) that are
allocated more fluidly.

Is there any sort of pooling system for R? I know that the parallel
package supports the declaration of resources in cluster objects, but
there is no central manager.

This is a general R question, but it's worth discussing in the context
of how we can make better use of parallelism in the low-level
infrastructure, which would cause these hierarchies to arise. It's also
relevant to the discussion of specifying parallelization modes or
strategies. Pools themselves could be hierarchical and heterogeneous
(hosts, cores). Declaring available resources is fairly
straightforward. Deciding how to use them is context-dependent and
requires user control.
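
To make the flattening idea concrete, here is a minimal sketch using
only the parallel package. The sample and chromosome labels and the
process_one() function are hypothetical placeholders, not part of any
existing infrastructure. It is not a central pool manager; it only
shows that a flat job list with mc.preschedule = FALSE lets idle cores
pick up whatever work remains, whichever sample it belongs to.

```r
library(parallel)

# Hypothetical sample and chromosome labels, for illustration only.
samples <- c("sampleA", "sampleB", "sampleC")
chromosomes <- c("chr1", "chr2", "chrX")

# Flatten the sample-by-chromosome hierarchy: one row per job.
jobs <- expand.grid(sample = samples, chromosome = chromosomes,
                    stringsAsFactors = FALSE)

# Hypothetical stand-in for the real per-chromosome work.
process_one <- function(sample, chromosome) {
  paste(sample, chromosome, sep = ":")
}

# Forking is unavailable on Windows, so fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mc.preschedule = FALSE forks one job per row, with at most n_cores
# running at once, so a core freed by a fast sample takes the next job.
flat <- mclapply(seq_len(nrow(jobs)), function(i) {
  process_one(jobs$sample[i], jobs$chromosome[i])
}, mc.cores = n_cores, mc.preschedule = FALSE)

# Restore the hierarchy: one vector of chromosome results per sample.
results <- split(unlist(flat), jobs$sample)
```

The trade-off is that every job pays the fork overhead, and the caller
has to regroup the results by hand, which is part of why a real pool
would be preferable.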
Michael