BiocParallel and variable scope in child processes
dan.gatti • 0
@dangatti-7637

I would like to call a function that parallelizes a computation, and I'd like to use forked (FORK) workers so that I don't have to export the variables I'm operating on to each worker. I've made a minimal toy example. The following code works fine; all of the workers can see the variable 'a':

library(BiocParallel)
fxn = function(x) { mean(a[[x]]) }
a = vector("list", 20)
for(i in 1:20) { a[[i]] = matrix(rnorm(100), 10, 10) }
param = MulticoreParam(workers = 4, type = "FORK")
bplapply(1:4, fxn, BPPARAM = param)
rm(list = ls())

However, in my pipeline, I need to be able to call a function, create some variables inside it, and then parallelize the work. To my surprise, the forked workers can't see the variable 'a', which is created right above the call to MulticoreParam and bplapply. Does anyone have any insight into why the forked workers can't see 'a'? And what can I do to make variables visible to them?

library(BiocParallel)
fxn = function(x) { mean(a[[x]]) }
parfxn = function() {
  a = vector("list", 20)
  for(i in 1:20) { a[[i]] = matrix(rnorm(100), 10, 10) }
  param = MulticoreParam(workers = 4, type = "FORK")
  bplapply(1:4, fxn, BPPARAM = param)
}
parfxn()

The error is:

Error: BiocParallel errors
  element index: 1, 2, 3, 4
  first error: object 'a' not found

Thanks in advance.

@martin-morgan-1513

The approach fails without parallel evaluation too:

> fxn = function(x) mean(a[[1]])
> parfxn = function()  { a = list(1:10); fxn() }
> parfxn()
Error in mean(a[[1]]) : object 'a' not found

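This is because fxn looks up variables in the environment in which it was defined (here the global environment), not the environment from which it was called. Your first example works only because 'a' was a global variable. The better practice is to write functions that do not refer to variables outside their own scope, a practice that is required anyway for Windows or cluster users, whose workers are separate processes that do not share the calling session's memory: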

fxn = function(x, a) mean(a[[x]])
parfxn = function() {
    ...
    bplapply(1:4, fxn, a, BPPARAM=param)
}
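
For reference, a complete version of the sketch above might look like the following. This is only an illustration under a few assumptions: the toy data from the question stands in for the elided part, MulticoreParam(workers = 2) is an arbitrary choice, and the serial lapply call is added purely to check that both routes give the same answer. 'a' is passed by name here, but passing it positionally as above should behave the same way.

```r
library(BiocParallel)

# Worker function: everything it needs arrives as an argument.
fxn <- function(x, a) mean(a[[x]])

parfxn <- function() {
  # Toy data from the question (a stand-in for the real pipeline).
  a <- vector("list", 20)
  for (i in 1:20) a[[i]] <- matrix(rnorm(100), 10, 10)

  # Assumed setting: two forked workers.
  param <- MulticoreParam(workers = 2)

  # Extra arguments after FUN are forwarded to fxn, so a = a supplies the list.
  parallel_means <- bplapply(1:4, fxn, a = a, BPPARAM = param)

  # Serial reference, only for comparison.
  serial_means <- lapply(1:4, fxn, a = a)

  list(parallel = parallel_means, serial = serial_means)
}

res <- parfxn()
```

Because fxn no longer depends on anything outside its argument list, the same function should also work with a SnowParam back end, where workers do not share memory with the calling session.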

One could also define fxn inside parfxn, but that makes reuse difficult.
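
A minimal sketch of that alternative, assuming deterministic toy data (each matrix is filled with its own index, so the expected means are easy to check) and an arbitrary choice of two workers:

```r
library(BiocParallel)

parfxn <- function() {
  # Assumed toy data: the i-th matrix is filled with the value i.
  a <- lapply(1:20, function(i) matrix(i, 10, 10))

  # fxn is created inside parfxn, so its enclosing environment is this
  # call's frame, which is where 'a' lives.
  fxn <- function(x) mean(a[[x]])

  bplapply(1:4, fxn, BPPARAM = MulticoreParam(workers = 2))
}

res <- parfxn()
```

Here fxn finds 'a' through lexical scoping, but it cannot be called or tested outside parfxn, which is the reuse cost mentioned above.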
