BiocParallel NULL value passed as symbol address
@ioannisvardaxis-11763
Last seen 18 months ago
Norway/Oslo

Hey,

I have written a simple function with Rcpp:

#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
SEXP Test(double &i){
    double j=std::pow(i,2.0);
    return Rcpp::wrap(j);
}

I can source the code and run the function Test.

But I want to do it in parallel using the BiocParallel::bplapply function like:

snow <- BiocParallel::SnowParam(workers = 4, type = 'SOCK', progressbar=FALSE)
BiocParallel::register(snow, default=TRUE)

BiocParallel::bplapply(X = as.list(seq_len(1000)), FUN=Test)

Then I get the following error:

Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6, ...
  first error: NULL value passed as symbol address

If on the other hand I register only one core (workers=1) the function runs successfully.

What may be causing the problem?

Thank you!

@martin-morgan-1513
Last seen 3 months ago
United States

'Snow' params run workers as separate processes, so compiled code loaded in the main process is unknown to the worker processes. You need to source the code on the worker:

bplapply(as.list(seq_len(1000)), FUN = function(i) {
    Rcpp::sourceCpp("your.cpp")  # compile and load Test() on the worker
    Test(i)
})

If sourcing your .cpp file is costly, follow the Rcpp recommendations and create a package; loading the package is a 'no-op' except for the first time on each worker:

bplapply(as.list(seq_len(1000)), FUN = function(i) {
    library("YourPackage")  # load the package that defines Test() on the worker
    Test(i)
})
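
If a package is not yet an option, the cost of sourcing on every iteration can probably be reduced by compiling only when the function is not already defined in the worker's global environment. The following is a minimal sketch rather than a tested recommendation. It assumes Rcpp and a C++ toolchain are available to every worker and that the workers run on the same machine (so they can read the temporary file). The C++ code is a simplified stand-in for Test(), and the names `cpp_file`, `run_test` and `path` are illustrative.

```r
library(BiocParallel)

# Write a simplified stand-in for the C++ function to a temporary file
# (assumption: the real code would live in your own .cpp file).
cpp_file <- tempfile(fileext = ".cpp")
writeLines(c(
    "#include <Rcpp.h>",
    "// [[Rcpp::export]]",
    "double Test(double i) {",
    "    return i * i;",
    "}"
), cpp_file)

# Hypothetical worker function: compile once per worker process, then reuse.
run_test <- function(i, path) {
    if (!exists("Test", envir = globalenv())) {
        # Defines Test() in the worker's global environment.
        Rcpp::sourceCpp(path)
    }
    Test(i)
}

param <- SnowParam(workers = 2, type = "SOCK")
res <- bplapply(as.list(seq_len(20)), run_test, path = cpp_file, BPPARAM = param)
```

Each worker still pays the compilation cost once, so this only helps when a worker handles many elements.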

Hey,

I did as you suggested. If I source the Rcpp code in the loop it works, but it takes a lot of time and I need to check if it is worth running it in parallel.

If on the other hand I place the Rcpp code in my pkg and try to run it this way I get the following error:

Error in .Call("_pkg_Testing1_Speed_fun_Rcpp", PACKAGE = "pkg",  :
  "_pkg_Testing1_Speed_fun_Rcpp" not available for .Call() for package "pkg"

I guess my answer was misleading. The compiled code needs to be loaded on the worker, and that means the package in which it is defined needs to be loaded there, so library("YourPackage") either way. Again, the cost is paid once per process for each bplapply(), or can be further amortized by starting the workers once and reusing them across calls:

register(SnowParam())
bpstart()
bplapply(...)
...
bplapply(...)
...
bpstop()

But it could be that this is still expensive for the amount of work to be done in the loops. A final strategy would be to put the C++ code in a package without other dependencies (assuming that it doesn't make calls back into your current package), but at that point one would really want to know that the original code was written efficiently and the extra effort was worth while.
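
The start-once, reuse-many pattern above might look as follows with an explicit parameter object. This is only a sketch: `square` is a pure-R placeholder for the per-element work, and the worker count and inputs are arbitrary. In practice the function body would call library("YourPackage") and then Test(i).

```r
library(BiocParallel)

# Placeholder for the per-element work (assumption: stands in for Test()).
square <- function(i) i^2

param <- SnowParam(workers = 2, type = "SOCK")
bpstart(param)   # launch the worker processes once

# Both calls reuse the same workers, so a package loaded on a worker
# during the first call is still loaded during the second.
res1 <- bplapply(as.list(1:10), square, BPPARAM = param)
res2 <- bplapply(as.list(11:20), square, BPPARAM = param)

bpstop(param)    # shut the workers down when finished
```

Passing the started `param` through BPPARAM makes it explicit which set of workers is being reused.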
