We had to switch to using bplapply(), and with the later version of the package we started to encounter serious performance issues.
Here's an evaluation of a simple function using lapply, mclapply and bplapply with one worker:
> system.time(lapply(1:1e2,function(x) order(rnorm(n=1e3))))
   user  system elapsed
  0.016   0.000   0.016
> require(parallel)
Loading required package: parallel
> system.time(mclapply(1:1e2,function(x) order(rnorm(n=1e3)),mc.cores=1))
   user  system elapsed
  0.016   0.000   0.015
> require(BiocParallel)
Loading required package: BiocParallel
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.196   0.020   9.953
The elapsed time for bplapply surges to almost 10 seconds, and it grows in proportion to the number of elements in the list.
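A minimal sketch for checking how elapsed time scales with list length. The helper name time_scaling and the chosen sizes are hypothetical, not part of any package; the bplapply part only runs if BiocParallel is installed and the platform supports forking.

```r
# Hypothetical helper: time `apply_fun` over lists of increasing length
# and report the elapsed time per element.
time_scaling <- function(apply_fun, sizes = c(10, 50, 100)) {
  elapsed <- vapply(sizes, function(n) {
    timing <- system.time(
      apply_fun(seq_len(n), function(x) order(rnorm(n = 1e3)))
    )
    timing[["elapsed"]]
  }, numeric(1))
  data.frame(n = sizes, elapsed = elapsed, per_element = elapsed / sizes)
}

# Baseline with base R lapply
print(time_scaling(lapply))

# Same measurement with bplapply and a single worker
if (requireNamespace("BiocParallel", quietly = TRUE) &&
    .Platform$OS.type != "windows") {
  param <- BiocParallel::MulticoreParam(workers = 1)
  bp_apply <- function(X, FUN) BiocParallel::bplapply(X, FUN, BPPARAM = param)
  print(time_scaling(bp_apply))
}
```

If the overhead is per element, the per_element column should stay roughly constant for bplapply while remaining far above the lapply baseline.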
This is using BiocParallel_1.2.22 (full sessionInfo() below).
The problem does not occur when using an older version of BiocParallel (BiocParallel_1.0.3):
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.016   0.004   0.023
Also, the runtime with the newer version (1.2.22) is affected by loading other libraries. For instance, loading the mgcv library roughly doubles the runtime of that simple command:
> library(BiocParallel)
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.204   0.020   9.272
> library(mgcv)
Loading required package: nlme
This is mgcv 1.8-7. For overview type 'help("mgcv-package")'.
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.184   0.008  20.569
Unfortunately this effect grinds our package to a halt in some situations, so I would appreciate your input.
Full sessionInfo() below:
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] mgcv_1.8-7          nlme_3.1-122        BiocParallel_1.2.22

loaded via a namespace (and not attached):
[1] Matrix_1.2-2         futile.logger_1.4.1  lambda.r_1.1.7
[4] futile.options_1.0.0 grid_3.2.2           lattice_0.20-33
Best,
-peter.
Thanks, obviously the performance is not satisfactory; we will look into this.
The current Bioconductor release is 3.2, where BiocParallel is at version 1.4.1 (this does not help the performance issue, but will be the version where updates are introduced). The "Upgrading installed Bioconductor packages" instructions may help get you to the current version.
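A small sketch for checking whether the installed BiocParallel is at least the release version mentioned above. The helper is_at_least is a hypothetical name introduced here for illustration; it relies only on base R's package_version comparison, which compares version components numerically rather than as strings.

```r
# Hypothetical helper: TRUE if `installed` is the same as or newer than `required`
is_at_least <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

# Version in which updates are expected, per the note above
release_version <- "1.4.1"

if (requireNamespace("BiocParallel", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("BiocParallel"))
  if (is_at_least(installed, release_version)) {
    message("BiocParallel ", installed, " is at or above ", release_version)
  } else {
    message("BiocParallel ", installed, " is older than ", release_version)
  }
} else {
  message("BiocParallel is not installed")
}
```

For example, is_at_least("1.2.22", "1.4.1") is FALSE, so the installation from the sessionInfo() in the question would be reported as older than the release.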