BiocParallel::bplapply() performance issue
@peterkharchenko-8755
Last seen 2.4 years ago
United States

We had to switch to bplapply(), and in the later version of the package we started to encounter serious performance issues, even with a small number of cores (workers).

Here is the evaluation of a simple function using lapply, mclapply and bplapply with one worker:

> system.time(lapply(1:1e2,function(x) order(rnorm(n=1e3))))
   user  system elapsed
  0.016   0.000   0.016
> require(parallel)
Loading required package: parallel
> system.time(mclapply(1:1e2,function(x) order(rnorm(n=1e3)),mc.cores=1))
   user  system elapsed
  0.016   0.000   0.015
> require(BiocParallel)
Loading required package: BiocParallel
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.196   0.020   9.953

The bplapply elapsed time surges, growing in proportion to the number of elements in the list.
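To check how elapsed time scales with list length, something like the following sketch could be used. The helper name `elapsed_for` and the chosen list sizes are illustrative assumptions, not part of the original report.

```r
library(BiocParallel)

# Hypothetical helper: elapsed seconds for one bplapply() call over n elements,
# using the same toy function as in the report.
elapsed_for <- function(n, BPPARAM) {
  system.time(
    bplapply(seq_len(n), function(x) order(rnorm(n = 1e3)), BPPARAM = BPPARAM)
  )[["elapsed"]]
}

sizes <- c(25, 50, 100, 200)          # illustrative list lengths
param <- MulticoreParam(workers = 1)  # single worker, as in the report
timings <- vapply(sizes, elapsed_for, numeric(1), BPPARAM = param)

# Per-element cost; roughly constant values would indicate linear scaling.
data.frame(n = sizes, elapsed = timings, per_element = timings / sizes)
```

If the per-element column stays roughly constant while the total grows, the overhead is per element rather than a fixed startup cost.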

This is using BiocParallel_1.2.22 (full sessionInfo() below). 

The problem does not occur with an older version of BiocParallel (BiocParallel_1.0.3):

> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.016   0.004   0.023

Also, the runtime with the newer version (1.2.22) is affected by loading other libraries. For instance, loading the mgcv library roughly doubles the runtime of that simple command:

> library(BiocParallel)
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.204   0.020   9.272
> library(mgcv)
Loading required package: nlme
This is mgcv 1.8-7. For overview type 'help("mgcv-package")'.
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.184   0.008  20.569

Unfortunately, this effect grinds our package to a halt in some situations, so I would appreciate your input.

Full sessionInfo() below:

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] mgcv_1.8-7          nlme_3.1-122        BiocParallel_1.2.22

loaded via a namespace (and not attached):
[1] Matrix_1.2-2         futile.logger_1.4.1  lambda.r_1.1.7
[4] futile.options_1.0.0 grid_3.2.2           lattice_0.20-33

Best,

-peter.


Thanks; the performance is clearly not satisfactory, and we will look into this.

The current Bioconductor release is 3.2, where BiocParallel is at version 1.4.1 (this does not help the performance issue, but will be the version where updates are introduced). The "Upgrading installed Bioconductor packages" instructions may help get you to the current version.

@martin-morgan-1513
Last seen 5 months ago
United States

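This is fixed in BiocParallel 1.4.3. In the transcript that follows, the lines commented with ## repeat the timings from the original report for comparison.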

> system.time(lapply(1:1e2,function(x) order(rnorm(n=1e3))))
   user  system elapsed 
  0.015   0.000   0.015 
>   ##  user  system elapsed
>   ## 0.016   0.000   0.016
> require(parallel)
Loading required package: parallel
> ## Loading required package: parallel
> system.time(mclapply(1:1e2,function(x) order(rnorm(n=1e3)),mc.cores=1))
   user  system elapsed 
  0.016   0.000   0.016 
>   ##  user  system elapsed
>   ## 0.016   0.000   0.015
> 
> require(BiocParallel)
Loading required package: BiocParallel
> system.time({
+     res0 <- bplapply(1:1e2 , function(x) order(rnorm(n=1e3)),
+                      BPPARAM = MulticoreParam(workers = 1))
+ })
   user  system elapsed 
  0.023   0.000   0.022 
> 
> library(mgcv)
Loading required package: nlme
This is mgcv 1.8-10. For overview type 'help("mgcv-package")'.
> system.time(bplapply(1:1e2 , function(x) order(rnorm(n=1e3)),
+                      BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed 
  0.022   0.000   0.022 
>   ##  user  system elapsed
>   ## 0.184   0.008  20.569

A workaround in earlier versions may be to explicitly set the number of tasks equal to the number of workers:

> system.time({
+     res1 <- bplapply(1:1e2 , function(x) order(rnorm(n=1e3)),
+                      BPPARAM = MulticoreParam(workers = 1, tasks=1))
+ })
   user  system elapsed 
  0.020   0.000   0.021 
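For package code that has to run against both old and new releases, the workaround could be applied conditionally. This is only a sketch: the helper name `make_param` is hypothetical, and it assumes the installed `MulticoreParam()` accepts the `tasks` argument used in the transcript above.

```r
library(BiocParallel)

# Hypothetical helper: build a MulticoreParam, pinning tasks to the number of
# workers on versions older than 1.4.3 (the release reported as fixed).
# Assumption: MulticoreParam() in the installed version accepts `tasks`.
make_param <- function(workers = 1L) {
  if (packageVersion("BiocParallel") < "1.4.3") {
    MulticoreParam(workers = workers, tasks = workers)
  } else {
    MulticoreParam(workers = workers)
  }
}

res <- bplapply(1:1e2, function(x) order(rnorm(n = 1e3)),
                BPPARAM = make_param(1L))
```

On fixed versions the helper leaves `tasks` at its default, so it only changes behaviour where the workaround is needed.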

Thanks for the report.
