I stumbled across some strange behaviour of bplapply when a large data object is passed to FUN and an error is also raised within FUN:
library(BiocParallel)
bigmat <- matrix(rnorm(1e7), nrow=50)
.recount_cells <- function(x, incoming) {
    stop("YAY")
    return(x)
}
system.time(bplapply(1:2, FUN=.recount_cells, incoming=bigmat, BPPARAM=SerialParam())) # Finishes instantly
system.time(bplapply(1:2, FUN=.recount_cells, incoming=bigmat, BPPARAM=MulticoreParam(2))) # Killed after several minutes
This doesn't happen if I set bigmat <- 1, nor if I remove the stop call in .recount_cells. In those situations, both of the bplapply calls above execute in a timely manner.
I have encountered related problems with bplapply stalling even when no error is raised in FUN, but the above example is the simplest to reproduce on my system (Ubuntu, below). The same behaviour is observed with BiocParallel 1.7.5 and on Mac OS X.
Anyway, here's my sessionInfo():
R version 3.3.0 Patched (2016-05-03 r70580)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocParallel_1.6.3

loaded via a namespace (and not attached):
[1] parallel_3.3.0
I don't yet have a solution, but the problem can be seen in this non-parallel code, which gets slower as the data get larger.