Hi,
I have run into a problem using getSeq() on a BSgenome object
inside a function parallelized with mclapply(). When getSeq() is
called from several forked worker processes simultaneously, at least
one of them hangs indefinitely at 100% CPU:
#------------------
library(parallel)  # provides mclapply()
library(GenomicRanges)
library(BSgenome.Dmelanogaster.UCSC.dm3)
# 10,000 random 20-bp ranges on chr2L
gr <- GRanges(ranges=IRanges(start=sample(seqlengths(Dmelanogaster)["chr2L"] - 20, 10000),
                             width=20),
              seqnames="chr2L", strand="+")
gr.list <- lapply(1:6, function(i) gr)
seqs.list <- mclapply(gr.list, function(gr) {
  message("getSeq() started")
  s <- getSeq(Dmelanogaster, gr)  # does not reliably return if mc.cores > 1
  message("getSeq() returned")
  s
}, mc.cores=2)
#------------------
If I instead detach the BSgenome data package before calling mclapply()
and load it inside the parallelized function, everything works:
#------------------
library(parallel)  # provides mclapply()
library(GenomicRanges)
library(BSgenome.Dmelanogaster.UCSC.dm3)
# 10,000 random 20-bp ranges on chr2L
gr <- GRanges(ranges=IRanges(start=sample(seqlengths(Dmelanogaster)["chr2L"] - 20, 10000),
                             width=20),
              seqnames="chr2L", strand="+")
# Unload the genome package so it is not loaded in the parent process
detach(name="package:BSgenome.Dmelanogaster.UCSC.dm3", unload=TRUE)
gr.list <- lapply(1:6, function(i) gr)
seqs.list <- mclapply(gr.list, function(gr) {
  library(BSgenome.Dmelanogaster.UCSC.dm3)  # loaded separately in each worker
  message("getSeq() started")
  s <- getSeq(Dmelanogaster, gr)  # always works
  message("getSeq() returned")
  s
}, mc.cores=2)
#------------------
I can reproduce this issue on both Mac and Linux (both 64-bit).
Is this just a limitation of BSgenome? Is there a better workaround
than making sure the package is not loaded before the call to
mclapply()?
Thanks,
Jeff Johnston
Zeitlinger Lab
Stowers Institute for Medical Research
#------------------
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Dmelanogaster.UCSC.dm3_1.3.99 BSgenome_1.32.0 Biostrings_2.32.0 XVector_0.4.0
[5] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.8 BiocGenerics_0.10.0
[9] setwidth_1.0-3
loaded via a namespace (and not attached):
[1] bitops_1.0-6 Rsamtools_1.16.0 stats4_3.1.0 zlibbioc_1.10.0
#------------------