Hey everyone
I am having trouble using ShortRead functions with bplapply while trying to demultiplex a FASTQ file. The following example function does nothing but create multiple paired-end FASTQ files, all containing the same reads:
demultiplex_fastq <- function(fastq_R1, fastq_R2, destinations, outdir, ncores = 1) {
    param <- BiocParallel::MulticoreParam(workers = ncores)
    message("de-multiplexing the FASTQ file")
    ## filter and write: one worker per destination
    info <- BiocParallel::bplapply(seq_along(destinations), function(i) {
        split1 <- file.path(outdir, paste0(destinations[i], "_R1.fastq.gz"))
        split2 <- file.path(outdir, paste0(destinations[i], "_R2.fastq.gz"))
        print(split1)
        print(split2)
        ## open input streams; both are closed when the worker function exits
        stream_R1 <- ShortRead::FastqStreamer(fastq_R1)
        stream_R2 <- ShortRead::FastqStreamer(fastq_R2)
        on.exit(close(stream_R1))
        on.exit(close(stream_R2), add = TRUE)
        repeat {
            fq_R1 <- ShortRead::yield(stream_R1)
            fq_R2 <- ShortRead::yield(stream_R2)
            if (length(fq_R1) == 0) {
                break
            }
            ## keep the first 10 reads of each chunk (fewer if the chunk is shorter)
            id2keep <- seq_len(min(10, length(fq_R1)))
            ShortRead::writeFastq(fq_R1[id2keep], split1, "a")
            ShortRead::writeFastq(fq_R2[id2keep], split2, "a")
        }
        return("Done!")
    }, BPPARAM = param)
    return("Done!")
}
This works when I use a single core (the files are written), but it gets stuck when I use more than one core. Can anyone point out what the issue is?
Thanks in advance
My sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1 yaml_2.1.14
Thanks Martin. I think the issue was the second one you mentioned. Today I killed some processes on my computer and freed a few GB of memory, and the function now works. So it needed more memory when run on multiple cores than on a single core. By a weird coincidence, yesterday afternoon I tried our RStudio server, a local server, and my own computer, and none of them had enough free memory to run it.
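For anyone who hits the same thing and cannot free up memory: one option that might help is to make each worker read smaller chunks. As far as I understand, FastqStreamer takes an `n` argument for the number of records returned per `yield()` call (the default is 1e6), and every worker holds one chunk from each of the two input files at a time. Below is a minimal sketch of the per-destination streaming loop with a smaller chunk size. The names `stream_pairs` and `chunk_size` are made up for illustration, and the value 1e5 is a guess, not a tested recommendation.

```r
## Sketch only: `stream_pairs` and `chunk_size` are hypothetical names,
## and 1e5 is an arbitrary illustrative value.
stream_pairs <- function(fastq_R1, fastq_R2, out_R1, out_R2, chunk_size = 1e5) {
    ## a smaller `n` means fewer reads held in memory per yield() call
    stream_R1 <- ShortRead::FastqStreamer(fastq_R1, n = chunk_size)
    stream_R2 <- ShortRead::FastqStreamer(fastq_R2, n = chunk_size)
    on.exit(close(stream_R1))
    on.exit(close(stream_R2), add = TRUE)
    n_written <- 0L
    repeat {
        fq_R1 <- ShortRead::yield(stream_R1)
        fq_R2 <- ShortRead::yield(stream_R2)
        if (length(fq_R1) == 0) {
            break
        }
        ## append the current chunk to the output files
        ShortRead::writeFastq(fq_R1, out_R1, mode = "a")
        ShortRead::writeFastq(fq_R2, out_R2, mode = "a")
        n_written <- n_written + length(fq_R1)
    }
    ## number of read pairs written
    n_written
}
```

Inside bplapply, the body of the worker function would call something like this once per destination. Lowering `ncores` should have a similar effect, since fewer chunks are in memory at the same time.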