Problem with summarizeOverlaps() when reading >1 BAM file: "stop worker failed"
I recently started working with RNA-seq data. I used the code below to try to read 2-4 BAM files (each BAM and its BAI index in the same directory), but I repeatedly get the following error when running summarizeOverlaps():

Error: stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit

One other time I got this error (with the same code):

Error: 'bplapply' receive data failed:
  error reading from connection

The BAM files are from ~40M single-end 75 bp reads, each ~2-2.5 GB (aligned using tophat2/bowtie2; hg19 reference genome). Code, sessionInfo(), and the last lines from traceback() are below. Of note, this works fine if I use just one BAM file:

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> grl <- exonsBy(txdb, by="gene")
> bamLst
  BamFileList of length 4
  names(4): file1.bam file2.bam file3.bam file4.bam
> experiment2 <- summarizeOverlaps(features=grl, reads=bamLst, ignore.strand=TRUE, singleEnd=TRUE)
  Error: stop worker failed:
    'clear_cluster' receive data failed:
    reached elapsed time limit

> traceback()  
16: stop(.error_worker_comm(e, "stop worker failed"))  
15: value[[3L]](cond)  
14: tryCatchOne(expr, names, parentenv, handlers[[1L]])  
13: tryCatchList(expr, classes, parentenv, handlers)  

> sessionInfo()  
R version 3.3.1 (2016-06-21)  
Platform: x86_64-apple-darwin13.4.0 (64-bit)  
Running under: OS X 10.11.4 (El Capitan)  
attached base packages:  
[1] stats4  parallel  stats  graphics  grDevices utils  datasets  methods   base  
other attached packages:  
 [1] GenomicAlignments_1.8.3 Rsamtools_1.24.0           Biostrings_2.40.2  
 [4] XVector_0.12.0          SummarizedExperiment_1.2.3 Biobase_2.32.0  
 [7] GenomicRanges_1.24.2    GenomeInfoDb_1.8.1         IRanges_2.6.1  
[10] S4Vectors_0.10.1        BiocGenerics_0.18.0   


It seems to me that this may be related to the computer's memory (8 GB) or number of cores (4). Beyond using a more powerful computer, is there any way to fix (or circumvent) this?

Update: This does indeed seem to be related to computing resources (memory or cores). I tried with smaller files and it worked. So I added a yieldSize argument to BamFileList() when creating "bamLst", to limit the number of reads read from each file at one time:

bamLst <- BamFileList(files1, yieldSize=7500000)

That seems to have fixed the problem, although I wonder whether it also makes things run slower. If anyone has other suggestions, let me know!

I would expect a yieldSize above 100000 to be fine for speed. You could process in serial, for example by registering BiocParallel's serial back-end with register(SerialParam()), or perhaps see Rsubread::featureCounts() or the bamsignals package.
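A minimal sketch of the serial route combined with yieldSize, in case it helps. It assumes BiocParallel's register(SerialParam()) is an acceptable way to turn off the worker processes here, and it uses the small ex1.bam file shipped with Rsamtools plus two made-up features purely as stand-ins; substitute your own BAM paths, your "grl" object, and a realistic yieldSize.

```r
library(BiocParallel)
library(Rsamtools)
library(GenomicAlignments)

# Make BiocParallel-aware code (summarizeOverlaps included) run in the
# current R process instead of starting worker processes.
register(SerialParam())

# Stand-in input: the example BAM (and its index) shipped with Rsamtools.
# Replace with your own files, e.g. the "files1" vector from the question.
bam_paths <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# yieldSize caps how many records are read from each file per chunk;
# 1000 only suits this tiny file.
bamLst <- BamFileList(bam_paths, yieldSize = 1000)

# Hypothetical features: one range on each reference sequence in ex1.bam.
# With real data this would be the exonsBy(txdb, by = "gene") result.
features <- GRanges(c("seq1", "seq2"), IRanges(start = 1, end = 1500))
names(features) <- c("featureA", "featureB")

se <- summarizeOverlaps(features = features, reads = bamLst,
                        ignore.strand = TRUE, singleEnd = TRUE)

# One row per feature, one column per BAM file.
assay(se)
```

Running in serial trades speed for a smaller memory footprint, since only one file is held in memory (one chunk at a time) rather than one chunk per worker.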

Will definitely try those, thanks!


