segfault "memory not mapped" using align from Rsubread
0
0
Entering edit mode
Joseph Fass ▴ 70
@joseph-fass-2739
Last seen 7 days ago
United States

I'm getting segfaults aligning trimmed paired-end Illumina reads. I can't find anything wrong with the reads; they were trimmed using fastp, there are no reads under 15bp long (vast majority are 76bp), there are no non-[ACGT] sequence characters or offending quality characters. The 1st million read pairs align fine. The 2nd million read pairs align fine. But if I run the 1st 2 million read pairs, the segfault occurs and the last read ID reported in the BAM file is a little after the 1 millionth mark. However if I align the 1 million to 1,100,000 set of reads, that aligns fine as well. Memory never goes anywhere near the server's limit .. it's at a few % of the 256G throughout successful or unsuccessful alignments.

The error is:

[...]
||   96% completed, 2.5 mins elapsed, rate=13.0k fragments per second         ||
 *** caught segfault ***
address 0x7f00303a693a, cause 'memory not mapped'
Traceback:
 1: align(index = "gencode.rel19.pctx", readfile1 = paste(acc, "_R1.trim.fastq.gz",     sep = ""), readfile2 = paste(acc, "_R2.trim.fastq.gz", sep = ""),     output_file = paste(acc, ".align.bam", sep = ""), type = "dna"
,     minFragLength = 30, maxFragLength = 50000, detectSV = T,     sortReadsByCoordinates = F, nthreads = 2)
Possible actions:
1: abort (with core dump, if enabled)
[...]

Sometimes only one thread segfaults? and the mapping continues until the other one does as well. The command is:

align( index="gencode.rel19.pctx",
   readfile1=paste(acc,"_R1.trim.fastq.gz",sep=""),
   readfile2=paste(acc,"_R2.trim.fastq.gz",sep=""),
   output_file=paste(acc,".align.bam",sep=""),
   type="dna",
   minFragLength=30, maxFragLength=50000,
   detectSV=T,
   sortReadsByCoordinates=F,
   nthreads=2 )

The reference is all gencode protein-coding transcripts. Session info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rsubread_2.0.1

loaded via a namespace (and not attached):
[1] compiler_3.6.3
Rsubread align • 1.6k views
ADD COMMENT
0
Entering edit mode

From your description, it is hard to figure out what went wrong in the align() function with "detectSV = TRUE". If you can share with us the trimmed fastq files and the reference genome, it would be very helpful for us to find the bug behind it.

ADD REPLY
0
Entering edit mode

I can't share this data, unfortunately. I did try the same command with 'detectSV=F', and it succeeds much more often, but then I have a later subjunc alignment against the genome that often fails in the same way. Let me see if I can quantify that a bit more and I'll edit this reply. Should there be any problem running these alignment commands via Rscript in parallel (xargs)?

ADD REPLY
0
Entering edit mode

What do you mean by 'running these alignment commands via Rscript in parallel (xargs)'? I would think the best way to run in parallel is just to use more threads (the nthreads parameter).

ADD REPLY
0
Entering edit mode

I'm running hundreds of samples, so I've got other analysis steps plus these two alignment steps (separate R scripts, each with nthreads=2) in an xargs command.

EDIT: Both alignment steps (align and subjunc) seem to be running smoothly now that I've set 'detectSV=F' for the align command and 'complexIndels=F' for the subjunc command. These settings are acceptable for my needs, I think, but I'd still like to know why they were failing, and of course I'd imagine you'd like to know too, if the problem is within Rsubreads rather than some stupid / weird thing I'm doing with my analysis. If I get a chance, I'll try to replicate this on some data that I can share!

ADD REPLY

Login before adding your answer.

Traffic: 777 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6