Question

Rsubread alignment crashes at 35%, valgrind indicates invalid writes

0

Entering edit mode

venusian.martin • 0

@c9790f70

Last seen 5 months ago

The Netherlands

Hello,

I am trying to analyse a ~200GB, ~20 sample dataset of RNAseq reads. I am trying to use Rsubread to process the raw reads into a count matrix. Indexing the reference genome runs without issue. However, I have been unable to succesfully run any alignment and am stuck at this point.

My function call:

cpu_num = "32"
align(index = index_basename,
      readfile1 = readfiles,
      input_format = "gzFASTA",
      nthreads = cpu_num,
      output_file = outfile_names)

align() gets through 32-37% of the first sample and then crashes with the error message:

free(): invalid next size (normal)
double free or corruption (!prev)
double free or corruption (!prev)

Because this looks like a c memory issue I have tried investigating the issue with valgrind memcheck. Inside this log I find the following:

==24970== Invalid write of size 1
==24970==    at 0x4838DC8: strcpy (vg_replace_strmem.c:512)
==24970==    by 0x11CD989B: strcpy (string_fortified.h:90)
==24970==    by 0x11CD989B: geinput_next_read_trim (input-files.c:921)
==24970==    by 0x11CB5686: fetch_next_read_pair (core.c:1136)
==24970==    by 0x11CB8903: do_voting (core.c:3097)
==24970==    by 0x11CBEDC2: run_in_thread (core.c:3367)
==24970==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==24970==    by 0x4E0406E: clone (clone.S:95)
==24970==  Address 0x3e761fe2b is 0 bytes after a block of size 1,211 alloc'd
==24970==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==24970==    by 0x11CB86F3: do_voting (core.c:3062)
==24970==    by 0x11CBEDC2: run_in_thread (core.c:3367)
==24970==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==24970==    by 0x4E0406E: clone (clone.S:95)
==24970== 
==24970== Invalid read of size 1
==24970==    at 0x11CDA265: genekey2int (input-files.c:1241)
==24970==    by 0x11CB8BEB: do_voting (core.c:3175)
==24970==    by 0x11CBEDC2: run_in_thread (core.c:3367)
==24970==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==24970==    by 0x4E0406E: clone (clone.S:95)
==24970==  Address 0x3e761fe2b is 0 bytes after a block of size 1,211 alloc'd
==24970==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==24970==    by 0x11CB86F3: do_voting (core.c:3062)
==24970==    by 0x11CBEDC2: run_in_thread (core.c:3367)
==24970==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==24970==    by 0x4E0406E: clone (clone.S:95)
==24970== 
==24970== Invalid read of size 1
==24970==    at 0x11CD946C: reverse_read (input-files.c:1176)
==24970==    by 0x11CB8FE6: do_voting (core.c:3231)
==24970==    by 0x11CBEDC2: run_in_thread (core.c:3367)
==24970==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==24970==    by 0x4E0406E: clone (clone.S:95)
==24970==  Address 0x3e761fe6f is 1 bytes before a block of size 1,211 alloc'd
==24970==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==24970==    by 0x11CB8704: do_voting (core.c:3063)
==24970==    by 0x11CBEDC2: run_in_thread (core.c:3367)
==24970==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==24970==    by 0x4E0406E: clone (clone.S:95)
==24970== 
==24970== Invalid write of size 1
==24970==    at 0x11CD9475: reverse_read (input-files.c:1178)
==24970==    by 0x11CB8FE6: do_voting (core.c:3231)
==24970==    by 0x11CBEDC2: run_in_thread (core.c:3367)
==24970==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==24970==    by 0x4E0406E: clone (clone.S:95)
==24970==  Address 0x3e761fe6f is 1 bytes before a block of size 1,211 alloc'd
==24970==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==24970==    by 0x11CB8704: do_voting (core.c:3063)
==24970==    by 0x11CBEDC2: run_in_thread (core.c:3367)
==24970==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==24970==    by 0x4E0406E: clone (clone.S:95)
==24970== 

valgrind: m_mallocfree.c:307 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 1280, hi = 4846791653451907907.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.

This behaviour is the same no matter what I try:

running the R script via sbatch directly over our SLURM cluster
running the R script over SLURM in an Rstudio session via rslurm
using a different sample to rule out file corruption
using untrimmed data to rule out an error caused by cutadapt

Do these write errors look familiar? What can I do to get Rsubread running?

> sessionInfo( )
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so;  LAPACK version 3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Amsterdam
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rslurm_0.6.2    stringr_1.5.1   Rsubread_2.18.0

loaded via a namespace (and not attached):
 [1] Matrix_1.6-5    lattice_0.22-5  magrittr_2.0.3  glue_1.6.2      parallel_4.4.1  lifecycle_1.0.4 cli_3.6.1      
 [8] grid_4.4.1      vctrs_0.6.4     compiler_4.4.1  rstudioapi_0.13 tools_4.4.1     whisker_0.4.1   rlang_1.1.2    
[15] stringi_1.8.2

Rsubread • 983 views

ADD COMMENT • link updated 6 months ago by Yang Liao ▴ 450 • written 7 months ago by venusian.martin • 0

0

Entering edit mode

Your command line indicates that the input files are gzipped FASTA files. Can you share with us the index and the gzipped FASTA file that caused the problem?

The valgrind output could suggest that some reads are long in the input. Are there reads longer than 1200bp?

ADD REPLY • link 7 months ago Yang Liao ▴ 450

0

Entering edit mode

EDIT: I managed to find the documentation where sublong is noted :)

When I use sublong with the same inputs as in my OP, I get this error:

Error: C stack usage  887530447692 is too close to the limit
Execution halted
Error: C stack usage  883936731980 is too close to the limit
Fatal error: error during cleanup

Error: segfault from C stack overflow
slurmstepd-cl-node01: error: get_exit_code task 0 died by signal

I am repeating this while running with valgrind and will update my post once I have something.

I would need to get authorisation to share the data so I can't do that right now, but I can get back to you if it's necessary.

Thank you for your help interpreting valgrind! Yes, there are reads longer than 1200bp. I have seen on your website a program named Sublong but can't find more information about it.

Should I use Sublong instead? How can I do that?

Thank you!

ADD REPLY • link 7 months ago venusian.martin • 0

0

Entering edit mode

Thank you for the additional information! We are currently developing a significantly improved version of Sublong. The older version may not function optimally.

To ensure our new sublong function work correctly, we are still hoping to have the input files for testing, hence we can fix this bug in it. Thank you for the efforts to share them!

ADD REPLY • link 7 months ago Yang Liao ▴ 450

0

Entering edit mode

Hello,

I was OOF and am getting back to this project now. Below I will paste the output of valgrind for the call to sublong generating the error.

Do you have a suggestion for paramaters within which sublong can run? At the moment it seems like I won't be able to use Rsubread on this dataset which would be a shame. Does the command line implementation have the same restrictions? In other words do you recommend I try running Subread installed seperately?

I will email you privately about sharing my input data if that's okay.

Now follows the valgrind output:

==42076== Memcheck, a memory error detector
==42076== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==42076== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==42076== Command: /usr/lib/R/bin/exec/R --no-echo --no-restore --vanilla --file=slurm_run.R
==42076== 
==42076== Warning: set address range perms: large range [0x59c8a040, 0xded92468) (undefined)
==42076== Warning: set address range perms: large range [0x61dd93040, 0x64a6041ba) (undefined)
==42076== Warning: set address range perms: large range [0x61dd95038, 0x64a603038) (defined)
==42076== Thread 12:
==42076== Conditional jump or move depends on uninitialised value(s)
==42076==    at 0x11C78759: LRMdynamic_in_middle (LRMchro-event.c:509)
==42076==    by 0x11CECAC0: LRMfill_gaps (longread-mapping.c:1102)
==42076==    by 0x11CECD91: LRMdo_dynamic_programming_read (longread-mapping.c:1166)
==42076==    by 0x11CECE1B: LRMchunk_read_iteration (longread-mapping.c:1184)
==42076==    by 0x11CECEE0: LRM_thread_runner (longread-mapping.c:354)
==42076==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==42076==    by 0x4E0406E: clone (clone.S:95)
==42076== 
==42076== Thread 8:
==42076== Conditional jump or move depends on uninitialised value(s)
==42076==    at 0x11C7877A: LRMdynamic_in_middle (LRMchro-event.c:510)
==42076==    by 0x11CECAC0: LRMfill_gaps (longread-mapping.c:1102)
==42076==    by 0x11CECD91: LRMdo_dynamic_programming_read (longread-mapping.c:1166)
==42076==    by 0x11CECE1B: LRMchunk_read_iteration (longread-mapping.c:1184)
==42076==    by 0x11CECEE0: LRM_thread_runner (longread-mapping.c:354)
==42076==    by 0x4CF1FA2: start_thread (pthread_create.c:486)
==42076==    by 0x4E0406E: clone (clone.S:95)
==42076== 
Error: C stack usage  110137406284 is too close to the limit
Execution halted
==42076== 
==42076== HEAP SUMMARY:
==42076==     in use at exit: 17,335,222,223 bytes in 186,076,219 blocks
==42076==   total heap usage: 885,359,441 allocs, 699,283,222 frees, 137,634,406,129 bytes allocated
==42076== 
==42076== LEAK SUMMARY:
==42076==    definitely lost: 72 bytes in 1 blocks
==42076==    indirectly lost: 8,000 bytes in 8 blocks
==42076==      possibly lost: 6,058 bytes in 18 blocks
==42076==    still reachable: 17,335,208,093 bytes in 186,076,192 blocks
==42076==                       of which reachable via heuristic:
==42076==                         newarray           : 4,264 bytes in 1 blocks
==42076==         suppressed: 0 bytes in 0 blocks
==42076== Rerun with --leak-check=full to see details of leaked memory
==42076== 
==42076== For counts of detected and suppressed errors, rerun with: -v
==42076== Use --track-origins=yes to see where uninitialised values come from
==42076== ERROR SUMMARY: 2292288 errors from 2 contexts (suppressed: 0 from 0)

ADD REPLY • link 7 months ago venusian.martin • 0

0

Entering edit mode

Thanks for taking the time to report the bug! The provided Valgrind report is very helpful and suggests the issue lies within the banded dynamic programming function. We'll investigate this further to get things fixed.

We're happy to hear the dataset is available for sharing! To make things convenient, you can share the file with yang.liao@monash.edu. This data will be very helpful for testing and improving our program.

ADD REPLY • link 6 months ago Yang Liao ▴ 450