Hello-
I'm trying to use Rsubread
's featureCounts
function in a single-cell ATAC-seq data set, to count the number of reads in each cell that map to each peak. My input is a set of peaks called by MACS2 and a BAM file that initially contained cell barcodes in the CB
tag of each read -- then I found that featureCounts
has no mechanism to use anything but the read group to separate reads, so I copied the barcodes to each read's RG
tag and added @RG
header lines for each.
featureCounts
is segfaulting when I try to read this file with byReadGroup = TRUE
:
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 2.8.0
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| ||
|| aligned_hg38_rg.bam ||
|| ||
|| Paired-end : yes ||
|| Count read pairs : yes ||
|| Annotation : R data.frame ||
|| Dir for temp files : . ||
|| Threads : 24 ||
|| Level : meta-feature level ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| Read reduction : to 5' end ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid8 ... ||
|| Features : 84305 ||
|| Meta-features : 84305 ||
|| Chromosomes/contigs : 280 ||
|| ||
|| Process BAM file aligned_hg38_rg.bam... ||
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: Rsubread::featureCounts(opt$bam_file, annot.ext = regions_frame, read2pos = 5, isPairedEnd = TRUE, byReadGroup = TRUE, nthreads = opt$threads)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
Selecting any option then results in xmalloc: out of virtual memory
being printed on the same line as the Selection:
prompt.
The BAM file has quite a few read groups:
$ samtools view -H aligned_hg38_rg.bam | grep '^@RG' | wc -l
2278644
This would be a large matrix, but I was expecting a lot of CPU and memory usage, not a segmentation fault. I'm running on a machine with 3TB memory, and segfaults happen after a little more than 50GB allocated. I also converted to SAM and tried to read that, with the same result.
As suggested:
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsubread_2.8.0
loaded via a namespace (and not attached):
[1] compiler_4.1.1 Matrix_1.3-4 grid_4.1.1 lattice_0.20-44
Is there anything I can do to work around this failure? (I'm actually using another package (cisTopic) which uses Rsubread::featureCounts
, so it isn't trivial to switch to htseq-count
or something like that.)
Thank you!