Entering edit mode
Hi, not sure if this is the place to post suspected bugs, but I've found that readGAlignmentPairs() gets stuck when reading a bam file with the intervals specified. I'm running this on a bam that is the direct output of samtools, produced by downsampling a larger bam file to 10% of reads.
Script:
library(GenomicAlignments)
sessionInfo()
bamfile = "reads.bam"
df = readr::read_tsv("peaks.bed", col_names = c("chr", "start", "end", paste0("X", 4:10)))
intervals = makeGRangesFromDataFrame(df)
ps = Rsamtools::ScanBamParam(which=intervals)
readGAlignments(bamfile)
message("Single-end entire file read OK") # runs fine in ~60s
readGAlignmentPairs(bamfile)
message("Paired-end entire file read OK") # runs with a warning about unpaired alignments in ~60s
readGAlignments(bamfile, param=ps)
message("Single-end intervals read OK") # runs fine almost instantly
readGAlignmentPairs(bamfile, param=ps)
message("Paired-end intervals read OK") # never completes
sessionInfo():
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicAlignments_1.26.0 Rsamtools_2.6.0
[3] Biostrings_2.58.0 XVector_0.30.0
[5] SummarizedExperiment_1.20.0 Biobase_2.50.0
[7] MatrixGenerics_1.2.1 matrixStats_0.58.0
[9] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
[11] IRanges_2.24.1 S4Vectors_0.28.1
[13] BiocGenerics_0.36.1
loaded via a namespace (and not attached):
[1] rstudioapi_0.13 zlibbioc_1.36.0 BiocParallel_1.24.1
[4] lattice_0.20-41 tools_4.0.5 grid_4.0.5
[7] crayon_1.4.1 Matrix_1.3-2 GenomeInfoDbData_1.2.4
[10] bitops_1.0-7 RCurl_1.98-1.3 DelayedArray_0.16.3
[13] compiler_4.0.5
NB This was an entirely fresh install of both R and Bioconductor around 2 weeks ago
PS Both files are public data but the bam is still quite large (~300MB), I can provide these or test with a different example file if that would help.