readGAlignmentPairs() hanging while reading reads in specified intervals
0
0
Entering edit mode
@malcolm-perry-6958
Last seen 3.5 years ago
Cambridge

Hi, not sure if this is the place to post suspected bugs, but I've found that readGAlignmentPairs() gets stuck when reading a bam file with the intervals specified. I'm running this on a bam that is the direct output of samtools, produced by downsampling a larger bam file to 10% of reads.

Script:

library(GenomicAlignments)

sessionInfo()

bamfile = "reads.bam"

df = readr::read_tsv("peaks.bed", col_names = c("chr", "start", "end", paste0("X", 4:10)))
intervals = makeGRangesFromDataFrame(df)

ps = Rsamtools::ScanBamParam(which=intervals)

readGAlignments(bamfile)
message("Single-end entire file read OK") # runs fine in ~60s

readGAlignmentPairs(bamfile)
message("Paired-end entire file read OK") # runs with a warning about unpaired alignments in ~60s

readGAlignments(bamfile, param=ps)
message("Single-end intervals read OK") # runs fine almost instantly

readGAlignmentPairs(bamfile, param=ps)
message("Paired-end intervals read OK") # never completes

sessionInfo():

R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicAlignments_1.26.0    Rsamtools_2.6.0            
 [3] Biostrings_2.58.0           XVector_0.30.0             
 [5] SummarizedExperiment_1.20.0 Biobase_2.50.0             
 [7] MatrixGenerics_1.2.1        matrixStats_0.58.0         
 [9] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7        
[11] IRanges_2.24.1              S4Vectors_0.28.1           
[13] BiocGenerics_0.36.1        

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13        zlibbioc_1.36.0        BiocParallel_1.24.1   
 [4] lattice_0.20-41        tools_4.0.5            grid_4.0.5            
 [7] crayon_1.4.1           Matrix_1.3-2           GenomeInfoDbData_1.2.4
[10] bitops_1.0-7           RCurl_1.98-1.3         DelayedArray_0.16.3   
[13] compiler_4.0.5

NB This was an entirely fresh install of both R and Bioconductor around 2 weeks ago

PS Both files are public data but the bam is still quite large (~300MB), I can provide these or test with a different example file if that would help.

genomicalig GenomicAlignments • 698 views
ADD COMMENT

Login before adding your answer.

Traffic: 729 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6