Hello,
I would like to point out that when I pass a BamViews object defined with specific bamRanges to summarizeOverlaps, the whole BAM file is loaded into memory unless I explicitly provide the param argument.
Here is an example:
library(GenomicAlignments)
tiny_bam <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
fl <- c(tiny_bam, tiny_bam)
rngs <- GRanges(c("seq1", "seq2"), IRanges(1, c(15, 15)))
samp <- DataFrame(info=c("ex1","ex2"), row.names=c("ex1","ex2"))
# define the BamViews for multiple files using Rsamtools
view <- BamViews(bamPaths = fl, bamSamples=samp, bamRanges=rngs)
As a result, the following two calls have different memory footprints: the first loads the whole BAM file,
se <- summarizeOverlaps(view, mode=Union, ignore.strand=TRUE)
while the second loads only the reads that fall within the given ranges.
se <- summarizeOverlaps(view,
                        mode=Union,
                        ignore.strand=TRUE,
                        param=ScanBamParam(which = rngs))
I saw in the source code of the readGAlignments method for BamViews (https://github.com/Bioconductor/GenomicAlignments/blob/master/R/readGAlignments.R#L138-L159) that the ScanBamParam can be updated internally from the bamRanges() of the BamViews object. Doing the same in summarizeOverlaps would remove the need to provide the ranges a second time through the param argument.
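For illustration, the ranges already stored in the view can be reused on the caller's side instead of being typed a second time. This is only a minimal sketch of a workaround, not the package's implementation: it assumes that bamRanges() returns the GRanges the view was built with and that ScanBamParam(which=) accepts it directly.

```r
library(GenomicAlignments)

# Same setup as in the example above
tiny_bam <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
fl <- c(tiny_bam, tiny_bam)
rngs <- GRanges(c("seq1", "seq2"), IRanges(1, c(15, 15)))
samp <- DataFrame(info=c("ex1", "ex2"), row.names=c("ex1", "ex2"))
view <- BamViews(bamPaths=fl, bamSamples=samp, bamRanges=rngs)

# Build the param from the ranges stored in the view, so the ranges
# are defined in one place only
param <- ScanBamParam(which=bamRanges(view))

se <- summarizeOverlaps(view,
                        mode=Union,
                        ignore.strand=TRUE,
                        param=param)
```

If summarizeOverlaps did this itself whenever param is missing, the first call above would have the same memory footprint as the second.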
I think this would improve the usability of the function, and I just wanted to let the developers of the very good GenomicAlignments package know.
Best,
Alex