While you can iterate over an open BamFile
, extracting a certain number of reads at each iteration using the yieldSize
argument, it appears you cannot iterate over genomic regions, using the which
argument
> bf <- BamFile("tmp_sorted.bam")
> gr2 <- as(GRanges(rep("chr1", 3), IRanges(c(3000000,3005000, 3010000), c(3005000, 3010000, 3015000))), "GRangesList")
> open(bf)
> lapply(1:3, function(x) readGAlignments(bf, param = ScanBamParam(which = gr2[[x]])))
[[1]]
GAlignments object with 98 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
[1] chr1 - 13S88M 101 3004617 3004704 88
[2] chr1 + 88M13S 101 3004617 3004704 88
[3] chr1 + 101M 101 3003681 3003781 101
[4] chr1 - 101M 101 3003791 3003891 101
[5] chr1 + 80M21S 101 3001469 3001548 80
... ... ... ... ... ... ... ...
[94] chr2 - 101M 101 4949942 4950042 101
[95] chr1 + 101M 101 3004952 3005052 101
[96] chr1 - 101M 101 3005023 3005123 101
[97] chr1 + 101M 101 3001837 3001937 101
[98] chr1 + 101M 101 3001614 3001714 101
njunc
<integer>
[1] 0
[2] 0
[3] 0
[4] 0
[5] 0
... ...
[94] 0
[95] 0
[96] 0
[97] 0
[98] 0
-------
seqinfo: 66 sequences from an unspecified genome
[[2]]
GAlignments object with 0 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
njunc
<integer>
-------
seqinfo: 66 sequences from an unspecified genome
[[3]]
GAlignments object with 0 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
njunc
<integer>
-------
seqinfo: 66 sequences from an unspecified genome
> close(bf)
But you can repeatedly open the BamFile and get chunks
> lapply(1:3, function(x) readGAlignments(bf, param = ScanBamParam(which = gr2[[x]])))
[[1]]
GAlignments object with 98 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
[1] chr1 - 13S88M 101 3004617 3004704 88
[2] chr1 + 88M13S 101 3004617 3004704 88
[3] chr1 + 101M 101 3003681 3003781 101
[4] chr1 - 101M 101 3003791 3003891 101
[5] chr1 + 80M21S 101 3001469 3001548 80
... ... ... ... ... ... ... ...
[94] chr2 - 101M 101 4949942 4950042 101
[95] chr1 + 101M 101 3004952 3005052 101
[96] chr1 - 101M 101 3005023 3005123 101
[97] chr1 + 101M 101 3001837 3001937 101
[98] chr1 + 101M 101 3001614 3001714 101
njunc
<integer>
[1] 0
[2] 0
[3] 0
[4] 0
[5] 0
... ...
[94] 0
[95] 0
[96] 0
[97] 0
[98] 0
-------
seqinfo: 66 sequences from an unspecified genome
[[2]]
GAlignments object with 112 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
[1] chr1 + 101M 101 3009689 3009789 101
[2] chr1 - 101M 101 3009692 3009792 101
[3] chr1 + 40M61S 101 3006254 3006293 40
[4] chr1 - 61S40M 101 3006254 3006293 40
[5] chr1 - 36S65M 101 3005668 3005732 65
... ... ... ... ... ... ... ...
[108] chr1 - 1S100M 101 3007853 3007952 100
[109] chr1 + 101M 101 3006226 3006326 101
[110] chr1 - 101M 101 3009131 3009231 101
[111] chr1 - 101M 101 3006865 3006965 101
[112] chr1 + 101M 101 3005263 3005363 101
njunc
<integer>
[1] 0
[2] 0
[3] 0
[4] 0
[5] 0
... ...
[108] 0
[109] 0
[110] 0
[111] 0
[112] 0
-------
seqinfo: 66 sequences from an unspecified genome
[[3]]
GAlignments object with 89 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
[1] chr1 + 61M40S 101 3013078 3013138 61
[2] chr1 - 35S61M5S 101 3013078 3013138 61
[3] chr1 + 94M7S 101 3014248 3014341 94
[4] chr1 - 6S95M 101 3014247 3014341 95
[5] chr1 + 101M 101 3010502 3010602 101
... ... ... ... ... ... ... ...
[85] chr1 + 51M2I48M 101 3014701 3014799 99
[86] chr7 + 20M5I68M8S 101 124099832 124099919 88
[87] chr1 + 82M19S 101 3010739 3010820 82
[88] chr1 - 101M 101 3012152 3012252 101
[89] chr1 - 101M 101 3010107 3010207 101
njunc
<integer>
[1] 0
[2] 0
[3] 0
[4] 0
[5] 0
... ...
[85] 0
[86] 0
[87] 0
[88] 0
[89] 0
-------
seqinfo: 66 sequences from an unspecified genome
Is there nothing to be gained by reading in chunks vs repeatedly opening the BamFile
(as there presumably is when using the yieldSize argument to BamFile
)?