I have a RleList of read coverage:
> readCoverage RleList of length 22 $chr10 integer-Rle of length 130694993 with 1966679 runs Lengths: 3100009 50 1954 50 48 3 47 3 ... 2 47 1 46 13009 50 101986 Values : 0 1 0 1 0 1 2 1 ... 2 1 2 1 0 1 0 $chr11 integer-Rle of length 122082543 with 2655111 runs Lengths: 3100225 32 18 16 16 12 24 27 ... 276 50 4029 50 3096 50 107486 Values : 0 1 3 2 3 1 2 1 ... 0 1 0 1 0 1 0 $chr12 integer-Rle of length 120129022 with 1566759 runs Lengths: 3000092 50 38 50 1230 50 18 50 ... 139 50 195 50 1 50 100839 Values : 0 1 0 1 0 1 0 1 ... 0 1 0 1 0 1 0 $chr13 integer-Rle of length 120421639 with 1689775 runs Lengths: 3002660 50 1952 50 1090 3 47 3 ... 10 40 3207 50 608 40 102487 Values : 0 2 0 2 0 1 2 1 ... 2 1 0 1 0 1 0 $chr14 integer-Rle of length 124902244 with 1559985 runs Lengths: 3022498 50 57 10 2 38 10 2 ... 66 43 7 39 5 50 102256 Values : 0 1 0 1 3 5 4 2 ... 0 1 2 1 0 1 0 ... <17 more elements>
and a GRanges object of genomic regions:
> genomicRegions GRanges object with 6665053 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chr1 [ 1, 3000192] + [2] chr1 [3000193, 3000814] + [3] chr1 [3000815, 3001049] + [4] chr1 [3001050, 3001120] + [5] chr1 [3001121, 3001796] + ... ... ... ... [6665049] chrY [90843572, 90844088] + [6665050] chrY [90844089, 90844335] + [6665051] chrY [90844336, 90844497] + [6665052] chrY [90844498, 90844696] + [6665053] chrY [90844697, 91744698] + ------- seqinfo: 21 sequences from an unspecified genome; no seqlengths
I would like to calculate the fraction of bases in genomicRegions that have non-zero coverage from readCoverage. My end goal is to be able to create a matrix listing the fraction of bases covered for each genomic region (rows) for multiple bam files (columns).
So far I've only been able to calculate the read coverage and make a Views object of regions on just one of the chromosomes (I can't work out how to make a views object for all ranges, so I guess I'll just loop through the chromosomes):
# Calculate base pair coverage readCoverage <- coverage(bamFile) # Setup views along chromosome chromRanges <- fragmentRanges[seqnames(fragmentRanges) == "chr1"] coverageViews <- Views(readCoverage, ranges(chromRanges))
Update:
The below code works, but it takes a long time to compute the number of bases. Is there a quicker way then using viewApply?
coverageMatrix <- lapply(bamFiles, function(bamFile) { genomicViews <- Views(coverage(bamFile), as(genomicRanges, "RangesList")) basesCovered <- viewApply(genomicViews, function(x) sum(x > 0) / width(x)) }
Thank you for the reply. Options 1 and 3 work but your preferred option gives me an integer overflow error. I guess it's because I'm working with a lot of ranges:
Also all of the options throw an error if there are seqlevels in the readCoverage, which aren't in the genomicRanges. In my case genomicRanges doesn't contain any ranges on chrM, but my BAM files contains reads mapped to chrM:
It's not the number of ranges but their total width. The seqlevels problem is fairly easily fixed by just assigning the Seqinfo from the BAM file (as a BamFile object) to your GRanges.
Replacing seqinfo produces an error:
But replacing the seqlevels works: