Entering edit mode
I would like to use Bioconductor to calculate the depth and breadth of coverage from a given BAM file (as explained here). When I use the example BAM file provided with Rsamtools
, I find the same results as when I use samtools
in the cmd-line. But when I use my own BAM file, I find different results, and I am not sure why. Is it because I should set some flags in ScanBamParam
? But which one(s)?
Here is my R code:
library(Rsamtools) f <- system.file("extdata", "ex1.bam", package="Rsamtools") h <- scanBamHeader(files=f, what="targets") (sl <- h[[f]]$targets) rl <- RangesList() rl[[names(sl)[1]]] <- IRanges(start=1, end=sl[1], names=names(sl)[1]) sbp <- ScanBamParam(which=rl) pp <- PileupParam(max_depth=10^4, min_base_quality=1, min_mapq=5, min_nucleotide_depth=1, distinguish_strands=FALSE, distinguish_nucleotides=FALSE, ignore_query_Ns=FALSE) p <- pileup(file=f, yieldSize=10^4, scanBamParam=sbp, pileupParam=pp) nrow(p) # 1569 sum(p$count) # 52146
And here is the samtools command:
cmd <- paste0("samtools depth -d ", 10^4, " -q 1 -Q 5 -r ", names(rl), " ", f, " | awk '{sum+=$3;cnt++}END{print cnt \"\t\" sum}'") system(cmd)
Do note that you are telling
PileupParam
to truncate at a depth of 10^6, but you are telling samtools to truncate at a depth of 10^4. This may or may not result in big differences in your results.@JamesW. MacDonald you're right, my bad, I'll fix the typo. Still, using 10^4 for both doesn't solve the problem.
Probably it would help to identify how the results differ, maybe by focusing on single positions and enumerating the overlapping reads.
@Martin Morgan you're right, I'll look at a few differing positions and will post my solution (if I find one ;)
I eventually looked at the link you mentioned, and wanted to suggest the
coverage()
function, withslice()
and operations likesum()
on *List and Views objects being useful. If you're having trouble finding differences between pileup and samtools, I'd isolate a region viaScanBamParam(which=GRanges(...))
and look especially at the flag field of the reads overlapping the region. Also, you can useRsamtools::filterBam()
to create a mini-bam file illustrating the discrepancy, and post that somewhere or forward to me via email martin.morgan at roswellpark.org.@Martin Morgan: thanks, I just sent you an email with a minimal, reproducible example.