different results between Rsamtools::pileup and samtools depth
0
1
Entering edit mode
@timotheeflutre-6727
Last seen 5.6 years ago
France

I would like to use Bioconductor to calculate the depth and breadth of coverage from a given BAM file (as explained here). When I use the example BAM file provided with Rsamtools, I find the same results as when I use samtools in the cmd-line. But when I use my own BAM file, I find different results, and I am not sure why. Is it because I should set some flags in ScanBamParam? But which one(s)?

Here is my R code:

library(Rsamtools)
f <- system.file("extdata", "ex1.bam", package="Rsamtools")
h <- scanBamHeader(files=f, what="targets")
(sl <- h[[f]]$targets)
rl <- RangesList()
rl[[names(sl)[1]]] <- IRanges(start=1, end=sl[1], names=names(sl)[1])
sbp <- ScanBamParam(which=rl)
pp <- PileupParam(max_depth=10^4,
                  min_base_quality=1,
                  min_mapq=5,
                  min_nucleotide_depth=1,
                  distinguish_strands=FALSE,
                  distinguish_nucleotides=FALSE,
                  ignore_query_Ns=FALSE)
p <- pileup(file=f, yieldSize=10^4, scanBamParam=sbp, pileupParam=pp)
nrow(p) # 1569
sum(p$count) # 52146

And here is the samtools command:

cmd <- paste0("samtools depth -d ", 10^4, " -q 1 -Q 5 -r ", names(rl), " ", f,
              " | awk '{sum+=$3;cnt++}END{print cnt \"\t\" sum}'")
system(cmd)

 

rsamtools pileup • 2.7k views
ADD COMMENT
0
Entering edit mode

Do note that you are telling PileupParam to truncate at a depth of 10^6, but you are telling samtools to truncate at a depth of 10^4. This may or may not result in big differences in your results.

ADD REPLY
0
Entering edit mode

@JamesW. MacDonald you're right, my bad, I'll fix the typo. Still, using 10^4 for both doesn't solve the problem.

ADD REPLY
0
Entering edit mode

Probably it would help to identify how the results differ, maybe by focusing on single positions and enumerating the overlapping reads.

ADD REPLY
0
Entering edit mode

@Martin Morgan you're right, I'll look at a few differing positions and will post my solution (if I find one ;)

ADD REPLY
0
Entering edit mode

I eventually looked at the link you mentioned, and wanted to suggest the coverage() function, with slice() and operations like sum() on *List and Views objects being useful. If you're having trouble finding differences between pileup and samtools, I'd isolate a region via ScanBamParam(which=GRanges(...)) and look especially at the flag field of the reads overlapping the region. Also, you can use Rsamtools::filterBam() to create a mini-bam file illustrating the discrepancy, and post that somewhere or forward to me via email martin.morgan at roswellpark.org.

ADD REPLY
0
Entering edit mode

@Martin Morgan: thanks, I just sent you an email with a minimal, reproducible example.

ADD REPLY

Login before adding your answer.

Traffic: 600 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6