Hello everyone,
I am trying to retrieve uniquely mapped reads from a BAM file (paired-end) mapped with BWA MEM, with the scanBam function from Rsamtools:
# setting isSecondaryAlignment=FALSE
param <- ScanBamParam(flag=scanBamFlag(isUnmappedQuery=FALSE, isSecondaryAlignment=FALSE), what=c("qname", "pos", "qwidth", "rname"))
# scanning BAM
readsfile <- "test.bam"
aln <- scanBam(readsfile, param=param)[[1]]
The problem is that there remain some secondary alignments:
# testing whether some read IDs are present more than twice (because it is paired-end data, we expect them to be present exactly 2 times if only uniquely mapped reads remain):
any(table(aln$qname) > 2)
# TRUE
I read that I could also use the "mapqFilter" in ScanBamParam, but I also read that it might not be the most reliable way to proceed. Also, I am sure which MAPQ filter to use with BWA MEM...
The following is working with samtools (outside of R), which I found in this post:
samtools view -F 2048 -bo filtered.bam original.bam
Is there a way to do the same think with Rsamtools, i.e. using the appropriate SAM flag (2048 corresponds to any secondary alignment).
Either that, or someone can advice a robust alternative solution to this problem (or point me to something I might be doing wrong)?
Thank you!
Thank you, Robert! So if I want to retrieve uniquely mapped reads only, it is correct to set:
isUnmappedQuery=FALSE
+isSupplementaryAlignment=FALSE
+isSecondaryAlignment=FALSE
(I also want to keep those reads that don't map as a proper pair)?I am still a bit confused with the secondary versus supplementary alignments: do supplementary alignments include secondary alignments?
In case of keeping only uniquely mapped reads, I suppose it is correct to remove both.
Well, how to retrieve uniquely mapped reads only, using BWA, is more tricky than it sounds, and I guess the most authoritative answer to date is this one by Heng Li, the author of BWA, where, in essence, he argues that the concept itself of uniquely mapable read is difficult to define and if you can't define it precisely, then it becomes even harder to measure it and recommends using mapping qualities. I'm not a BWA expert user, but Googling a bit you can find the following relevant Biostars hits for you to read about it here, here and here.
The difference between secondary and supplementary alignments you can find it described in the SAM specifications here and essentially, while a secondary alignment is an alignment of a read that, according to the read mapper, maps to a secondary location in the genome, a supplementary alignment is a part of a read that maps to a location of the genome distant to the other part, as a result of, for instance, some structural variation. If you want to work with primary alignments, then yes, I'd suggest to set those flags you mentioned to
FALSE
.Thanks again! I found the same posts, it's all a little confusing. Nevertheless, I think those flags will do just find for the time being, for the analysis I want to perform in R. Cheers