Is there an R package that can calculate alignment statistics from bam/sam files? By alignment statistics I mean number of reads, number of reads uniquely mapped, number of reads mapped to multiple locations.
I found rsubread package can count number of reads that can be mapped. However, it doesn't differentiate between unique alignment and multiple alignment.
Thanks for the hints on countBam. However, I still cannot get the number of uniquely mapped reads.
I thought the following should return number of multiple mapped reads. However, it returns 0 when I know there are multiple mapped reads. bowtie2 shows ~14 millions reads
The quickBamFlagSummary() function in Rsamtools will display various stats about the FLAG field. For example:
> quickBamFlagSummary(bamfile)
| | | mean / max
group | nb of | nb of | records
of | records | unique | per unique
records | in group | QNAMEs | QNAME
All records................... A | 3307 | 1699 | 1.95 / 2
o template has single segment S | 0 | 0 | NA / NA
o template has multple sgmnts M | 3307 | 1699 | 1.95 / 2
- first segment.......... F | 1654 | 1654 | 1 / 1
- last segment........... L | 1653 | 1653 | 1 / 1
- other segment.......... O | 0 | 0 | NA / NA
Note that (S, M) is a partitioning of A, and (F, L, O) is a
partitioning of M. Indentation reflects this.
Details for group M:
o record is mapped......... M1 | 3271 | 1699 | 1.93 / 2
- primary alignment.... M2 | 3271 | 1699 | 1.93 / 2
- secondary alignment.. M3 | 0 | 0 | NA / NA
o record is unmapped....... M4 | 36 | 36 | 1 / 1
Details for group F:
o record is mapped......... F1 | 1641 | 1641 | 1 / 1
- primary alignment.... F2 | 1641 | 1641 | 1 / 1
- secondary alignment.. F3 | 0 | 0 | NA / NA
o record is unmapped....... F4 | 13 | 13 | 1 / 1
Details for group L:
o record is mapped......... L1 | 1630 | 1630 | 1 / 1
- primary alignment.... L2 | 1630 | 1630 | 1 / 1
- secondary alignment.. L3 | 0 | 0 | NA / NA
o record is unmapped....... L4 | 23 | 23 | 1 / 1
This output shows that the reads are paired-end. This file does not contain multiple mapped reads because the number of secondary alignments is 0. This is confirmed by the max records per unique QNAME equal to 1 for the F and L groups.
Thanks for the hints on
countBam
. However, I still cannot get the number of uniquely mapped reads.I thought the following should return number of multiple mapped reads. However, it returns 0 when I know there are multiple mapped reads. bowtie2 shows ~14 millions reads
file='input.bam'
countBam(file,index=file,param=ScanBamParam(flag=scanBamFlag(isSecondaryAlignment=TRUE))
I was able to get the number of unmapped correctly using
isUnmappedQuery=TRUE
Works for me:
Thanks for your help. I figured it out.
I ran bowtie retaining only unique mapped reads. Therefore, my bam file does not contain multiple mapped reads. Duh!
James, thanks again for your help.