If alignment is done allowing a read to map to multiple locations, is there an easy way to count only the uniquely mapped reads? HTSeq-count "... does not count reads that map to multiple genes.". However, for GAlignments, "... multi-reads (i.e. reads with multiple hits in the reference) won't receive any special treatment ..." meaning that the reads mapping to multiple genomic locations will be counted many times. The reason for allowing a read to map to multiple locations is to make the one of the two resulting BAM files for each sample from STAR useful for RSEM FPKM calculation but also being able to get ordinary counts of uniquely mapped reads for modelling using GLMs.
The code example should use
NH
but you have writtenNM
. Is there a coercion function that can convertscaBam
's list-of-lists data structure into aGAlignments
orGAlignmentsPairs
object quickly and concisely?Probably the easiest is to use the
ScanBamParam()
as theparam=
argument toreadGAlignments()
/readGAlignmentPairs()
. Sorry about the NM / NH typo, corrected now.I thought the easy/standard way for counting uniquely mapped reads was to filter out secondary alignments. Any reason this cannot be used here?
FWIW with Rsamtools/GenomicAlignments this can be done by setting the
isSecondaryAlignment
flag toFALSE
when loading the reads from the BAM file e.g.readGAlignments(..., param=ScanBamParam(flag=scanBamFlag(isSecondaryAlignment=FALSE)))
.