hi,
I have a 4 GB BAM file with RNA-seq reads aligned with STAR to the hg38 version of the human genome, where, according to STAR, a substantial fraction of the reads (~25%) aligned to multiple loci. I'm interested in finding the genes that overlap these multimapping reads, to get an idea of where the reads come from. Could anyone suggest an Rsamtools/GenomicAlignments/GenomicFeatures route to extract these multimapped alignments and the genes overlapping them?
thanks!!
robert.
Thanks a lot. I've tried it out, and the 'GAlignmentPairs' object 'ambiguousReads' has about 20 million pairs. However, STAR (the read mapper) tells me that there are about 9 million "reads mapped to multiple loci". Can you think of any reason for this discrepancy? Maybe I'm missing some additional flag when reading the alignments?
You imported alignments whereas STAR's summary is in regard to reads. If a read can have more than one alignment, then 9 million reads can result in 20 million alignments because there is a 1:many relationship between them.
True, using the 'use.names' argument and counting unique read identifiers I get 8.7 million reads, which is similar to the number given by STAR. Thanks!!
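For anyone who lands on this thread with the same discrepancy, here is a small toy sketch of the alignments-versus-reads distinction discussed above. It is plain Python on made-up records, not the Rsamtools/GenomicAlignments code used in the thread; the read names and loci are hypothetical and only illustrate why counting unique read identifiers gives a smaller number than counting alignments.

```python
from collections import Counter

# Toy alignment records (hypothetical data): one tuple per alignment,
# holding the read identifier and the locus it aligned to.
alignments = [
    ("read1", "chr1:1000"),
    ("read1", "chr5:2000"),
    ("read1", "chr9:3000"),
    ("read2", "chr2:1500"),
    ("read2", "chr7:4500"),
    ("read3", "chr3:800"),
]

# Number of alignments reported for each read identifier.
hits_per_read = Counter(name for name, _ in alignments)

# A read is multimapped if it has more than one alignment.
multimapped = {name: n for name, n in hits_per_read.items() if n > 1}

# Counting alignments and counting reads answer different questions.
n_multi_alignments = sum(multimapped.values())  # alignments from multimapped reads
n_multi_reads = len(multimapped)                # unique multimapped reads

print(n_multi_alignments, n_multi_reads)
```

In this toy case two multimapped reads account for five alignments, which is the same 1:many effect as the roughly 9 million reads versus 20 million alignments reported above.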