Question

how to extract read alignments from reads aligning in mulitple locations

1

Entering edit mode

Robert Castelo ★ 3.4k

@rcastelo

Last seen 13 days ago

Barcelona/Universitat Pompeu Fabra

hi,

i have a 4Gb BAM file with RNA-seq reads aligned with STAR to the hg38 version of the human genome and where, according to STAR, an important fraction of them (~25%) aligned to multiple loci. I"m interested in finding the genes that overlap these multimapping reads to have an idea of the origin of these reads. Could anyone suggest me a Rsamtools/GenomicAlignments/GenomicFeatures route to extract these multimapped alignments and genes overlapping them?

thanks!!

robert.

rsamtools genomicalignments genomicfeatures • 2.9k views

ADD COMMENT • link updated 7.6 years ago by Dario Strbenac ★ 1.5k • written 7.6 years ago by Robert Castelo ★ 3.4k

score 3 · Accepted Answer · 2017-07-03

3

Entering edit mode

Dario Strbenac ★ 1.5k

@dario-strbenac-5916

Last seen 3 days ago

Australia

It's easy. For example,

library(GenomicAlignments)
ambiguousReads <- readGAlignmentPairs("/users/robert/project/example.bam", param = ScanBamParam(flag = scanBamFlag(isSecondaryAlignment = TRUE))) # Only import multi-mapping reads.

library(rtracklayer)
genesExonsTranscripts <- import("/users/robert/databases/hg38/GENCODEversion26.gtf") # GRanges with metadata.
genes <- subset(genesExonsTranscripts, type == "gene")

library(GenomicRanges)
genesAlignments <- countOverlaps(genes, ambiguousReads)
mcols(GENCODEgenes)[genesAlignments > 0, "gene_name"] # Gene symbols of genes with non-zero counts.

You may like to assign such alignments to a particular gene using RSEM.

ADD COMMENT • link 7.6 years ago Dario Strbenac ★ 1.5k

0

Entering edit mode

Thanks a lot. I've tried out and the 'GAlignmentPairs' object 'ambiguousReads' has about 20 million pairs, however, STAR (the read mapper) tells me that there are about 9 million "reads mapped to multiple loci". Could you think of any reason responsible for this discrepancy? Maybe I'm missing some additional flag when reading the alignments?

ADD REPLY • link 7.6 years ago Robert Castelo ★ 3.4k

1

Entering edit mode

You imported alignments whereas STAR's summary is in regard to reads. If a read can have more than one alignment, then 9 million reads can result in 20 million alignments because there is a 1:many relationship between them.

ADD REPLY • link 7.6 years ago Dario Strbenac ★ 1.5k

0

Entering edit mode

True, using the 'use.names' argument and counting unique read identifiers i get 8.7 million reads, which is similar to the number given by STAR. thanks!!

ADD REPLY • link 7.6 years ago Robert Castelo ★ 3.4k