mapToAlignments_issue
1
0
Entering edit mode
@davidporubsky-15691
Last seen 22 months ago
United States

Dear Bioc-team,

I'm trying to map genome coordinates back to local (contig/read) coordinates, however, I have an issue to do this properly in cases when alignment between contig and the genome (reference) is in minus (reverse) orientation. Is there a way to use function GenomicAlignments::mapToAlignments in strand aware fashion such that reverse alignments will be mapped in minus orientation?

Is there an example code on how to do this properly?

Thank you in advance,

David

GenomicAlignments • 1.2k views
ADD COMMENT
1
Entering edit mode

Do you have example code showing what you are trying to do and how it doesn't work the way you expect? Ideally a self-contained example so others can reproduce themselves?

ADD REPLY
0
Entering edit mode

Example alignment between query (Q) and target (T) sequences in PAF format.

paf.aln <- dplyr::tibble( q.name='Q', q.len=100, q.start=1, q.end=100, strand='*', t.name='T', t.len=1000, t.start=700, t.end=800, n.match=100, aln.len=100, mapq=60, cg='100=')

Target region to be lifted to the query

lift.gr <- as('T:701-710', 'GRanges')

PLUS

Setting GAlignments object in positive orientation (using PAF alignment)

plus.alignment <- GenomicAlignments::GAlignments(seqnames = paf.aln$t.name, pos = as.integer(paf.aln$t.start + 1), cigar = paf.aln$cg, strand=GenomicRanges::strand('+'), names = paf.aln$q.name)

Lifting target region to query coordinates

plus.gr <- GenomicAlignments::mapToAlignments(x = lift.gr, alignments = plus.alignment)

MINUS

Setting GAlignments object in negative orientation (using PAF alignment)

minus.alignment <- GenomicAlignments::GAlignments(seqnames = paf.aln$t.name, pos = as.integer(paf.aln$t.start + 1), cigar = paf.aln$cg, strand=GenomicRanges::strand('-'), names = paf.aln$q.name)

Lifting target region to query coordinates

minus.gr <- GenomicAlignments::mapToAlignments(x = lift.gr, alignments = minus.alignment)

Here both ranges lifted to query coordinates are the same position 1-10 of the query. I would expect in case of reverse alignment lifted ranges to correspond to position 91-100 of the query sequence. Am I doing something wrong here?

ADD REPLY
0
Entering edit mode

Any help here from the developers? I don't think this is a correct behavior from the 'mapToAlignments' function. I think it is important to make this mapping in strand aware fashion to obtain correct coordinates.

ADD REPLY
2
Entering edit mode
@herve-pages-1542
Last seen 2 days ago
Seattle, WA, United States

I would expect in case of reverse alignment lifted ranges to correspond to position 91-100 of the query sequence.

Nope. Local coordinates are _always_ w.r.t. the leftmost mapped position, whatever the strand. This is consistent with the SAM Format Specification:

All mapped segments in alignment lines are represented on the forward genomic strand. For segments that have been mapped to the reverse strand, the recorded SEQ is reverse complemented from the original unmapped sequence and CIGAR, QUAL, and strand-sensitive optional fields are reversed and thus recorded consistently with the sequence bases as represented.

In other words, local coordinates are reported with respect to the SEQ that is recorded in the SAM/BAM file. For queries that are mapped to the reverse strand, this means that local coordinates are reported with respect to the _reverse complement_ of the original unmapped sequence.

You can "revert" those local coordinates with something like this:

reverse_mapped_coordinates <- function(mapped, alignments)
{
    stopifnot(is(mapped, "GRanges"), is(alignments, "GAlignments"))
    alignments_hits <- mcols(mapped)$alignmentsHits
    alignments2 <- alignments[alignments_hits]  # parallel to 'mapped'
    mapped_to_minus <- which(strand(alignments2) == "-")
    bounds <- IRanges(1L, qwidth(alignments2)[mapped_to_minus])
    ranges(mapped)[mapped_to_minus] <-
        reflect(ranges(mapped)[mapped_to_minus], bounds)
    mapped
}

Then:

library(GenomicAlignments)

x <- GRanges("T:701-710")

read1 <- GAlignments(seqnames="T", pos=701L, cigar="100M", strand=strand("+"), names="Q")
read2 <- GAlignments(seqnames="T", pos=701L, cigar="100M", strand=strand("-"), names="Q")
reads <- c(read1, read2)

mapped <- mapToAlignments(x, alignments=reads)

reverse_mapped_coordinates(mapped, reads)
# GRanges object with 2 ranges and 2 metadata columns:
#       seqnames    ranges strand |     xHits alignmentsHits
#          <Rle> <IRanges>  <Rle> | <integer>      <integer>
#   [1]        Q      1-10      * |         1              1
#   [2]        Q    91-100      * |         1              2
#   -------
#   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Hope this helps,

H.

ADD COMMENT

Login before adding your answer.

Traffic: 811 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6