I would expect in case of reverse alignment lifted ranges to correspond to position 91-100 of the query sequence.
Nope. Local coordinates are _always_ w.r.t. the leftmost mapped position, whatever the strand. This is consistent with the SAM Format Specification:
All mapped segments in alignment lines are represented on the forward genomic strand. For segments that have been mapped to the reverse strand, the recorded SEQ is reverse complemented from the original unmapped sequence and CIGAR, QUAL, and strand-sensitive optional fields are reversed and thus recorded consistently with the sequence bases as represented.
In other words, local coordinates are reported with respect to the SEQ that is recorded in the SAM/BAM file. For queries that are mapped to the reverse strand, this means that local coordinates are reported with respect to the _reverse complement_ of the original unmapped sequence.
You can "revert" those local coordinates with something like this:
reverse_mapped_coordinates <- function(mapped, alignments)
{
stopifnot(is(mapped, "GRanges"), is(alignments, "GAlignments"))
alignments_hits <- mcols(mapped)$alignmentsHits
alignments2 <- alignments[alignments_hits] # parallel to 'mapped'
mapped_to_minus <- which(strand(alignments2) == "-")
bounds <- IRanges(1L, qwidth(alignments2)[mapped_to_minus])
ranges(mapped)[mapped_to_minus] <-
reflect(ranges(mapped)[mapped_to_minus], bounds)
mapped
}
Then:
library(GenomicAlignments)
x <- GRanges("T:701-710")
read1 <- GAlignments(seqnames="T", pos=701L, cigar="100M", strand=strand("+"), names="Q")
read2 <- GAlignments(seqnames="T", pos=701L, cigar="100M", strand=strand("-"), names="Q")
reads <- c(read1, read2)
mapped <- mapToAlignments(x, alignments=reads)
reverse_mapped_coordinates(mapped, reads)
# GRanges object with 2 ranges and 2 metadata columns:
# seqnames ranges strand | xHits alignmentsHits
# <Rle> <IRanges> <Rle> | <integer> <integer>
# [1] Q 1-10 * | 1 1
# [2] Q 91-100 * | 1 2
# -------
# seqinfo: 1 sequence from an unspecified genome; no seqlengths
Hope this helps,
H.
Do you have example code showing what you are trying to do and how it doesn't work the way you expect? Ideally a self-contained example so others can reproduce themselves?
Example alignment between query (Q) and target (T) sequences in PAF format.
paf.aln <- dplyr::tibble( q.name='Q', q.len=100, q.start=1, q.end=100, strand='*', t.name='T', t.len=1000, t.start=700, t.end=800, n.match=100, aln.len=100, mapq=60, cg='100=')
Target region to be lifted to the query
lift.gr <- as('T:701-710', 'GRanges')
PLUS
Setting GAlignments object in positive orientation (using PAF alignment)
plus.alignment <- GenomicAlignments::GAlignments(seqnames = paf.aln$t.name, pos = as.integer(paf.aln$t.start + 1), cigar = paf.aln$cg, strand=GenomicRanges::strand('+'), names = paf.aln$q.name)
Lifting target region to query coordinates
plus.gr <- GenomicAlignments::mapToAlignments(x = lift.gr, alignments = plus.alignment)
MINUS
Setting GAlignments object in negative orientation (using PAF alignment)
minus.alignment <- GenomicAlignments::GAlignments(seqnames = paf.aln$t.name, pos = as.integer(paf.aln$t.start + 1), cigar = paf.aln$cg, strand=GenomicRanges::strand('-'), names = paf.aln$q.name)
Lifting target region to query coordinates
minus.gr <- GenomicAlignments::mapToAlignments(x = lift.gr, alignments = minus.alignment)
Here both ranges lifted to query coordinates are the same position 1-10 of the query. I would expect in case of reverse alignment lifted ranges to correspond to position 91-100 of the query sequence. Am I doing something wrong here?
Any help here from the developers? I don't think this is a correct behavior from the 'mapToAlignments' function. I think it is important to make this mapping in strand aware fashion to obtain correct coordinates.