Dear all,
I do not understand how pairwiseAlignment behaves when several alignments have the same score. Specifically, see the example at the bottom of this email (same DNA sequence but cut at different locations). In one case the mismatch is placed after the deletion, and in the other it is placed before the deletion. From the output of pairwiseAlignment it is not at all obvious that these are the same mutation events.
I appreciate that the score is the same either way, but the manual states:
"If more than one pairwise alignment produces the maximum alignment score, then the alignment with the smallest initial deletion whose mismatches occur before its insertions and deletions is chosen."
which should, I think, favour the second option (mismatch occurring before the deletion). So why is the first option chosen in the first example? Any rationale for that choice, and ideally a way to make the behaviour consistent, would be much appreciated.
Thank you in advance,
Vincent
> pairwiseAlignment(pattern = "ATCAAGGAACCATCTCCGAAAGCCAACAAGGAAATCCTCGATGTGAG", subject = "ATCAAGGAATTAAGAGAAGCAACATCTCCGAAAGCCAACAAGGAAATCCTCGATGTGAG")
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] ATCAAGGA------------ACCATCTCCGAAAGCCAACAAGGAAATCCTCGATGTGAG
subject: [1] ATCAAGGAATTAAGAGAAGCAACATCTCCGAAAGCCAACAAGGAAATCCTCGATGTGAG
score: 27.26148

> pairwiseAlignment(pattern = "TCTCTGTCATAGGGACTCTGGATCCCAGAAGGTGAGAAAGTTAAAATTCCCGTCGCTATCAAGGAACCA", subject = "TCTCTGTCATAGGGACTCTGGATCCCAGAAGGTGAGAAAGTTAAAATTCCCGTCGCTATCAAGGAATTAAGAGAAGCAACA")
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] TCTCTGTCATAGGGACTCTGGATCCCAGAAGGTGA...TCCCGTCGCTATCAAGGAAC------------CA
subject: [1] TCTCTGTCATAGGGACTCTGGATCCCAGAAGGTGA...TCCCGTCGCTATCAAGGAATTAAGAGAAGCAACA
score: 70.86008
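
To make the tie concrete, here is a minimal base-R sketch (no Biostrings needed) that scores the two gap placements for the first pair of sequences. The helpers place_gap() and score_alignment() are hypothetical names of my own, and the match, mismatch and gap values are arbitrary assumptions rather than the pairwiseAlignment defaults. The point is only that any scoring that ignores position gives both placements the same score, so the choice between them comes down to the tie-breaking rule.

```r
pattern <- "ATCAAGGAACCATCTCCGAAAGCCAACAAGGAAATCCTCGATGTGAG"
subject <- "ATCAAGGAATTAAGAGAAGCAACATCTCCGAAAGCCAACAAGGAAATCCTCGATGTGAG"

# Insert one gap run into the pattern after position `after` (hypothetical helper).
place_gap <- function(pattern, gap, after) {
  paste0(substr(pattern, 1, after), gap,
         substr(pattern, after + 1, nchar(pattern)))
}

# Score two equal-length gapped strings with a simple affine scheme.
# All scoring values are illustrative assumptions, not Biostrings defaults.
score_alignment <- function(p, s, match = 1, mismatch = -3,
                            gap_open = -5, gap_ext = -2) {
  p <- strsplit(p, "")[[1]]
  s <- strsplit(s, "")[[1]]
  stopifnot(length(p) == length(s))
  is_gap <- p == "-" | s == "-"
  # A gap run starts wherever a gap column follows a non-gap column.
  run_starts <- is_gap & !c(FALSE, head(is_gap, -1))
  n_match <- sum(!is_gap & p == s)
  n_mismatch <- sum(!is_gap & p != s)
  n_match * match + n_mismatch * mismatch +
    sum(run_starts) * gap_open + sum(is_gap) * gap_ext
}

gap <- strrep("-", nchar(subject) - nchar(pattern))
aln_a <- place_gap(pattern, gap, 8)   # gap first, mismatch after (as reported)
aln_b <- place_gap(pattern, gap, 10)  # mismatch first, gap after

score_a <- score_alignment(aln_a, subject)
score_b <- score_alignment(aln_b, subject)
print(c(gap_then_mismatch = score_a, mismatch_then_gap = score_b))
```

Both placements contain one mismatch and one gap run of the same length, which is why the two scores agree under this kind of scheme.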