Dear all,
I am struggling to understand the logic of the pairwiseAlignment function when indels are located at the end of a sequence.
See the example below:
te <- pairwiseAlignment( pattern = "GGTGCCACTACCACAGCT", subject = "GGTGCCACTACCACAGCTCCT", type = "global", gapOpening=1, gapExtension=1)
There are 3 bp missing at the end of the pattern, and the alignment type is global. Accordingly, if I use:
writePairwiseAlignments(te)
I get:
P1 1 GGTGCCACTACCACAGCT--- 18
||||||||||||||||||
S1 1 GGTGCCACTACCACAGCTCCT 21
which is the expected behaviour. But if I call
deletion(te)
IRangesList of length 1
[[1]]
IRanges of length 0
I get no deletion. And no insertion either. This seems to be inconsistent. If I ask for a global alignment, and 3 bases are missing at the end of the fragment, this has to be reflected in the "deletion" function.
What am I missing here? Is this a bug? Or is pairwiseAlignment trying to be smarter than it should and this is a "feature" I do not understand? Or should I query the alignment in a different manner?
Thank you all in advance for suggestions!
Vincent
I maintain the Biostrings package but I don't want to throw these helper functions in it. They are just quick-and-dirty workarounds and are not a good solution in the long run. We need to take the time to think and come up with a better API for
pairwiseAlignment()
and for manipulating the returned objects. That will probably mean some important refactoring and I don't have plans to work on this at the moment. Maybe for BioC 3.3...Thanks,
H.