Question

indels located at the edges of subject and Biostrings pairwiseAlignment function

1

Entering edit mode

vincent.plagnol ▴ 10

@vincentplagnol-8839

Last seen 8.6 years ago

United Kingdom

Dear all,

I am struggling to understand the logic of the pairwiseAlignment function when indels are located at the end of a sequence.

See the example below:

te <- pairwiseAlignment( pattern = "GGTGCCACTACCACAGCT", subject = "GGTGCCACTACCACAGCTCCT", type = "global", gapOpening=1, gapExtension=1)

There are 3 bp missing at the end of the pattern, and the alignment type is global. Accordingly, if I use:

writePairwiseAlignments(te)

I get:

P1 1 GGTGCCACTACCACAGCT--- 18
||||||||||||||||||
S1 1 GGTGCCACTACCACAGCTCCT 21

which is the expected behaviour. But if I call

deletion(te)

IRangesList of length 1
[[1]]
IRanges of length 0

I get no deletion. And no insertion either. This seems to be inconsistent. If I ask for a global alignment, and 3 bases are missing at the end of the fragment, this has to be reflected in the "deletion" function.

What am I missing here? Is this a bug? Or is pairwiseAlignment trying to be smarter than it should and this is a "feature" I do not understand? Or should I query the alignment in a different manner?

Thank you all in advance for suggestions!

Vincent

pairwisealignment biostrings • 1.4k views

ADD COMMENT • link 9.4 years ago vincent.plagnol ▴ 10

score 1 · Answer 1 · 2015-09-28

Hi Vincent,

I agree. I don't think these gaps should be trimmed by default. This is known to be a source of confusion and users have been surprised and have complained about this in many occasions. Last time was about 7 weeks ago here Biostrings pairwiseAligment() output shorter than source sequence.

Note that you can easily get the sizes of the leading and trailing gaps with the following little helper functions:

patternLeadingGap <- function(x)
{
    stopifnot(is(x, "PairwiseAlignments"), type(x) == "global")
    start(subject(x)) - 1L
}
subjectLeadingGap <- function(x)
{
    stopifnot(is(x, "PairwiseAlignments"), type(x) == "global")
    start(pattern(x)) - 1L
}
patternTrailingGap <- function(x)
{
    stopifnot(is(x, "PairwiseAlignments"), type(x) == "global")
    width(unaligned(subject(x))) - end(subject(x))
}
subjectTrailingGap <- function(x)
{
    stopifnot(is(x, "PairwiseAlignments"), type(x) == "global")
    width(unaligned(pattern(x))) - end(pattern(x))
}

Then:

pa <- pairwiseAlignment(c("AA",  "ACCTTT", "CACATAT", "ACACGTATA"),
                        c("AAGG", "CCTTT",  "ACTA",  "CACACTATAT"))
## First see the full (i.e. non-trimmed) alignments writePairwiseAlignments(pa).

patternLeadingGap(pa)
# [1] 0 0 0 1
subjectLeadingGap(pa)
# [1] 0 1 1 0
patternTrailingGap(pa)
# [1] 2 0 0 1
subjectTrailingGap(pa)
# [1] 0 0 1 0

Or with your pattern and subject:

te <- pairwiseAlignment("GGTGCCACTACCACAGCT", "GGTGCCACTACCACAGCTCCT",
                        type="global", gapOpening=1, gapExtension=1)
patternLeadingGap(te)
# [1] 0
subjectLeadingGap(te)
# [1] 0
patternTrailingGap(te)
# [1] 3
subjectTrailingGap(te)
# [1] 0

Hope this helps,

H.

score 0 · Answer 2 · 2015-10-01

0

Entering edit mode

vincent.plagnol ▴ 10

@vincentplagnol-8839

Last seen 8.6 years ago

United Kingdom

Herve,

once again, many thanks for your incredibly helpful reply.

I guess my only question is how to bring up the topic to whoever maintains that Biostrings package? I can make do with your suggestion, but there has to be a better way.

V

ADD COMMENT • link 9.4 years ago vincent.plagnol ▴ 10

0

Entering edit mode

I maintain the Biostrings package but I don't want to throw these helper functions in it. They are just quick-and-dirty workarounds and are not a good solution in the long run. We need to take the time to think and come up with a better API for pairwiseAlignment() and for manipulating the returned objects. That will probably mean some important refactoring and I don't have plans to work on this at the moment. Maybe for BioC 3.3...

Thanks,

H.

ADD REPLY • link 9.4 years ago Hervé Pagès 16k