indels located at the edges of subject and Biostrings pairwiseAlignment function
2
1
Entering edit mode
@vincentplagnol-8839
Last seen 8.5 years ago
United Kingdom

Dear all,

I am struggling to understand the logic of the pairwiseAlignment function when indels are located at the end of a sequence.

See the example below:

te <-  pairwiseAlignment( pattern = "GGTGCCACTACCACAGCT", subject = "GGTGCCACTACCACAGCTCCT", type = "global", gapOpening=1, gapExtension=1)

There are 3 bp missing at the end of the pattern, and the alignment type is global. Accordingly, if I use:

writePairwiseAlignments(te)

I get:

P1                 1 GGTGCCACTACCACAGCT---     18
                     ||||||||||||||||||
S1                 1 GGTGCCACTACCACAGCTCCT     21

which is the expected behaviour. But if I call

deletion(te)

IRangesList of length 1
[[1]]
IRanges of length 0

 

I get no deletion. And no insertion either. This seems to be inconsistent. If I ask for a global alignment, and 3 bases are missing at the end of the fragment, this has to be reflected in the "deletion" function.

What am I missing here? Is this a bug? Or is pairwiseAlignment trying to be smarter than it should and this is a "feature" I do not understand? Or should I query the alignment in a different manner?

Thank you all in advance for suggestions!

Vincent

 

 

pairwisealignment biostrings • 1.4k views
ADD COMMENT
1
Entering edit mode
@herve-pages-1542
Last seen 12 days ago
Seattle, WA, United States

Hi Vincent,

I agree. I don't think these gaps should be trimmed by default. This is known to be a source of confusion and users have been surprised and have complained about this in many occasions. Last time was about 7 weeks ago here Biostrings pairwiseAligment() output shorter than source sequence.

Note that you can easily get the sizes of the leading and trailing gaps with the following little helper functions:

patternLeadingGap <- function(x)
{
    stopifnot(is(x, "PairwiseAlignments"), type(x) == "global")
    start(subject(x)) - 1L
}
subjectLeadingGap <- function(x)
{
    stopifnot(is(x, "PairwiseAlignments"), type(x) == "global")
    start(pattern(x)) - 1L
}
patternTrailingGap <- function(x)
{
    stopifnot(is(x, "PairwiseAlignments"), type(x) == "global")
    width(unaligned(subject(x))) - end(subject(x))
}
subjectTrailingGap <- function(x)
{
    stopifnot(is(x, "PairwiseAlignments"), type(x) == "global")
    width(unaligned(pattern(x))) - end(pattern(x))
}

Then:

pa <- pairwiseAlignment(c("AA",  "ACCTTT", "CACATAT", "ACACGTATA"),
                        c("AAGG", "CCTTT",  "ACTA",  "CACACTATAT"))
## First see the full (i.e. non-trimmed) alignments writePairwiseAlignments(pa).

patternLeadingGap(pa)
# [1] 0 0 0 1
subjectLeadingGap(pa)
# [1] 0 1 1 0
patternTrailingGap(pa)
# [1] 2 0 0 1
subjectTrailingGap(pa)
# [1] 0 0 1 0

Or with your pattern and subject:

te <- pairwiseAlignment("GGTGCCACTACCACAGCT", "GGTGCCACTACCACAGCTCCT",
                        type="global", gapOpening=1, gapExtension=1)
patternLeadingGap(te)
# [1] 0
subjectLeadingGap(te)
# [1] 0
patternTrailingGap(te)
# [1] 3
subjectTrailingGap(te)
# [1] 0

Hope this helps,

H.

 

ADD COMMENT
0
Entering edit mode
@vincentplagnol-8839
Last seen 8.5 years ago
United Kingdom

Herve,

once again, many thanks for your incredibly helpful reply.

I guess my only question is how to bring up the topic to whoever maintains that Biostrings package? I can make do with your suggestion, but there has to be a better way.

V

 

 

 

ADD COMMENT
0
Entering edit mode

I maintain the Biostrings package but I don't want to throw these helper functions in it. They are just quick-and-dirty workarounds and are not a good solution in the long run. We need to take the time to think and come up with a better API for pairwiseAlignment() and for manipulating the returned objects. That will probably mean some important refactoring and I don't have plans to work on this at the moment. Maybe for BioC 3.3...

Thanks,

H.

ADD REPLY

Login before adding your answer.

Traffic: 375 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6