Unexpected behavior in Biostrings::pairwiseAlignment
1
0
Entering edit mode
bc • 0
@bc-14863
Last seen 6.9 years ago

I hope this is the right place to post this. But I did not find any other sites to report possible bugs:

library(Biostrings)
ref <- 'GATCGATC'
seq <- ''

aln <- pairwiseAlignment(pattern = seq, subject = ref)
subject(aln) # result: ''; expected 'GATCGATC'

I is of course not common to align against an empty string, but it should still be possible to retrieve the subject.

Andreas

biostrings • 775 views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 5 days ago
Seattle, WA, United States

Hi,

Note that by default pairwiseAlignments() performs global alignments so in your example the aligned subject is expected to be "--------", not "GATCGATC". Let's use a 1-letter subject:

> pa1 <- pairwiseAlignment("GATCGATC", "T")  # global
> writePairwiseAlignments(pa1)
########################################
# Program: Biostrings (version 2.46.0), a Bioconductor package
# Rundate: Tue Feb  6 11:37:54 2018
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: P1
# 2: S1
# Matrix: NA
# Gap_penalty: 14.0
# Extend_penalty: 4.0
#
# Length: 8
# Identity:       0/8 (0.0%)
# Similarity:    NA/8 (NA%)
# Gaps:           7/8 (87.5%)
# Score: -43.89928
#
#
#=======================================

P1                 1 GATCGATC      8
                             
S1                 1 T-------      1

#---------------------------------------
#---------------------------------------

When doing local alignment, this is what you get:

> pa2 <- pairwiseAlignment("GATCGATC", "T", type="local")  # local
> writePairwiseAlignments(pa2)
########################################
# Program: Biostrings (version 2.46.0), a Bioconductor package
# Rundate: Tue Feb  6 11:39:18 2018
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: P1
# 2: S1
# Matrix: NA
# Gap_penalty: 14.0
# Extend_penalty: 4.0
#
# Length: 1
# Identity:       1/1 (100.0%)
# Similarity:    NA/1 (NA%)
# Gaps:           0/1 (0.0%)
# Score: 1.981756
#
#
#=======================================

P1                 3 T      3
                     |
S1                 1 T      1

#---------------------------------------
#---------------------------------------

Now what is really confusing (and many people have complained about this, see https://support.bioconductor.org/p/89228) is that the pattern() and subject() accessors trim the aligned pattern and subject even in the case of a global alignment:

> pattern(pa1)
[1] G 
> subject(pa1)
[1] T 
> pattern(pa2)
[3] T 
> subject(pa2)
[1] T 

The good news is that in Bioc-devel (currently Biostrings 2.47.7), new accessors have been added that don't do this trimming when the alignment is global:

> pa1 <- pairwiseAlignment("GATCGATC", "T")
> pa1
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: GATCGATC
subject: T-------
score: -43.89928 
> alignedPattern(pa1)
  A BStringSet instance of length 1
    width seq
[1]     8 GATCGATC
> alignedSubject(pa1)
  A BStringSet instance of length 1
    width seq
[1]     8 T-------

You'll need R devel (R 3.5) if you want to use this. The long term goal is to replace the current interface for extracting information from a PairwiseAlignments object with a higher level, easier to use, and more intuitive one.

Hope this helps,

H.

 

ADD COMMENT

Login before adding your answer.

Traffic: 412 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6