Entering edit mode
Coghlan, Avril
▴
190
@coghlan-avril-3810
Last seen 10.3 years ago
Dear Bioconductor users and developers,
I am using the Biostrings library for creating alignments, especially
the pairwiseAlignment() function, which is very useful.
I would like to suggest a new function for the Biostrings library, for
printing out a pairwise alignment in blocks.
I have looked, and there doesn't seem to be such a function at present
in Biostrings or any other R library (I have tried to find one).
Here is my suggested function:
> printPairwiseAlignment <- function(alignment, chunksize)
{
seq1aln <- pattern(alignment) # Get the alignment for the first
sequence
seq2aln <- subject(alignment) # Get the alignment for the second
sequence
alnlen <- nchar(seq1aln) # Find the number of columns in the
alignment
starts <- seq(1, alnlen, by=chunksize)
n <- length(starts)
seq1alnresidues <- 0
seq2alnresidues <- 0
for (i in 1:n) {
chunkseq1aln <- substring(seq1aln, starts[i],
starts[i]+chunksize-1)
chunkseq2aln <- substring(seq2aln, starts[i],
starts[i]+chunksize-1)
# Find out how many gaps there are in chunkseq1aln:
gaps1 <- countPattern("-",chunkseq1aln) # countPattern() is
from
Biostrings library
# Find out how many gaps there have been on this line of
subject3:
gaps2 <- countPattern("-",chunkseq2aln) # countPattern() is
from
Biostrings library
# Calculate how many residues of the first sequence we have
printed so far in the alignment:
seq1alnresidues <- seq1alnresidues + chunksize - gaps1
# Calculate how many residues of the second sequence we have
printed so far in the alignment:
seq2alnresidues <- seq2alnresidues + chunksize - gaps2
print(paste(chunkseq1aln,seq1alnresidues))
print(paste(chunkseq2aln,seq2alnresidues))
print(paste(' '))
}
}
You can then use the Biostrings library to create an alignment of two
sequences, and print it out in blocks of length 'chunksize'.
For example, here is an example, for printing out an alignment in
blocks
of length 30 residues:
> s1 <- "MSHPALTQLRALRYCKEIPALDPQLLDWLLLEDSMTKRFGKTVSVTMIREGFVEQNE"
> s2 <- "MSHPALTQLRALRYFDAIPALEPHLLDWLLLEDSVTKRFEQQGKRVSVTLIREAFVGQSE"
> aln <- pairwiseAlignment(s1,s2,substitutionMatrix="BLOSUM50")
> printPairwiseAlignment(aln,20)
[1] "MSHPALTQLRALRYCKEIPALDPQLLDWLL 30"
[1] "MSHPALTQLRALRYFDAIPALEPHLLDWLL 30"
[1] " "
[1] "LEDSMTKRF---GKTVSVTMIREGFVEQNE 57"
[1] "LEDSVTKRFEQQGKRVSVTLIREAFVGQSE 60"
[1] " "
Does this seem a good idea to you all?
I am relatively new to R so I am sure that somebody else could improve
my function to make it nicer.
If you think it's a good idea to make such a function, please could
somebody add it to the Biostrings library (I don't know how to do
this)?
Regards,
Avril
Avril Coghlan
University College Cork
Ireland