List all alignments in Biostrings
1
0
Entering edit mode
Alpesh Querer ▴ 220
@alpesh-querer-4895
Last seen 10.1 years ago
Hello, I understand the pairwiseAlignment() returns the (first) best match only, is there a way for it to return all alignments for 2 sequences both local and global? Thanks, Al [[alternative HTML version deleted]]
• 1.3k views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 2 days ago
Seattle, WA, United States
Hi Al, On 03/25/2014 09:31 AM, Alpesh Querer wrote: > Hello, > > I understand the pairwiseAlignment() returns the (first) best match only, > is there a way for it to return all alignments for 2 sequences both local > and global? Short answer is no. Long answer: I'm not sure what you'd like to get in case of a global alignment ("global" is short for "global-global"). If you're doing "global-local" alignment, note that pairwiseAlignment() returns the *last* best match, not the first: pattern <- DNAString("AAA") subject <- DNAString("TTAAACGTAAAATGAAT") Then: > pairwiseAlignment(pattern, subject, type="global-local") Global-Local PairwiseAlignmentsSingleSubject (1 of 1) pattern: [1] AAA subject: [10] AAA score: 5.945268 You can use matchPattern() to get all the matches that satisfy some criteria (but this criteria must be expressed in terms of edit distance): > matchPattern(pattern, subject, max.mismatch=1, with.indels=TRUE) Views on a 17-letter DNAString subject subject: TTAAACGTAAAATGAAT views: start end width [1] 3 5 3 [AAA] [2] 9 11 3 [AAA] [3] 10 12 3 [AAA] [4] 15 16 2 [AA] This doesn't give you a score though nor does it guarantee that you'll get a match. However, if the returned Views object is not empty, then it's guaranteed to contain all the "best" matches. Even though the Views object can be post-processed to keep the best matches only, there is no way to ask matchPattern() something like "give me the best matches only". HTH, H. > > Thanks, > Al > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENT
0
Entering edit mode
Thank you Hervé . so if I want to determine to determine number of local alignments between two strings no less than length 4, is that possible in biostrings? The alternative is create sliding window fragments from one sequence and use matchPattern ? subject1: TTAAACGTAAAATGAATAAGCCT subject:2 GGAAACGTGAATTGAATAAGCG On Tue, Mar 25, 2014 at 1:06 PM, Hervé Pagès <hpages@fhcrc.org> wrote: > Hi Al, > > > On 03/25/2014 09:31 AM, Alpesh Querer wrote: > >> Hello, >> >> I understand the pairwiseAlignment() returns the (first) best match only, >> is there a way for it to return all alignments for 2 sequences both local >> and global? >> > > Short answer is no. > > Long answer: I'm not sure what you'd like to get in case of a global > alignment ("global" is short for "global-global"). > If you're doing "global-local" alignment, note that pairwiseAlignment() > returns the *last* best match, not the first: > > pattern <- DNAString("AAA") > subject <- DNAString("TTAAACGTAAAATGAAT") > > Then: > > > pairwiseAlignment(pattern, subject, type="global-local") > Global-Local PairwiseAlignmentsSingleSubject (1 of 1) > pattern: [1] AAA > subject: [10] AAA > score: 5.945268 > > You can use matchPattern() to get all the matches that satisfy > some criteria (but this criteria must be expressed in terms of > edit distance): > > > matchPattern(pattern, subject, max.mismatch=1, with.indels=TRUE) > Views on a 17-letter DNAString subject > subject: TTAAACGTAAAATGAAT > views: > start end width > [1] 3 5 3 [AAA] > [2] 9 11 3 [AAA] > [3] 10 12 3 [AAA] > [4] 15 16 2 [AA] > > This doesn't give you a score though nor does it guarantee that you'll > get a match. However, if the returned Views object is not empty, then > it's guaranteed to contain all the "best" matches. Even though the Views > object can be post-processed to keep the best matches only, there is no > way to ask matchPattern() something like "give me the best matches > only". > > HTH, > H. > > >> Thanks, >> Al >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On 03/25/2014 11:24 AM, Alpesh Querer wrote: > Thank you Herv? . so if I want to determine to determine number of > local alignments between two strings no less than length 4, is that > possible in biostrings? By "local alignments no less than length 4" do you mean "common substrings of length >= 4"? > The alternative is create sliding window fragments from one sequence > and use matchPattern ? > subject1: TTAAACGTAAAATGAATAAGCCT > subject:2 GGAAACGTGAATTGAATAAGCG Yes you could try doing something like that. Create the views (aka sliding window) on the first sequence: s1 <- DNAString("TTAAACGTAAAATGAATAAGCCT") s1views <- Views(s1, IRanges(seq_len(length(s1)-3), width=4)) Then use matchPDict() instead of matchPattern(). This will be faster than calling matchPattern() on each view: m <- matchPDict(PDict(s1views), s2) start_in_s1 <- rep(start(s1views), elementLengths(m)) start_in_s2 <- unlist(start(m)) seq <- as.character(s1views[start_in_s1]) hits0 <- data.frame(start_in_s1, start_in_s2, width=4L, seq) Then: > hits0 start_in_s1 start_in_s2 width seq 1 3 3 4 AAAC 2 4 4 4 AACG 3 5 5 4 ACGT 4 13 8 4 TGAA 5 13 13 4 TGAA 6 14 9 4 GAAT 7 14 14 4 GAAT 8 15 15 4 AATA 9 16 16 4 ATAA 10 17 17 4 TAAG 11 18 18 4 AAGC Hits that are at consecutive positions in both 'start_in_s1' and 'start_in_s2' should probably be merged in a single longer hit (they clearly correspond to the same hit). This can be done with something like: mergeHits <- function(hits) { m <- IRanges:::matchIntegerPairs(hits$start_in_s1 - 1L, hits$start_in_s2 - 1L, hits$start_in_s1, hits$start_in_s2) merge_idx <- which(!is.na(m)) if (length(merge_idx) == 0L) return(hits) ## This for loop can be very slow if there are a lot of hits ## to merge. There must be a better way! for (i in merge_idx) { mi <- m[i] mj <- m[mi] if (!is.na(mj)) m[i] <- mj } hits$width <- hits$width + tabulate(m, nbins=length(m)) hits[-merge_idx, c("start_in_s1", "start_in_s2", "width")] } hits <- mergeHits(hits0) Then: > hits start_in_s1 start_in_s2 width 1 3 3 6 4 13 8 5 5 13 13 9 Putting back the hit sequences (they can be extracted either from s1 or s2, it doesn't matter, that should give us exactly the same sequences): hits$seq <- as.character(Views(s1, IRanges(hits$start_in_s1, width=hits$width))) Then: > hits start_in_s1 start_in_s2 width seq 1 3 3 6 AAACGT 4 13 8 5 TGAAT 5 13 13 9 TGAATAAGC HTH, H. > > > On Tue, Mar 25, 2014 at 1:06 PM, Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">> wrote: > > Hi Al, > > > On 03/25/2014 09:31 AM, Alpesh Querer wrote: > > Hello, > > I understand the pairwiseAlignment() returns the (first) best > match only, > is there a way for it to return all alignments for 2 sequences > both local > and global? > > > Short answer is no. > > Long answer: I'm not sure what you'd like to get in case of a global > alignment ("global" is short for "global-global"). > If you're doing "global-local" alignment, note that pairwiseAlignment() > returns the *last* best match, not the first: > > pattern <- DNAString("AAA") > subject <- DNAString("TTAAACGTAAAATGAAT") > > Then: > > > pairwiseAlignment(pattern, subject, type="global-local") > Global-Local PairwiseAlignmentsSingleSubjec__t (1 of 1) > pattern: [1] AAA > subject: [10] AAA > score: 5.945268 > > You can use matchPattern() to get all the matches that satisfy > some criteria (but this criteria must be expressed in terms of > edit distance): > > > matchPattern(pattern, subject, max.mismatch=1, with.indels=TRUE) > Views on a 17-letter DNAString subject > subject: TTAAACGTAAAATGAAT > views: > start end width > [1] 3 5 3 [AAA] > [2] 9 11 3 [AAA] > [3] 10 12 3 [AAA] > [4] 15 16 2 [AA] > > This doesn't give you a score though nor does it guarantee that you'll > get a match. However, if the returned Views object is not empty, then > it's guaranteed to contain all the "best" matches. Even though the Views > object can be post-processed to keep the best matches only, there is no > way to ask matchPattern() something like "give me the best matches > only". > > HTH, > H. > > > Thanks, > Al > > [[alternative HTML version deleted]] > > _________________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/__listinfo/bioconductor > <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: > http://news.gmane.org/gmane.__science.biology.informatics.__conductor > <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> > Phone: (206) 667-5791 <tel:%28206%29%20667-5791> > Fax: (206) 667-1319 <tel:%28206%29%20667-1319> > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY

Login before adding your answer.

Traffic: 325 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6