bug in Biostrings mismatchTable?
Janet Young ▴ 740
@janet-young-2360
Fred Hutchinson Cancer Research Center,…
Hi there,

I think I've found a bug in mismatchTable (Biostrings). It's reporting a mismatch after the end of the reported alignment. I think the code below shows the problem.

thanks, as usual!

Janet

#####

library(Biostrings)

### couple of seqs, the middle portion aligns, but the last few bases don't.
### I'm not interested in those last few bases, so I do a local alignment
seq1 <- DNAString("GCTGAAGTAGTTCTCCAGAA")
seq2 <- DNAString("GTAGTTCTCCAAAGT")
aln1 <- pairwiseAlignment(seq1, seq2, type="local")
aln1
# Local PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [7] GTAGTTCTCCA
# subject: [1] GTAGTTCTCCA
# score: 21.79932

end(pattern(aln1))
# [1] 17

mismatchTable(aln1)
#   PatternId PatternStart PatternEnd PatternSubstring PatternQuality
# 1         1           18         18                G              7
#   SubjectStart SubjectEnd SubjectSubstring SubjectQuality
# 1           12         12                A              7

#### the one mismatch that's reported is after the end of the alignment as reported above.
#### There's another mismatch after the end of the alignment that wasn't reported

sessionInfo()

R Under development (unstable) (2012-10-03 r60868)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Biostrings_2.27.2 IRanges_1.17.0 BiocGenerics_0.5.0

loaded via a namespace (and not attached):
[1] parallel_2.16.0 stats4_2.16.0
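The positions in the post can be double-checked by hand without Biostrings. The following is a minimal base-R sketch, not part of the original post: the variable names are illustrative, and it assumes that bases past the end of the local alignment are simply paired one-to-one with no gaps. It lists every mismatch from the alignment start to the end of the pattern and flags whether each one falls inside the reported range 7..17.

```r
# Base-R sanity check; no Biostrings required.
seq1 <- "GCTGAAGTAGTTCTCCAGAA"  # pattern
seq2 <- "GTAGTTCTCCAAAGT"       # subject

# Local alignment as reported in the post: pattern 7..17 vs subject 1..11.
aln_pattern_start <- 7L
aln_pattern_end   <- 17L
aln_subject_start <- 1L

# Assumption: keep pairing bases one-to-one (ungapped) past the alignment end,
# up to the last base of the pattern.
n <- nchar(seq1) - aln_pattern_start + 1L
p <- strsplit(substr(seq1, aln_pattern_start, aln_pattern_start + n - 1L), "")[[1]]
s <- strsplit(substr(seq2, aln_subject_start, aln_subject_start + n - 1L), "")[[1]]

# Offsets (1-based, relative to the alignment start) where the bases differ.
offsets <- which(p != s)
mismatch_pattern_pos <- aln_pattern_start + offsets - 1L
mismatch_subject_pos <- aln_subject_start + offsets - 1L
inside_alignment <- mismatch_pattern_pos <= aln_pattern_end

print(data.frame(
  PatternPos      = mismatch_pattern_pos,
  SubjectPos      = mismatch_subject_pos,
  PatternBase     = p[offsets],
  SubjectBase     = s[offsets],
  InsideAlignment = inside_alignment
))
```

Under that assumption the only differences are at pattern positions 18 and 20 (subject 12 and 14), both outside the aligned range, which agrees with the description in the post: one mismatch reported past the end of the alignment, and a second one past the end that is not reported.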
Alignment • 801 views
@herve-pages-1542
Seattle, WA, United States
Hi Janet,

Thanks again for the bug report. This one should be fixed in Biostrings 2.26.2 (release) and 2.27.3 (devel).

Cheers,
H.

On 10/10/2012 05:13 PM, Janet Young wrote:
> Hi there,
>
> I think I've found a bug in mismatchTable (Biostrings). It's reporting a mismatch after the end of the reported alignment.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
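For anyone who cannot upgrade right away, a rough stopgap is to check the installed version against the two versions named in the reply and to drop any mismatch rows that fall outside the aligned range. The sketch below is only an illustration: `has_mismatchTable_fix` and `keep_in_range` are hypothetical helpers, not Biostrings functions, and the assumption that every version from 2.27.3 onward carries the fix goes beyond what the reply states. The data frame is copied from the mismatchTable() output in the question (quality columns omitted).

```r
# Hypothetical helper: TRUE if `version` is at or past the fix named in the reply
# (2.26.2 on the release branch, 2.27.3 on the devel branch).
has_mismatchTable_fix <- function(version) {
  v <- package_version(version)
  # A plain ">= 2.26.2" test would wrongly accept the affected devel 2.27.2.
  if (v >= "2.27.0") v >= "2.27.3" else v >= "2.26.2"
}

# Hypothetical helper: keep only mismatch rows inside the aligned pattern range.
keep_in_range <- function(mm, pattern_start, pattern_end) {
  mm[mm$PatternStart >= pattern_start & mm$PatternEnd <= pattern_end, , drop = FALSE]
}

# Row copied from the mismatchTable() output in the question.
mm <- data.frame(
  PatternId = 1L, PatternStart = 18L, PatternEnd = 18L, PatternSubstring = "G",
  SubjectStart = 12L, SubjectEnd = 12L, SubjectSubstring = "A"
)

# Version taken from the sessionInfo() in the question.
fixed <- has_mismatchTable_fix("2.27.2")
print(fixed)            # FALSE

# Aligned pattern range reported in the question: 7..17.
filtered <- keep_in_range(mm, pattern_start = 7L, pattern_end = 17L)
print(nrow(filtered))   # 0
```

With Biostrings loaded, the same helpers could be fed `as.character(packageVersion("Biostrings"))`, `start(pattern(aln1))` and `end(pattern(aln1))` instead of the hard-coded values. Filtering only hides the spurious row; upgrading to a fixed version is the actual remedy.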