Biostrings readDNAMultipleAlignment broken for fasta input
Janet Young ▴ 740
@janet-young-2360
Fred Hutchinson Cancer Research Center,…
Hi there,

I found a broken function in Biostrings (I think) - readDNAMultipleAlignment doesn't work to read in fasta input files (my preferred sequence format for a lot of stuff outside of R). There's an easy workaround I can use, but thought maybe you'd want to know anyway. The code below should show you what I mean.

Thanks!

Janet

----------------------------

library(Biostrings)

## make a test fasta-format alignment file
mySeqs <- DNAStringSet ( c("AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                           "AGTGA-GTGATCGGTAG-TGATGGTAGTT",
                           "AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                           "---GAGGAGATCGGTAGCTGTTGCTAGTT") )
names(mySeqs) <- c("seq1","seq2","seq3","seq4")
writeXStringSet( mySeqs, filepath="temp.fa")

### try reading it using readDNAMultipleAlignment
myAln <- readDNAMultipleAlignment("temp.fa", format="fasta")
# Error in XStringSet("DNA", x, start = start, end = end, width = width, :
#   error in evaluating the argument 'x' in selecting a method for function 'XStringSet': Error in isTRUEorFALSE(seek.first.rec) :
#   argument "seek.first.rec" is missing, with no default

### workaround:
myAln2 <- readDNAStringSet("temp.fa", format="fasta")
myAln2 <- DNAMultipleAlignment(myAln2)

sessionInfo()

R version 3.1.0 Patched (2014-05-26 r65771)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Biostrings_2.33.10  XVector_0.5.6       IRanges_1.99.16
[4] S4Vectors_0.0.9     BiocGenerics_0.11.2

loaded via a namespace (and not attached):
[1] stats4_3.1.0    zlibbioc_1.11.1
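For anyone who needs the same script to run on both affected and unaffected Biostrings installs, here is a minimal sketch that wraps the workaround from the post in a fallback. The helper name `read_dna_alignment_fasta` is hypothetical (it is not part of Biostrings), and the sketch assumes any error from the direct reader should trigger the two-step route; it only uses the functions already shown in the post.

```r
library(Biostrings)

# Hypothetical helper, not part of Biostrings: try the direct reader first,
# and fall back to the two-step workaround from the post if it errors.
read_dna_alignment_fasta <- function(path) {
  tryCatch(
    readDNAMultipleAlignment(path, format = "fasta"),
    error = function(e) {
      # Workaround: read as a plain DNAStringSet, then convert.
      DNAMultipleAlignment(readDNAStringSet(path, format = "fasta"))
    }
  )
}

# Same test alignment as in the post, written to a temporary file.
mySeqs <- DNAStringSet(c("AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                         "AGTGA-GTGATCGGTAG-TGATGGTAGTT",
                         "AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                         "---GAGGAGATCGGTAGCTGTTGCTAGTT"))
names(mySeqs) <- c("seq1", "seq2", "seq3", "seq4")
tmp <- tempfile(fileext = ".fa")
writeXStringSet(mySeqs, filepath = tmp)

aln <- read_dna_alignment_fasta(tmp)
```

Note that the fallback swallows every error from the direct reader, including ones unrelated to this bug (a missing file will then fail in the fallback instead), so it is best treated as a stopgap.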
Alignment Biostrings
@herve-pages-1542
Seattle, WA, United States
Hi Janet,

It's funny that I received a bug report for this same issue off list from someone else just a few minutes before your post. Sounds like you guys are collaborating on the same project and running into the same bugs ;-)

This is fixed in Biostrings 2.32.1 (release) and 2.33.12 (devel). Neither will become available through biocLite() before Saturday morning, but you can get them now from svn.

Cheers,
H.

On 07/03/2014 05:26 PM, Janet Young wrote:
> Hi there,
>
> I found a broken function in Biostrings (I think) - readDNAMultipleAlignment doesn't work to read in fasta input files (my preferred sequence format for a lot of stuff outside of R). There's an easy workaround I can use, but thought maybe you'd want to know anyway. The code below should show you what I mean.
>
> [...]

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
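To check whether an installed Biostrings already contains the fix, the two version numbers given in the reply can be compared against the installed version. This is only a sketch: the function name `has_fasta_fix` is hypothetical, and it assumes the 2.33.x series is devel and the 2.32.x series is release, as the reply states. The thread says nothing about series older than 2.32, so the sketch returns NA for those rather than guessing.

```r
# Fixed versions as stated in the reply.
fixed_release <- package_version("2.32.1")   # 2.32.x release series
fixed_devel   <- package_version("2.33.12")  # 2.33.x devel series

# Hypothetical helper: TRUE if version v includes the fix, FALSE if it
# predates it within the same series, NA for series the thread does not cover.
has_fasta_fix <- function(v) {
  v <- package_version(v)
  if (v < "2.32.0") {
    NA
  } else if (v >= "2.33.0") {
    v >= fixed_devel
  } else {
    v >= fixed_release
  }
}

# Report on the installed copy, if Biostrings is installed at all.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  message("Biostrings fix present: ",
          has_fasta_fix(packageVersion("Biostrings")))
}
```

With the version from the original post, `has_fasta_fix("2.33.10")` is FALSE, which matches the reported error.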