Janet Young
@janet-young-2360
Fred Hutchinson Cancer Research Center,…
Hi there,
I found a broken function in Biostrings (I think):
readDNAMultipleAlignment() fails when reading FASTA input files (my
preferred sequence format for a lot of stuff outside of R). There's
an easy workaround, but I thought you'd want to know anyway. The code
below reproduces the problem.
Thanks!
Janet
----------------------------
library(Biostrings)
## make a test fasta-format alignment file
mySeqs <- DNAStringSet(c("AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                         "AGTGA-GTGATCGGTAG-TGATGGTAGTT",
                         "AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                         "---GAGGAGATCGGTAGCTGTTGCTAGTT"))
names(mySeqs) <- c("seq1","seq2","seq3","seq4")
writeXStringSet(mySeqs, filepath="temp.fa")
### try reading it using readDNAMultipleAlignment
myAln <- readDNAMultipleAlignment("temp.fa", format="fasta")
# Error in XStringSet("DNA", x, start = start, end = end, width = width, :
#   error in evaluating the argument 'x' in selecting a method for function 'XStringSet': Error in isTRUEorFALSE(seek.first.rec) :
#   argument "seek.first.rec" is missing, with no default
### workaround: read with readDNAStringSet(), then convert to a DNAMultipleAlignment
myAln2 <- readDNAStringSet("temp.fa", format="fasta")
myAln2 <- DNAMultipleAlignment(myAln2)
sessionInfo()
R version 3.1.0 Patched (2014-05-26 r65771)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base
other attached packages:
[1] Biostrings_2.33.10 XVector_0.5.6 IRanges_1.99.16
[4] S4Vectors_0.0.9 BiocGenerics_0.11.2
loaded via a namespace (and not attached):
[1] stats4_3.1.0 zlibbioc_1.11.1
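Until readDNAMultipleAlignment(format="fasta") is fixed, the workaround can be wrapped in a small reusable function. This is only a sketch: `readFastaAlignment()` is a hypothetical name, not part of Biostrings. It calls only readDNAStringSet(), width() and DNAMultipleAlignment(). The explicit equal-width check is my own addition, so that a FASTA file that isn't actually aligned fails with a clearer message.

```r
library(Biostrings)

## Hypothetical helper (not a Biostrings function): read a FASTA-format
## alignment via readDNAStringSet(), sidestepping the failing
## readDNAMultipleAlignment(format="fasta") call.
readFastaAlignment <- function(filepath) {
    seqs <- readDNAStringSet(filepath, format = "fasta")
    ## An alignment needs every row to have the same width (gaps included);
    ## this explicit check is an assumption added for a clearer error.
    if (length(unique(width(seqs))) > 1L)
        stop("sequences in '", filepath, "' have unequal widths; ",
             "this does not look like an alignment")
    ## Convert the DNAStringSet into a DNAMultipleAlignment
    DNAMultipleAlignment(seqs)
}
```

With the test file from the post, `readFastaAlignment("temp.fa")` should give the same object as `myAln2`.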