fasta biostrings bioconductor
Guest User ★ 13k
@guest-user-4897
Last seen 10.2 years ago
I posted this same quandary on Biostars and Stack Overflow.

I am attempting to import a FASTA file of sequences into R using Bioconductor's 'Biostrings' package and the 'DNAStringSet' function, but I keep getting the same error:

    Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
      key 112 (char 'p') not in lookup table

My FASTA file ("FileName.fa") consists of sequences of various lengths, in the following format:

    >GeneNameOne
    CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA
    >GeneNameTwo
    CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC
    ...etc

I ran 'grep p FileName.fa' in the Unix terminal, but I received no output.

Does anyone have an idea what is going on?

Thanks in advance.

-- output of sessionInfo():

    Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
      key 112 (char 'p') not in lookup table

--
Sent via the guest posting facility at bioconductor.org.
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States
Hi there,

I guess you're trying to use DNAStringSet() on a file name that contains a "p", which of course is not going to work (and even if it worked, it wouldn't do what you're trying to do). To read a FASTA file, use readDNAStringSet(), not the DNAStringSet constructor function.

Cheers,
H.

On 03/28/2014 09:43 AM, DNAStringSet Error Biostrings in R [guest] wrote:
> [...]

--
Hervé Pagès
Program in Computational Biology, Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
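To make the difference between the constructor and the reader concrete, here is a minimal sketch. It assumes Biostrings is installed, and it writes the two example records from the question to a temporary file (the `fasta_path` and `ctor_result` names are only for illustration). The exact wording of the constructor's error may differ between Biostrings versions.

```r
library(Biostrings)

# Write a tiny FASTA file to a temporary path so the sketch is self-contained.
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">GeneNameOne",
             "CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA",
             ">GeneNameTwo",
             "CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC"),
           fasta_path)

# The constructor treats its argument as sequence data, so the path string
# itself is parsed as DNA and fails on the first character that is not a
# valid DNA letter. try() captures the error instead of stopping the script.
ctor_result <- try(DNAStringSet(fasta_path), silent = TRUE)
inherits(ctor_result, "try-error")

# The reader opens the file and parses its records.
seqs <- readDNAStringSet(fasta_path)
names(seqs)   # record names taken from the ">" lines
width(seqs)   # length of each sequence
```

This also explains why grep found nothing: the offending "p" is in the string passed to DNAStringSet(), not in the contents of the file.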
@martin-morgan-1513
Last seen 4 months ago
United States
On 03/28/2014 09:43 AM, DNAStringSet Error Biostrings in R [guest] wrote:
> I performed 'grep p FileName.fa' in the Unix terminal, but I received no output.

You could try a divide-and-conquer approach: split the file in two, read each half, pick the half with the problem, and continue. Please continue reading below...

> -- output of sessionInfo():
>
> Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
> key 112 (char 'p') not in lookup table

Rather than repeating the error without context, it is usually helpful to cut-and-paste the relevant portions of the session that cause problems, e.g.,

    > library(Biostrings)
    > readLines("FileName.fa", 4) ## correct file?
    [1] "> GeneNameOne"
    [2] "CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA"
    [3] "> GeneNameTwo"
    [4] "CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC"
    > readDNAStringSet("FileName.fa")
    Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
      key 112 (char 'p') not in lookup table

The information being asked for here is the output of the command sessionInfo(), so that basic information about your system is available; here's mine:

    > library(Biostrings)
    > sessionInfo()
    R version 3.0.2 Patched (2014-01-02 r64626)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] Biostrings_2.30.1  XVector_0.2.0      IRanges_1.20.6     BiocGenerics_0.8.0

    loaded via a namespace (and not attached):
    [1] stats4_3.0.2

--
Computational Biology / Fred Hutchinson Cancer Research Center
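The divide-and-conquer suggestion can be sketched in base R, without splitting the file by hand. This is only an illustration: `lines_ok()` and `find_bad_line()` are hypothetical helpers, and the set of letters treated as valid (A, C, G, T, N and the gap character) is an assumption rather than the full alphabet Biostrings accepts.

```r
# Hypothetical helper: TRUE if no sequence line (any line not starting with
# ">") contains a character outside an assumed minimal DNA alphabet.
lines_ok <- function(lines) {
  seq_lines <- lines[!startsWith(lines, ">")]
  !any(grepl("[^ACGTNacgtn-]", seq_lines))
}

# Hypothetical helper: bisect the lines, always keeping a range known to
# contain a problem, until a single line remains. Returns NA if all is well.
find_bad_line <- function(lines) {
  if (lines_ok(lines)) return(NA_integer_)
  lo <- 1L
  hi <- length(lines)
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (!lines_ok(lines[lo:mid])) {
      hi <- mid          # the problem is in the first half
    } else {
      lo <- mid + 1L     # otherwise it must be in the second half
    }
  }
  lo
}

# Made-up example with a stray "p" in the second record.
example_lines <- c(">GeneNameOne", "CAGACACCCATAG",
                   ">GeneNameTwo", "CGCGApATGAAC",
                   ">GeneNameThree", "ACGT")
bad <- find_bad_line(example_lines)
example_lines[bad]   # the first offending line
```

For a real file, the lines would come from readLines("FileName.fa"). If this reports NA, the file contents are probably not what triggers the error.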
Hello guest with no name,

Have you tried something simple?

    library(ShortRead)
    mysequences <- readFasta('FileName.fa')

Cheers,
Ivan

Ivan Gregoretti, PhD
Bioinformatics

On Fri, Mar 28, 2014 at 12:56 PM, Martin Morgan wrote:
> [...]
Malcolm Cook ★ 1.6k
@malcolm-cook-6293
Last seen 4 months ago
United States
Hi,

Just a thought...

Did you run the grep with the -i option for case insensitivity?

If you find a "P", then look again and see whether you have any control-As in that file. If you do, I'm guessing the file came from NCBI. If it did, know this: NCBI uses control-A to separate multi-line deflines in FASTA files.

That's all I got,

Malcolm Cook
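Both checks can also be run from within R. The following base R sketch uses a made-up file written to a temporary path; the deflines and the stray "P" are invented for illustration. It looks for "p" in either case and for control-A (octal escape "\001") on each line.

```r
# Made-up example file: one defline contains a control-A, and one sequence
# line contains an uppercase "P".
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">gi|1|first defline\001gi|2|second defline",
             "ACGTACGT",
             ">GeneNameTwo",
             "CGCGAPATG"),
           fasta_path)

lines <- readLines(fasta_path)

# Equivalent in spirit to 'grep -i p': match "p" or "P" anywhere on a line.
p_lines <- which(grepl("p", lines, ignore.case = TRUE))

# Lines containing a literal control-A character.
ctrl_a_lines <- which(grepl("\001", lines, fixed = TRUE))

p_lines
ctrl_a_lines
```

The case-insensitive search will also match a "p" inside a record name, so a hit on a ">" line is not necessarily a problem in the sequence itself.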