trouble reading DNA stringset from keggGet function


Guest User ★ 13k

@guest-user-4897

Last seen 10.5 years ago

I am having some difficulty making fasta files out of files returned by the keggGet function in the KEGGREST package. The file returned is apparently a DNA string set, but readDNAStringSet will not process it. I've tried it with other data and with different kinds of sequences (amino acid) and received the same error message -- I'm sure I must be missing something. My R output is below. Thanks so much for any help!

-- output of sessionInfo():

> genes<-keggLink("ath00906")
> head(genes)
     [,1]            [,2]            [,3]
[1,] "path:ath00906" "ath:AT1G06820" "reverse"
[2,] "path:ath00906" "ath:AT1G08550" "reverse"
[3,] "path:ath00906" "ath:AT1G10830" "reverse"
[4,] "path:ath00906" "ath:AT1G30100" "reverse"
[5,] "path:ath00906" "ath:AT1G31800" "reverse"
[6,] "path:ath00906" "ath:AT1G52340" "reverse"
> sequences<-keggGet(genes[1:10,2],"ntseq")
> head(sequences)
  A DNAStringSet instance of length 6
    width seq                                names
[1]  1788 ATGGATTTGTGTTTTC...AGGACACTCGCATAG ath:AT1G06820 CRT...
[2]  1389 ATGGCAGTAGCTACAC...AGGAAGGTCAGGTAG ath:AT1G08550 NPQ...
[3]   858 ATGGCGGTTTATCATC...ATTGGATTTTTATGA ath:AT1G10830 Z-I...
[4]  1770 ATGGCTTGTTCTTACA...TTAAACCAGGCTTAA ath:AT1G30100 NCE...
[5]  1788 ATGGCTATGGCCTTTC...TCTGCTCTTTCTTAA ath:AT1G31800 CYP...
[6]   858 ATGTCAACGAACACTG...AAAGTCTTCAGATGA ath:AT1G52340 ABA...
> readDNAStringSet(sequences,"fasta")
Error in .normargInputFilepath(filepath) :
  'filepath' must be a character vector with no NAs
> class(sequences) #confirm that the input is a DNA string set
[1] "DNAStringSet"
attr(,"package")
[1] "Biostrings"

-- Sent via the guest posting facility at bioconductor.org.

PROcess KEGGREST • 2.1k views

updated 11.5 years ago by James W. MacDonald 68k • written 11.5 years ago by Guest User ★ 13k


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Hi Elliot,

> library("KEGGREST")
> genes<-keggLink("ath00906")
> sequences<-keggGet(genes[1:10,2],"ntseq")
> writeXStringSet(sequences, "./tmp.fasta")
> scan("tmp.fasta", "c", nlines=2, sep = "\t") ## check it
Read 2 items
[1] ">ath:AT1G06820 CRTISO; carotenoid isomerase; K09835 prolycopene isomerase [EC:5.2.1.13] (N)"
[2] "ATGGATTTGTGTTTTCAAAATCCCGTAAAGTGTGGTGATCGTTTGTTCTCCGCATTGAATACCTCTACGTATTACAAGCT"

Best,

Jim

On Tuesday, September 10, 2013 1:55:20 PM, Elliot [guest] wrote:
> I am having some difficulty making fasta files out of files returned by the keggGet function in the KEGGREST package. The file returned is apparently a DNA string set, but readDNAStringSet will not process it. I've tried it with other data and with different kinds of sequences (amino acid) and received the same error message -- I'm sure I must be missing something. My R output is below. Thanks so much for any help!

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
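For anyone who lands here with the same error: the message says 'filepath' must be a character vector, which suggests readDNAStringSet() wants the path to a file on disk rather than an object already in memory. The object returned by keggGet(..., "ntseq") is already a DNAStringSet, so there is nothing to read; the step that produces a fasta file is writeXStringSet(), as in the answer above. Below is a minimal offline sketch of that round trip. The sequences, the names geneA, geneB and protA, and the temporary file paths are made up for illustration and are not KEGG data; it assumes the Biostrings package is installed.

```r
library(Biostrings)

# Stand-in for the keggGet(..., "ntseq") result: a small named DNAStringSet.
# These sequences and names are invented for illustration only.
sequences <- DNAStringSet(c(geneA = "ATGGATTTGTGTTTTCTAG",
                            geneB = "ATGGCAGTAGCTACACTAG"))

# The object is already in memory, so write it out to get a fasta file.
fasta_path <- tempfile(fileext = ".fasta")
writeXStringSet(sequences, fasta_path)

# readDNAStringSet() takes the file path (a character vector), not the object.
roundtrip <- readDNAStringSet(fasta_path, format = "fasta")
print(roundtrip)

# Amino acid sequences follow the same pattern with the AA variants.
proteins <- AAStringSet(c(protA = "MDLCFQNPVK"))
aa_path <- tempfile(fileext = ".fasta")
writeXStringSet(proteins, aa_path)
aa_roundtrip <- readAAStringSet(aa_path)
print(aa_roundtrip)
```

With real KEGG data, the only change should be replacing the hand-built objects with the keggGet() results; the "aaseq" option would be the amino acid counterpart of "ntseq".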
