Wrong protein sequence fetched with R's Biostrings readDNAStringSet function
fastabest • 0
@fastabest-12193
Last seen 7.9 years ago

There is something wrong with the readDNAStringSet function in the R package Biostrings. I am trying to read a protein FASTA sequence using this function.

FASTA file: http://mendel.imp.ac.at/PhyloDome/fastas.html

library("Biostrings")

fa=readDNAStringSet("protein.fasta")

head(fa,1)

    width seq                                                                                                names               
[1]   290 MRHAHTRCSRTSVAVMVSAHSCGGRGGRHRARNYVKTNSYTNSASGGV...AVNSSAHWGAMRSTAWAKHSSKVVSSANGHWYANAYKVKDYVSWRHD DROME_HH_Q02936

Compare the fetched sequence above with the start of the original FASTA entry for DROME_HH_Q02936:

MRHIAHTQRCLSRLTSLVALLLIVLPMVFSPAHSCGPGRGLGRHRARNLY

The sequences are different.

Am I missing something?

@martin-morgan-1513
Last seen 5 months ago
United States

It's a protein sequence, so use `readAAStringSet()`.

> Biostrings::readAAStringSet("tmp.fa")
  A AAStringSet instance of length 1
    width seq                                               names               
[1]   422 MRHIAHTPRGSCFMALLLLLLLA...LHWYANALYKVKDYVLPKSWRHD HH_DROHY_P56674
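As a quick check of the advice above, here is a minimal sketch that writes a one-record protein FASTA file and reads it back with `readAAStringSet()`. The temporary file and the variable names (`seq50`, `fasta_path`, `aa`) are illustrative assumptions; the sequence is the 50-residue fragment quoted in the question.

```r
library(Biostrings)

# First 50 residues of DROME_HH_Q02936, copied from the question
seq50 <- "MRHIAHTQRCLSRLTSLVALLLIVLPMVFSPAHSCGPGRGLGRHRARNLY"

# Write a one-record FASTA file to a temporary location (illustrative file)
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">DROME_HH_Q02936", seq50), fasta_path)

# Protein reader: residues are kept as they appear in the file
aa <- readAAStringSet(fasta_path)

# Sanity checks: record name, sequence length, and exact content
names(aa)
width(aa) == nchar(seq50)
as.character(aa[[1]]) == seq50
```

Comparing `width()` against the expected length is a cheap way to notice when a reader has silently dropped letters.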

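When the file is read as a DNAStringSet, every letter that is not a valid DNA code is dropped, which is why the width falls from 422 to 288. You're overlooking the warning about invalid one-letter sequence codes: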

> Biostrings::readDNAStringSet("tmp.fa")
  A DNAStringSet instance of length 1
    width seq                                               names               
[1]   288 MRHAHTRGSCMAANRHAHSCGGR...TANGHWYANAYKVKDYVKSWRHD HH_DROHY_P56674
Warning message:
In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec,  :
  reading FASTA file tmp.fa: ignored 134 invalid one-letter sequence codes
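The effect of that warning can be imitated in base R. The sketch below is only an approximation: it assumes the DNA reader keeps the IUPAC nucleotide codes (A, C, G, T, M, R, W, S, Y, K, V, H, D, B, N) plus the gap character and discards everything else. The variable names are illustrative, not part of Biostrings.

```r
# First 50 residues of the original protein entry, copied from the question
original <- "MRHIAHTQRCLSRLTSLVALLLIVLPMVFSPAHSCGPGRGLGRHRARNLY"

# Remove every character that is not an IUPAC nucleotide code or a gap
# (assumption: this approximates what the DNA reader treats as valid)
kept <- gsub("[^ACGTMRWSYKVHDBN-]", "", original)

# Number of residues lost by treating the protein as DNA
dropped <- nchar(original) - nchar(kept)

kept
dropped
```

Under that assumption, `kept` matches the beginning of the sequence shown in the question's `readDNAStringSet` output: amino acids such as I, L, P, Q and F have no DNA code and vanish, while letters like M, R, H and S survive because they double as IUPAC ambiguity codes.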

 


Thank you, sir, it worked.
