Bug in rtracklayer::import of amino acid fasta sequences
1
0
Entering edit mode
@kvittingseerup-7956
Last seen 15 months ago
European Union

I just found a problem with rtracklayer's import function when importing amino acid fasta (also the default way of storing amino acid sequences. (E.g. the "Protein-coding transcript translation sequences" at Gencode genes official site)).

Here is an easy to reproduce example:

### Define local output path (here i use ~/Downloads as a tmp dir)
outputPath <- '~/Downloads/'

### Make example data
aaExample <- 'MAAWRAAGLNYINYSNIAARLLRKALKPELRAQAVRRDDSHIKFTKWQNGKPEKAITE'
aaExample <- AAStringSet(aaExample)
names(aaExample) <- 'test'

### Write it to fasta file
writeXStringSet(
    aaExample,
    filepath = paste0(outputPath, '/aa.fasta')
)

### Verify fasta file
rawInput <- readLines(paste0(outputPath, '/aa.fasta'))
nchar(rawInput[[2]]) == nchar(aaExample)

Which is TRUE whereby we verified the produced fasta file is correct. Now let's import it with rtracklayer

imporAA <- rtracklayer::import(paste0(outputPath, '/aa.fasta'))

width(imporAA)

The resulting with is 41 and the import throws the following warning:

Warning message:
    In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec,  :
              reading FASTA file ~/Downloads//aa.fasta: ignored 17 invalid one-letter sequence codes

Hope this can be fixed as it would make it much easier to work with AA sequences.

Cheers Kristoffer

rtracklayer fasta import Biostrings • 1.6k views
ADD COMMENT
2
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

You have to tell rtracklayer the type of sequence using the type= argument ("AA" in this case).

ADD COMMENT
0
Entering edit mode

Is there in practice any difference between that and using Biostrings::readAAStringSet() ? Would you recomend using Biostrings::readDNAStringSet() or rtracklayer::import() for DNA fasta files?

ADD REPLY
1
Entering edit mode

It's just there for completeness (rtracklayer also provides 2bit import) and is a simple wrapper around Biostrings, which is probably the better way to go.

ADD REPLY

Login before adding your answer.

Traffic: 729 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6