Using letterFrequencyInSlidingView with an AAStringSet
joselynnw • 0
@joselynnw-14596

Hello,

I am trying to scan a translated transcriptome assembly for conserved repeat regions that are characterized by a high percentage of certain amino acids. I'd like to scan the whole dataset and then filter the sequences by a threshold amino acid content per 100-AA sliding window. I'm having difficulty figuring out how to use letterFrequencyInSlidingView on an AAStringSet (maybe this is just not possible?).

This works fine:

letters=c("P","C","S","D")

seq<-AAString("QPSDLNPSSQPSECADVLEECPIDECFLPYSDASRPPSCLSFGRPDCDVLPTPQNINCPRCCATECRPDNPMFTPSPDGSPPICSPTMLPTNQPTPPEPSSAPSDCGEVIEECPLDTCFLPTSDPARPPDCTAVGRPDCDVLPFPNNLGCPACCPFECSPDNPMFTPSPDGSPPNCSPTMLPTPQPSTPTVITSPAPSSQPSQCAEVIEQCPIDECFLPYGDSSRPLDCTDPAVNRPDCDVLPTPQNINCPACCAFECRPDNPMFTPSPDGSPPICSPTMMPSPEPSSQPSDCGEVIEECPIDACFLPKSDSARPPDCTAVGRPDCNVLPFPNNIGCPSCCPFECSPDNPMFTPSPDGSPPNCSPTMLPSPSPSAVTVPLTPAPSSAPTRQPSSQPTGPQPSSQPSECADVLELCPYDTCFLPFDDSSRPPDCTDPSVNRPDCDKLSTAIDFTCPTCCPTQCRPDNPMFSPSPDGSPPVCSPTMMPSPLPSPTE")

seq_letters_freq<-letterFrequencyInSlidingView(seq, 100, letters, as.prob=TRUE)

But when I try this:

multiple_seqs<-readAAStringSet('multiple_seqs.fasta')

multiple_seqs_letters_freq<-letterFrequencyInSlidingView(multiple_seqs, 100, letters, as.prob=TRUE)

I get this error message:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘letterFrequencyInSlidingView’ for signature ‘"AAStringSet"’

Is there some way to loop over each AAString in the AAStringSet? 

edit:

Calling unlist() on the AAStringSet helps somewhat:

multiple_seqs_unlisted<-unlist(multiple_seqs)
multiple_seqs_letters_freq<-letterFrequencyInSlidingView(multiple_seqs_unlisted, 100, letters, as.prob=TRUE)

I'm still having an issue with the output table: it has no information about which sequence (from the multi-sequence FASTA) each sliding window that passes my thresholds belongs to. The rows are just numbered, which I assume corresponds to the sliding window position, but I'm not sure how to make that information useful.

biostrings aastringset letterFrequencyInSlidingView xstringset • 1.3k views
@james-w-macdonald-5106

Have you tried lapply?

mslf <- lapply(multiple_seqs, letterFrequencyInSlidingView, view.width = 100, letters = letters, as.prob = TRUE)
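To keep track of which sequence each window came from, something along these lines might work. It is only a sketch under a few assumptions: the toy AAStringSet stands in for your FASTA file, the 10-AA window and the 0.5 cutoff are made up (you would use 100 and your own threshold), and I am guessing you want the combined fraction of P, C, S and D per window rather than each letter on its own. Sequences shorter than the window are dropped first, since no full window fits in them. Note also that unlist() on an AAStringSet concatenates everything into one AAString, so windows near the joins would span two different sequences; going sequence by sequence avoids that.

```r
library(Biostrings)

# Hypothetical toy input standing in for readAAStringSet('multiple_seqs.fasta')
multiple_seqs <- AAStringSet(c(
  seqA = "PPCCSSDDPPCCSSDDAAAA",
  seqB = "AAAAAAAAAAAAAAAAAAAAAAAA",
  seqC = "PCSD"                      # shorter than the window
))

aa        <- c("P", "C", "S", "D")   # avoids masking base R's `letters`
width_aa  <- 10                      # assumed window size; the post uses 100
threshold <- 0.5                     # assumed cutoff, pick your own

# Keep only sequences in which at least one full window fits
long_enough <- multiple_seqs[width(multiple_seqs) >= width_aa]

# One data.frame per sequence; row i of the matrix is the window starting at i
per_seq <- lapply(seq_along(long_enough), function(i) {
  freq <- letterFrequencyInSlidingView(long_enough[[i]],
                                       view.width = width_aa,
                                       letters = aa, as.prob = TRUE)
  starts <- seq_len(nrow(freq))
  data.frame(seq_name = names(long_enough)[i],
             start    = starts,
             end      = starts + width_aa - 1L,
             combined = rowSums(freq))  # summed fraction of P, C, S, D
})

windows <- do.call(rbind, per_seq)
hits    <- windows[windows$combined >= threshold, ]

# Names of the sequences with at least one window above the cutoff
hit_names <- unique(as.character(hits$seq_name))
```

From there, multiple_seqs[hit_names] gives you the filtered sequences, and the start/end columns in hits tell you where the enriched windows sit within each one.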