Lucas Carey
Hi All,
I'm wondering what the best way is to get the score for every match
returned by matchPWM() in Biostrings.
Right now, to score all matches to a PWM in a genome, I do this:
# Find PWM hits for the forward and reverse-complement PWM on all chromosomes in the genome
mmf <- sapply(1:Nchr, function(chr) {
  matchPWM(pwm, genome[[chr]], min.score = cutoff)
})
mmr <- sapply(1:Nchr, function(chr) {
  matchPWM(reverseComplement(pwm), genome[[chr]], min.score = cutoff)
})
mmm <- c(mmf, mmr)
# Extract the sequences, reverse-complementing the reverse-strand hits
# (s2c, comp and c2s come from the seqinr package)
Sequences <- c(
  rapply(mmf, as.character, how = 'unlist'),
  sapply(rapply(mmr, as.character, how = 'unlist'),
         function(x) { c2s(rev(comp(s2c(x)))) })
)
# Convert to a DNAStringSet in order to score. This is quite slow.
lcl_set <- DNAStringSet(as.character(Sequences))
Scores <- sapply(lcl_set, PWMscoreStartingAt, pwm = pwm)
This is incredibly inefficient. What is the best way to do this?
Thanks,
-Lucas
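
One possible way to avoid the character round trip, offered as a sketch and not a benchmarked solution: PWMscoreStartingAt() takes a starting.at vector, so the hit start positions from matchPWM() can be scored directly against the original chromosome. Reverse-strand hits are scored with reverseComplement(pwm) on the unmodified sequence, which gives the same value as scoring the reverse-complemented sequence with the forward PWM. The toy pwm, subject, cutoff and the helper name score_hits below are made up for illustration. Newer Biostrings releases also document a with.score argument to matchPWM(); check your installed version before relying on it.

```r
library(Biostrings)

# Hypothetical toy PWM (rows A, C, G, T) and a short stand-in for one chromosome.
pwm <- rbind(A = c(1, 0, 0, 1),
             C = c(0, 1, 0, 0),
             G = c(0, 0, 1, 0),
             T = c(0, 0, 0, 0.5))
subject <- DNAString("TTACGAGGACGTCCTCGTAA")
cutoff <- 3.25

# Hypothetical helper: find hits on both strands and score them by position,
# without converting the matched sequences to character vectors.
score_hits <- function(pwm, subject, cutoff) {
  rc_pwm <- reverseComplement(pwm)
  fwd <- matchPWM(pwm, subject, min.score = cutoff)
  rvs <- matchPWM(rc_pwm, subject, min.score = cutoff)
  data.frame(
    start  = c(start(fwd), start(rvs)),
    strand = rep(c("+", "-"), c(length(fwd), length(rvs))),
    score  = c(PWMscoreStartingAt(pwm, subject, starting.at = start(fwd)),
               PWMscoreStartingAt(rc_pwm, subject, starting.at = start(rvs)))
  )
}

res <- score_hits(pwm, subject, cutoff)
print(res)
```

For a whole genome, the same helper could be applied per chromosome with lapply(1:Nchr, function(chr) score_hits(pwm, genome[[chr]], cutoff)) and the resulting data frames combined with do.call(rbind, ...).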