Hi,
I'm running some stats on FASTQ files as shown below.
```r
fq <- readFastq(file)
# get quality scores as a matrix (via PhredQuality)
qual <- as(quality(fq), "matrix")
# probability of error per base: P = 10^(-Q/10)
pe <- apply(qual, 2, function(x) 10^(-(x/10)))
# cumulative error per read; the error probability grows towards the end of the sequence
cum_pe <- apply(pe, 1, cumsum)
```

```
> cum_pe[, 1]
  [1] 5.011872e-04 8.992944e-04 1.297402e-03 1.695509e-03 2.093616e-03 2.491723e-03
  [7] 3.286051e-03 3.485578e-03 3.685104e-03 4.083211e-03 4.334400e-03 4.533926e-03
 [13] 8.514998e-03 9.016185e-03 9.414292e-03 1.442616e-02 1.642143e-02 1.681953e-02
.....
[295] 1.226770e+02 1.233080e+02 1.239390e+02 1.245699e+02 1.252009e+02 1.258318e+02
> dim(cum_pe)
[1]   300 56761
```

So I have a matrix with one column per read: 56761 reads of 300 nt each.
How can I trim each read at the point where cum_pe exceeds 1.0? For example, if cum_pe reaches 1 at position 150 of the read, I would like to trim at position 150.
Thanks,
The calculation of pe in the original question can be vectorized.
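Since `^` and `/` operate elementwise on whole matrices in R, the column-wise apply() can be dropped entirely. A minimal sketch (the small qual matrix here is a stand-in for the real quality matrix):

```r
# toy stand-in for the real 300 x 56761 quality matrix
qual <- matrix(c(30, 20, 10, 40), nrow = 2)

# elementwise over the whole matrix, no apply() needed
pe <- 10^(-qual/10)

pe[1, 1]  # 0.001 (Q30)
pe[2, 2]  # 0.0001 (Q40)
```

This computes the same result as the apply() version but in a single vectorized expression.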
I might write the pos-finding function as one of the following.
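A sketch of two such helpers (the names pos1 and pos2 are mine), consistent with the behaviours discussed in the replies: one simply takes the first index where the cumulative error exceeds 1, the other falls back to the read length when no position does.

```r
# first position where cumulative error exceeds 1; NA if none do
pos1 <- function(x) which(x > 1)[1]

# same, but falls back to length(x) when no position exceeds 1
pos2 <- function(x) min(which(x > 1), length(x))

pos1(c(0.2, 0.8, 1.5))   # 3
pos1(c(0.1, 0.2))        # NA
pos2(c(0.1, 0.2, 0.3))   # 3 (nothing exceeds 1, so the full length)
```

Applied column-wise over cum_pe, either gives one trim position per read.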
(Mike's fails if none of x > 1.) One problem might be that early low-quality bases lead to premature trimming.
I think mine will catch instances where none of x > 1, since it'll return min(Inf, length(x)) in that case. But yours is a much more elegant solution.
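Putting the pieces together, a minimal sketch of the trimming step itself (the toy cum_pe matrix and the name trim_at are mine; the final line shows how the reads could then be shortened with narrow() from ShortRead/Biostrings):

```r
# toy stand-in for cum_pe: 4 positions x 3 reads, one column per read
cum_pe <- matrix(c(0.1, 0.5, 1.2, 2.0,
                   0.2, 0.3, 0.4, 0.5,
                   1.5, 2.0, 2.5, 3.0), nrow = 4)

# per read: first position where cumulative error exceeds 1,
# or the full read length if it never does
trim_at <- apply(cum_pe, 2, function(x) min(which(x > 1), length(x)))

trim_at  # 3 4 1

# with ShortRead loaded and fq from the question, the reads would be trimmed as:
# trimmed <- narrow(fq, start = 1, end = trim_at)
```

The narrow() call is vectorized over reads, so one trim position per column of cum_pe is exactly what it needs.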