shortread quality

0

Entering edit mode

David ▴ 860

@david-3335

Last seen 7.1 years ago

Hi, I'm reading a fastq file from the solexa sequencer. I would like to know how many reads have a phred score (>Q29). The thing is that i get the densities so i don't really know how many reads from the total pass that filter. It's probaly easy for you so any hint would be helpful library("ShortRead") fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt" path = getwd() sp <- SolexaPath(path,dataPath=path,analysisPath=path) # Read fastq File and save report fq <- readFastq(sp, fqpattern) qaSummary <- qa(fq,fqpattern) save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" ))) report(qaSummary,dest="report") #Quality idx = which(qaSummary[["readQualityScore"]]["quality"] > 29) a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] , qaSummary[["readQualityScore"]][idx,"density"]) a #reads with a quality >Q29 #How to get the total number ? or percent compared to the total number of reads ? thanks

• 2.6k views

ADD COMMENT • link updated 12.9 years ago by Martin Morgan 25k • written 12.9 years ago by David ▴ 860

0

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 3 months ago

United States

Hi, On 06/06/2012 08:00 AM, David martin wrote: > Hi, > I'm reading a fastq file from the solexa sequencer. > I would like to know how many reads have a phred score (>Q29). The thing If you mean the average base quality score, then fq <- readFastq(sp, fqpattern) score <- alphabetScore(fq) gives the sum of the base quality scores for each read, so is a vector as long as the length of the reads. The average is aveScore <- score / width(fq) and then you're in the realm of familiar R again, e.g., hist(aveScore) table(aveScore > 29) etc. Hope that heps, Martin > is that i get the densities so i don't really know how many reads from > the total pass that filter. It's probaly easy for you so any hint would > be helpful > > library("ShortRead") > fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt" > > path = getwd() > sp <- SolexaPath(path,dataPath=path,analysisPath=path) > > # Read fastq File and save report > fq <- readFastq(sp, fqpattern) > qaSummary <- qa(fq,fqpattern) > save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" ))) > report(qaSummary,dest="report") > > #Quality > > idx = which(qaSummary[["readQualityScore"]]["quality"] > 29) > a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] , > qaSummary[["readQualityScore"]][idx,"density"]) > a #reads with a quality >Q29 > > #How to get the total number ? or percent compared to the total number > of reads ? > > thanks > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793

ADD COMMENT • link 12.9 years ago Martin Morgan 25k

0

Entering edit mode

On 06/06/2012 12:11 PM, Martin Morgan wrote: > Hi, > > On 06/06/2012 08:00 AM, David martin wrote: >> Hi, >> I'm reading a fastq file from the solexa sequencer. >> I would like to know how many reads have a phred score (>Q29). The thing > > If you mean the average base quality score, then > > fq <- readFastq(sp, fqpattern) > score <- alphabetScore(fq) > > gives the sum of the base quality scores for each read, so is a vector > as long as the length of the reads. The average is > > aveScore <- score / width(fq) > > and then you're in the realm of familiar R again, e.g., > > hist(aveScore) > table(aveScore > 29) > > etc. > > Hope that heps, I guess the qa object already gets you further, as you've indicated df <- qaSummary[["readQualityScore"]] the 'density' column (apparently not really a density) could be turned into a cumulative density cdensity <- cumsum(df$density) / sum(df$density) and then look up the cumulative density nearest the quality that you're interested in cdensity[findInterval(29, df$quality)] You'd want to do these steps separately for each lane, if there were several in df. Martin > > Martin > > > >> is that i get the densities so i don't really know how many reads from >> the total pass that filter. It's probaly easy for you so any hint would >> be helpful >> >> library("ShortRead") >> fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt" >> >> path = getwd() >> sp <- SolexaPath(path,dataPath=path,analysisPath=path) >> >> # Read fastq File and save report >> fq <- readFastq(sp, fqpattern) >> qaSummary <- qa(fq,fqpattern) >> save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" ))) >> report(qaSummary,dest="report") >> >> #Quality >> >> idx = which(qaSummary[["readQualityScore"]]["quality"] > 29) >> a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] , >> qaSummary[["readQualityScore"]][idx,"density"]) >> a #reads with a quality >Q29 >> >> #How to get the total number ? or percent compared to the total number >> of reads ? >> >> thanks >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793

ADD REPLY • link 12.9 years ago Martin Morgan 25k

0

Entering edit mode

thanks Martin, That's exactly what i wanted to get !!! On 06/06/2012 10:34 PM, Martin Morgan wrote: > On 06/06/2012 12:11 PM, Martin Morgan wrote: >> Hi, >> >> On 06/06/2012 08:00 AM, David martin wrote: >>> Hi, >>> I'm reading a fastq file from the solexa sequencer. >>> I would like to know how many reads have a phred score (>Q29). The thing >> >> If you mean the average base quality score, then >> >> fq <- readFastq(sp, fqpattern) >> score <- alphabetScore(fq) >> >> gives the sum of the base quality scores for each read, so is a vector >> as long as the length of the reads. The average is >> >> aveScore <- score / width(fq) >> >> and then you're in the realm of familiar R again, e.g., >> >> hist(aveScore) >> table(aveScore > 29) >> >> etc. >> >> Hope that heps, > > I guess the qa object already gets you further, as you've indicated > > df <- qaSummary[["readQualityScore"]] > > the 'density' column (apparently not really a density) could be turned > into a cumulative density > > cdensity <- cumsum(df$density) / sum(df$density) > > and then look up the cumulative density nearest the quality that you're > interested in > > cdensity[findInterval(29, df$quality)] > > You'd want to do these steps separately for each lane, if there were > several in df. > > Martin > > > >> >> Martin >> >> >> >>> is that i get the densities so i don't really know how many reads from >>> the total pass that filter. It's probaly easy for you so any hint would >>> be helpful >>> >>> library("ShortRead") >>> fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt" >>> >>> path = getwd() >>> sp <- SolexaPath(path,dataPath=path,analysisPath=path) >>> >>> # Read fastq File and save report >>> fq <- readFastq(sp, fqpattern) >>> qaSummary <- qa(fq,fqpattern) >>> save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" ))) >>> report(qaSummary,dest="report") >>> >>> #Quality >>> >>> idx = which(qaSummary[["readQualityScore"]]["quality"] > 29) >>> a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] , >>> qaSummary[["readQualityScore"]][idx,"density"]) >>> a #reads with a quality >Q29 >>> >>> #How to get the total number ? or percent compared to the total number >>> of reads ? >>> >>> thanks >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > >

ADD REPLY • link 12.9 years ago David ▴ 860

Login before adding your answer.