I'm performing an analysis of MRE-Seq (methylation-sensitive restriction enzyme digest + sequencing) data and would like to perform a Pileup count of the first 3 nucleotides (left or 5') of each mapped fastq read. The fastq reads all start with 'CGG', as this is the site where the enzyme(s) cut. By performing a pileup of the first 3 nucleotides, we can get a measure of methylation at this site.
For the reads mapped to the forward strand (+), this is relatively simple as described here C: Comparing read sequence back to reference sequence.
p_param <- PileupParam(cycle_bins=seq(0, 3))
sbp <- ScanBamParam(flag=scanBamFlag(isMinusStrand=FALSE))
res <- pileup(file=fl, pileupParam=p_param, scanBamParam=sbp)
But for reads mapping to the reverse (-) strand, the 'start' of alignment in the bam file is the 3' (right) end of the fastq record (and the reads are of variable length. Therefore I need to perform a pileup of the last 3 nucleotides of each read aligning to the reverse strand, such as:
p_param <- PileupParam(cycle_bins=seq(Inf-3, Inf))
sbp <- ScanBamParam(flag=scanBamFlag(isMinusStrand=TRUE)) res <- pileup(file=fl, pileupParam=p_param, scanBamParam=sbp)
I'm aware that the Inf-3 will not work, but is there a way to pileup at end of the alignment for reads mapping to the reverse strand?
Hi Sam,
In case you're after the coverage of the first 3 nucleotides of your FASTQ reads (i.e. counting nucleotides from the left with respect to the FASTQ file), you could use a combination of
qnarrow()
andcoverage()
:Note that the difference between
qnarrow()
andnarrow()
is that the latter trims the reads after alignment. This could make a difference if, say, some alignments have indels within the first 3 nucleotides (or last 3 if on the minus strand). If your alignments are simple (i.e. your CIGARs contain only M's), thenqnarrow()
andnarrow()
will produce exactly the same result.Then:
The result is a named RleList object with 1 coverage vector per chromosome.
H.