I have extracted the positions of a particular pattern, say TAT, from a particular chromosome via BSgenome in R. Then I mapped my codon file against a GTF file to check how many occurrences of the pattern are present in a particular gene. But what I found is that the number of codons exceeds what the length of the gene allows. I assume this happens because overlapping matches are being reported.
Let's assume that the sequence is "ATATATATGCAT" and it's taking start and end positions like this:
start end
2 4
4 6
6 8
Can I avoid this? What I want is that once a position has been read, it won't be revisited to match the pattern again.
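One way to get non-overlapping hits is sketched below in base R. This is only an illustration on the toy sequence from the question, not necessarily how the original positions were produced; the variable names (`seq_chr`, `non_overlapping`) are made up for the example. It relies on `gregexpr()` resuming its search after the end of each match, so a position that has already been consumed is never reused.

```r
# Toy sequence and pattern from the question
seq_chr <- "ATATATATGCAT"
pattern <- "TAT"

# gregexpr() scans left to right and continues after the end of each match,
# so the reported hits never overlap
hits <- gregexpr(pattern, seq_chr, fixed = TRUE)[[1]]
hits <- as.integer(hits[hits > 0])   # gregexpr() returns -1 when nothing matches

non_overlapping <- data.frame(
  start = hits,
  end   = hits + nchar(pattern) - 1L
)
print(non_overlapping)
```

On this sequence the hit at 4-6 is dropped because position 4 was already used by the hit at 2-4, leaving 2-4 and 6-8.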
But in your case it seems that you want to be even more restrictive and compare codons only, i.e. compare TAT with the codons of the coding sequence ATATATATGCAT. Assuming that the phase of the coding sequence is 0, you can extract the set of codons with the codons() function from the Biostrings package.
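A minimal sketch of that codon-wise comparison, assuming the coding sequence starts in phase 0 and its length is a multiple of 3 (the names `cds`, `cds_codons` and `is_tat` are just illustrative):

```r
library(Biostrings)

# Toy coding sequence from the question, assumed to be in phase 0
cds <- DNAString("ATATATATGCAT")

# codons() returns one view per consecutive triplet: ATA, TAT, ATG, CAT
cds_codons <- codons(cds)

# Compare each codon with the pattern; only in-frame TAT triplets count
is_tat <- as.character(cds_codons) == "TAT"

tat_codons <- data.frame(
  codon = which(is_tat),             # index of the codon within the CDS
  start = start(cds_codons)[is_tat], # nucleotide coordinates within the CDS
  end   = end(cds_codons)[is_tat]
)
print(tat_codons)
```

Here only the second codon (positions 4-6) is TAT, so the count for this sequence is 1 rather than 3. For a real gene the coordinates are relative to the spliced CDS, so they would still need to be mapped back to the chromosome.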