Vinicius Henrique da Silva
@vinicius-henrique-da-silva-6713
Brazil
I would like to identify regions with repeated patterns in a given genome. Let's say that I need to identify [TA]n regions, where 'n' is a variable number of repeats.
I thought of using a loop to solve the problem; however, it would take a long time and would report redundant regions, since every shorter repeat is matched again inside each longer one. Thus, I would like to know if there is an efficient way to do this.
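library("Biostrings")
G <- readDNAStringSet("any.fa")
seqAll <- seq(from = 1, to = 1000, by = 1)
# one list slot per repeat count, so each match object can be stored whole
ali <- vector("list", length(seqAll))
for (k in seq_along(seqAll)) {
  nx <- seqAll[k]
  # build the motif repeated nx times, e.g. "ATAT" for nx = 2
  patx <- paste(rep("AT", nx), collapse = "")
  ali[[k]] <- vmatchPattern(DNAString(patx), G, max.mismatch = 0)
}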
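One possible way to avoid the loop and the redundant hits is to search once for maximal runs of the motif with a regular expression, rather than once per repeat count. The sketch below is only an illustration under stated assumptions: it uses base R `gregexpr()` on plain character strings (not a Biostrings matching function), `find_tandem_runs`, `min_repeats` and the `toy` sequences are hypothetical names made up for this example, and the motif is assumed to contain only plain letters so it needs no regex escaping.

```r
# Hypothetical helper: report each maximal run of `motif` once per sequence.
# `seqs` is assumed to be a named character vector, one element per sequence.
find_tandem_runs <- function(seqs, motif = "AT", min_repeats = 2) {
  # Regex for min_repeats or more consecutive copies, e.g. "(AT){2,}"
  pattern <- sprintf("(%s){%d,}", motif, min_repeats)
  # Greedy, non-overlapping matches, so each hit is a maximal run
  hits <- gregexpr(pattern, seqs, perl = TRUE)
  out <- lapply(seq_along(seqs), function(i) {
    start <- as.integer(hits[[i]])
    if (start[1] == -1L) return(NULL)  # no run in this sequence
    width <- attr(hits[[i]], "match.length")
    data.frame(seqname = names(seqs)[i],
               start = start,
               end = start + width - 1L,
               n_repeats = width %/% nchar(motif))
  })
  do.call(rbind, out)
}

# Toy input (made-up sequences, not real data)
toy <- c(chr1 = "GGATATATCCATGATAT", chr2 = "CCCC")
find_tandem_runs(toy)
```

For a `DNAStringSet` such as `G`, the same function could presumably be called as `find_tandem_runs(as.character(G))`. Converting whole chromosomes to character strings costs memory, so this may not suit very large genomes.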