Question

Identify tandem repeats with Bioconductor

0

Vinicius Henrique da Silva ▴ 40

@vinicius-henrique-da-silva-6713

Last seen 19 months ago

Brazil

I would like to identify the regions with repeated patterns in a given genome. Let's say that I need to identify [TA]n regions, where 'n' is a variable number of repeats.

I thought of using a loop to solve the problem; however, it would take a long time and produce redundant regions, since every shorter repeat also matches inside each longer one. Is there an efficient way to do this?

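library("Biostrings")
G <- readDNAStringSet("any.fa")

# Try every repeat count from 1 to 1000 (slow, and the hits overlap)
seqAll <- seq_len(1000)
ali <- vector("list", length(seqAll))

for (k in seq_along(seqAll)) {
  nx <- seqAll[k]
  # Build the pattern: "AT" repeated nx times
  patx <- paste(rep("AT", nx), collapse = "")
  # Store each result in its own list slot
  ali[[k]] <- vmatchPattern(DNAString(patx), G, max.mismatch = 0)
}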

biostrings • 1.2k views

updated 8.3 years ago by Hervé Pagès 16k • written 8.3 years ago by Vinicius Henrique da Silva ▴ 40

score 0 · Answer 1 · 2016-09-23

0

Hervé Pagès 16k

@herve-pages-1542

Last seen 7 days ago

Seattle, WA, United States

Hi Vinicius,

You might want to check this post for a more efficient approach:

A: Is there any package helps finding Tandem Repeats ?

Cheers,

H.

8.3 years ago • Hervé Pagès 16k
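
To illustrate why a loop over every repeat count is not needed, here is a minimal sketch in base R. It is not taken from the linked post and may differ from the approach described there. It assumes the sequence is available as a plain character string (for a DNAStringSet, something like as.character(G[[1]]) should work, although that can be memory-hungry for large chromosomes). The function name find_tandem_runs and the min_repeats argument are hypothetical, introduced only for this example. The idea is that a single greedy regular expression reports each maximal run of the motif exactly once, so no redundant sub-matches are produced.

```r
# Hypothetical helper (not part of Biostrings): report each maximal run of
# a tandemly repeated motif in a single character string.
find_tandem_runs <- function(seq, motif = "AT", min_repeats = 2) {
  # Regex for the motif repeated at least min_repeats times. Quantifiers are
  # greedy, so every hit extends as far as the run goes.
  pattern <- sprintf("(%s){%d,}", motif, min_repeats)
  hits <- gregexpr(pattern, seq, perl = TRUE)[[1]]

  # gregexpr() returns -1 when nothing matches
  if (hits[1] == -1) {
    return(data.frame(start = integer(0), end = integer(0), n = integer(0)))
  }

  len <- attr(hits, "match.length")
  data.frame(
    start = as.integer(hits),            # 1-based start of the run
    end   = as.integer(hits) + len - 1L, # 1-based end of the run
    n     = len %/% nchar(motif)         # number of motif copies in the run
  )
}

runs <- find_tandem_runs("GGATATATCCATGATATTT")
print(runs)
```

With the toy sequence above, the single "AT" at positions 11-12 is skipped because min_repeats is 2, and the two longer runs are each reported once. Note that the sketch only finds runs in the phase of the given motif ("AT" rather than "TA") and only on the strand supplied.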