I am searching the genome for occurrences of a very small motif:
library(BSgenome.Mmusculus.UCSC.mm10)  # also attaches BSgenome and Biostrings
matches <- vmatchPattern(pattern = "TCAG", subject = BSgenome.Mmusculus.UCSC.mm10, exclude = c("M", "_"))
Because the motif is its own reverse complement, the vmatchPattern function returns matches on both the positive and negative strands:
GRanges object with 13339240 ranges and 0 metadata columns:
             seqnames               ranges strand
                <Rle>            <IRanges>  <Rle>
         [1]     chr1   [3000191, 3000194]      +
         [2]     chr1   [3000813, 3000816]      +
         [3]     chr1   [3001048, 3001051]      +
         [4]     chr1   [3001119, 3001122]      +
         [5]     chr1   [3001795, 3001798]      +
         ...      ...                  ...    ...
  [13339236]     chrY [90843570, 90843573]      -
  [13339237]     chrY [90844087, 90844090]      -
  [13339238]     chrY [90844334, 90844337]      -
  [13339239]     chrY [90844496, 90844499]      -
  [13339240]     chrY [90844695, 90844698]      -
Instead, I would like it to return all matches on a single strand, either positive or negative. From there I can just reset the strand information to ambiguous:
strand(matches) <- "*"
Is there an option within vmatchPattern to return matches from a single strand, similar to matchPattern, which returns its result with ambiguous strand information? Granted, I could just filter out all minus- or plus-strand ranges:
matches <- matches[strand(matches) == "+"]
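For reference, here is a minimal sketch of that workaround on a toy GRanges object. The object `toy` and its coordinates are made up for illustration and are not taken from mm10; it only assumes GenomicRanges is installed.

```r
library(GenomicRanges)

# Hypothetical stand-in for the vmatchPattern() result:
# two ranges on the plus strand and two on the minus strand.
toy <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr1"),
  ranges   = IRanges(start = c(101, 205, 101, 205), width = 4),
  strand   = c("+", "+", "-", "-")
)

# Keep only the plus-strand ranges, then mark the strand as ambiguous.
single <- toy[strand(toy) == "+"]
strand(single) <- "*"

single
```

This leaves two ranges, both with strand "*", which is the shape of result I am after.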
I just wondered whether there is a way to do this within the vmatchPattern function itself that I am missing.
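To illustrate the matchPattern behaviour I am comparing against, here is a small sketch on a made-up DNAString (`toy_seq` is hypothetical and unrelated to the mouse genome). As far as I understand, matchPattern scans only the sequence as given, so the result carries no plus/minus strand column.

```r
library(Biostrings)

# Hypothetical short sequence containing the motif twice.
toy_seq <- DNAString("TCAGAATCAG")

# matchPattern() on a single sequence returns views on that sequence only.
hits <- matchPattern("TCAG", toy_seq)

start(hits)  # start positions of the matches: 1 and 7
```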