Given that vmatchPattern() doesn't accept a vector of patterns (only a single pattern) and vmatchPDict() isn't implemented, are there alternatives in R for matching many patterns against many sequences that don't involve short-read mapping algorithms and building indexes?
Thanks for the suggestion. However, AhoCorasickSearch currently supports neither mismatches nor indels, so I doubt it would be useful for many genomics applications. I also notice that my question is basically the same as matching one AAStringSet against another AAStringSet. That might be a common enough use case to be worth an optimised solution in Biostrings.

Good to know about the limitations. Another possibility is to 'unlist' one of the StringSets into a single *String (e.g., a DNAString or AAString) with the sequences separated by nonsense (e.g., a run of Ns), match against that, then relist the results so that each hit maps back to its original sequence.
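A minimal sketch of the unlist / match / relist idea, assuming plain character vectors rather than Biostrings objects. The helper name match_concatenated() and the 5-N spacer are hypothetical choices for illustration. Matching here is exact only, done with a zero-width perl lookahead so that overlapping hits are reported. The pattern is assumed to contain only sequence letters and no N, so it can never match inside or across the spacer.

```r
# Hypothetical helper (not part of Biostrings): find all exact, possibly
# overlapping occurrences of `pattern` in a set of sequences by
# 'unlisting' them into one string, matching once, and 'relisting' the hits.
# Assumes `pattern` has no regex metacharacters and no "N", so it can
# never match inside or across the poly-N spacer.
match_concatenated <- function(pattern, subjects, spacer = "NNNNN") {
  # Unlist: join all sequences into one string, separated by the spacer
  joined <- paste(subjects, collapse = spacer)

  # 0-based offset at which each sequence starts inside `joined`
  widths <- nchar(subjects)
  offsets <- cumsum(c(0, head(widths, -1) + nchar(spacer)))

  # Match: a zero-width lookahead makes gregexpr() report overlapping hits
  hits <- gregexpr(paste0("(?=", pattern, ")"), joined, perl = TRUE)[[1]]
  if (hits[1] == -1) {
    return(data.frame(subject = integer(0), start = integer(0)))
  }
  hits <- as.integer(hits)

  # Relist: find which sequence each hit falls in, convert to local coords
  idx <- findInterval(hits - 1, offsets)
  data.frame(subject = idx, start = hits - offsets[idx])
}

res <- match_concatenated("ACG", c("ACGTACG", "TTACG"))
print(res)
```

For this input the hits are at positions 1 and 5 of the first sequence and position 3 of the second. With Biostrings, the same bookkeeping should carry over: xscat() can append the spacer to each sequence, unlist() turns the resulting set into a single XString, and matchPattern() (which accepts max.mismatch) does the search. Once mismatches are allowed, a window that partly overlaps the spacer can still count as a hit, so hits overlapping the spacer should be discarded before relisting.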