Hi,
If I have two DNAStrings in a DNAStringSet, they are of the same length:
> dna <- DNAStringSet(c("ATYGRTCGATCG", "MTSGATCRATCG"))
> dna
A DNAStringSet instance of length 2
width seq
[1] 12 ATYGRTCGATCG
[2] 12 MTSGATCRATCG
How can I count the number of mismatches between the two sequences, ideally at each base, assuming that the ambiguity codes denote heterozygosity. So for example at the first base, A means homozygous A/A, and M means heterozygous A/C, and therefore counts as 1 mismatch. Another example, at the third position, Y = heterozygous C/T, whereas S = heterozygous C/G, and again is one mismatch.
I've thought of a few ways of doing this using consensusMatrices and such and thought about defining some custom scoring matrix which when you index using the bases, spits out the appropriate score, so mat["A", "A"] = 0, mat["A", "M"] = 1 and so on.
I wanted to ask the board if this (or a similar task) is a task that has been done with Biocondcuctor before and what the best way of doing it might be.
Many thanks,
Ben W.