So I am using the DMRcate package to find differentially methylated regions. The dmrcate() function requires a value for C, the scaling factor for the kernel bandwidth. The documentation states that for 450K data with lambda = 1000, near-optimal prediction of sequencing-derived DMRs is obtained when C is approximately 2. I was wondering if this would be the same for EPIC array data?
I haven't done any empirical testing on the distribution of EPIC CpGs, but I'd wager the optimum is about the same. The real issue arises with sequencing assays, where you're fitting genomically consecutive CpGs and have to make C a lot larger, which makes the kernel smaller.
If you're concerned, I'd err on the side of making C a bit bigger, say 3 or 4, since there are more probes on EPIC than on 450K. If C is too small and the kernel too big, it can rope in nearby CpGs that aren't all that differentially methylated, and the DMR endpoints won't be as "precise". The trade-off, of course, is that inflating C may atomise the DMRs too much if you're looking for bigger DMRs on the order of kilobases or tens of kilobases and would prefer these collapsed. But then you can just make lambda bigger if this is a bother! A minimal sketch of what that might look like is below.
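For illustration only, here is roughly how you might widen C for an EPIC run; myMs, design and coef are placeholders for your own M-values and limma design matrix, and the exact arguments may differ a little between DMRcate versions:

library(DMRcate)

# annotate the EPIC array M-values against the limma design
myannotation <- cpg.annotate("array", myMs, what = "M",
                             arraytype = "EPIC",
                             analysis.type = "differential",
                             design = design, coef = 2)

# default is lambda = 1000, C = 2; bumping C to 3 shrinks the Gaussian
# bandwidth (sigma = lambda/C) and tightens the DMR endpoints
dmrs <- dmrcate(myannotation, lambda = 1000, C = 3)

# if the tighter kernel fragments DMRs you would rather keep whole,
# increase lambda instead, e.g. dmrcate(myannotation, lambda = 2000, C = 3)
results.ranges <- extractRanges(dmrs, genome = "hg19")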