I'm investigating ways to summarise methylation array data in a gene-centric way in order to perform downstream gene set/pathway enrichment analysis.
As each gene has many probes, it should be possible to apply an enrichment test across a gene's probes to score differential methylation of that gene. So far I've been experimenting with two approaches: (1) a GSEA-like test using the limma t-statistics, and (2) the fry test.
The GSEA-like test appears to over-estimate the number of differentially methylated genes, seemingly because probes belonging to the same gene are highly correlated. The number of probes per gene also strongly biases the results.
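To illustrate the concern, here is a minimal sketch (plain Python, not limma code). It assumes the probe-level statistics within a gene have unit variance and share a common pairwise correlation rho; the function names and the example values are hypothetical. Under those assumptions the variance of the mean statistic is inflated by 1 + (m - 1) * rho relative to the independent case, so a test that ignores the correlation overstates significance, and more so for genes with many probes.

```python
import math


def variance_inflation(m: int, rho: float) -> float:
    """Variance inflation factor for the mean of m equicorrelated,
    unit-variance statistics with common pairwise correlation rho."""
    return 1.0 + (m - 1) * rho


def naive_vs_adjusted_z(mean_t: float, m: int, rho: float):
    """Return (naive z, correlation-adjusted z) for a gene whose m probes
    have average statistic mean_t. The naive z assumes independent probes."""
    naive_se = 1.0 / math.sqrt(m)
    adjusted_se = math.sqrt(variance_inflation(m, rho) / m)
    return mean_t / naive_se, mean_t / adjusted_se


if __name__ == "__main__":
    # Hypothetical values: same average statistic and correlation,
    # increasing numbers of probes per gene.
    for m in (5, 20, 100):
        naive, adjusted = naive_vs_adjusted_z(mean_t=0.5, m=m, rho=0.1)
        print(f"m={m:3d}  naive z={naive:.2f}  adjusted z={adjusted:.2f}")
```

The gap between the two z-scores widens as the number of probes grows, which matches the probe-count bias I am seeing.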
On the other hand, the fry results seem more in line with the expected "true" results. I have two questions. First, is it statistically valid to use fry() in this way, treating the probes of each gene as a set? Second, would it be possible to run GSEA, CAMERA or another gene set test downstream, using the gene-level fry results as input?
Any help is much appreciated.
Thanks for this, Gordon. It is very helpful and greatly appreciated!