Hello,
I'm working with some DNA methylation data, and I'd love to use Jaffe et al.'s bumphunter algorithm as implemented in the minfi package to identify differentially methylated regions. However, this study's design is somewhat complex and I could use some guidance on how to properly format the design matrix. Specifically, we're using control and treatment samples taken at two time points each. In addition to comparing each condition's time points and each time point's conditions, we'll be comparing the interactions between the two pairs of factors (i.e., the differences between different times and the differences between different conditions-at-a-time). To recap: this is basically a straightforward 2 x 2 factorial design with interaction terms included, making for six total comparisons.
These sorts of studies are well covered in §9.5 of the limma user guide: https://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf. Since each subject appears twice, however, we'll need to account for inter-subject correlation as advised in §9.7 of the aforementioned guide. These tips combined would suffice to find the most differentially methylated positions - but we're interested in DMRs, not DMPs. What I'd love to do, ideally, is input the topTable output to bumphunter so that it can search for DMRs there. But I don't see any way to do anything like that within minfi or the original bumphunter package. A quick dive into the source code for the latter hasn't proven especially fruitful, although I could probably figure something out if I tinker away for long enough. Has anyone else faced this issue before? Any tips on how to proceed?
Here's a basic review of the issue in code form. Treatment and control are coded T/C, while 0h and 72h time points are coded 0/1. Data are M-values (MVals
), which have been fully preprocessed (background corrected with noob, normalized with BMIQ, and batch corrected with ComBat).
#This study requires something like the following
library(limma)
Treat <- factor(paste(targets$Condition, targets$Time, sep='.'))
design <- model.matrix(~0 + Treat)
colnames(design) <- levels(Treat)
corfit <- duplicateCorrelation(MVals, design, block=targets$Subject)
fit <- lmFit(MVals, design, block=targets$Subject, correlation=corfit$consensus)
cm <- makeContrasts(TvsCin0h=T.0-C.0, TvsCin72h=T.1-C.1, 0hvs72hinT=T.0-T.1, 0hvs72hinC=C.0-C.1, Condition=(T.0-C.0)-(T.1-C.1), Time=(T.0-T.1)-(C.0-C.1), levels=design)
fit2 <- constrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
topTable(fit2, coef='TvsCin0h')
#That last line is just for the first comparison, but you get the idea.
#bumphunter requires a GenomicRatioSet and a design matrix among its arguments, but there's no way that I can tell to account for duplicate correlations or complex interactions
DMRs <- bumphunter(GRSet, design, cutoff=.2, smooth=T, B=1000, what='M')
Since the function generates a list of t-tests as a first step anyway, perhaps it would be possible to just insert the topTable output above at some intermediary stage in the algorithm? Or else maybe there's some way I don't know about to format the model matrix so it accounts for inter-subject correlation and variable interaction? Any help would be much appreciated. Thanks!
Cheers,
David
Hi David - I'm looking to apply bumphunter in a similar way to what you have outlined. Have you made any progress?
Thanks,
Magda
Hi David and Magda,
Have either of you made any progress? I too am looking to use bumphunter in a similar way (for repeated measures).
Thanks,
Ash
Hi guys,
I am also interested in this type of analysis. Have any of you made any progress?
Many thanks!