I have DNA methylation data from Illumina EPIC arrays for 10 patients and 6 controls, with 3 different tissues per individual (run in 3 different days). I have normalised data (BMIQ, champ.norm()) and removed batch affect (champ.runCombat()) following ChAMP pipeline. Two tissues are affected in the disease I am studying, and the 3rd is almost unaffected. The design is quite complex, so I need to perform multilevel analysis, both within and between subjects. To complicate things there is a difference in age and PMD between patients and controls, so I would like to correct for these covariates (continuous).
The questions I want to answer are:
- Which are the differentially methylated probes (DMPs) between patients and controls for each tissue?
- Within the patients group, which are the DMPs between the affected tissues versus the less affected or unaffected tissues?
- Which are the DMPs between the affected tissues versus the unaffected tissues that are specific to the disease and not the tissue?
I have read the limma user guide which I found was very helpful. As my study design is complex and I am very new to DNA methylation analysis, I would like to be sure my analyses are correct.
If I just wanted to answer question 1 (between subjects analysis, correcting for age and PMD), could I use the following approach?
Approach 1
f <- factor(myLoad$pd$Group_Region)
PMD <- as.numeric(myLoad$pd$PMD)
Age <- as.numeric(myLoad$pd$Age)
design <- model.matrix(~0+f+Age+PMD)
fit <- lmFit(mval,design)
cm <- makeContrasts(
MSAvsCTRL_OCTX = fMSA_OCTX-fCTRL_OCTX,
MSAvsCTRL_FCTX = fMSA_FCTX-fCTRL_FCTX,
MSAvsCTRL_CRBL = fMSA_CRBL-fCTRL_CRBL,
levels=design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
To answer the other questions I need to use multi-level analysis. I am unsure though if I can include the covariates or not. I have tried design 1 and design 2 (approaches 2 and 3) below, and design 2 (approach 3) would give me DMPs numbers similar to my approach 1 for the contrasts between patients and controls only. Could you let me know if my approach 3 is correct please?
Approach 2
regions <- factor(paste(myLoad$pd$Group,myLoad$pd$Region,sep="."))
design1 <- model.matrix(~0+regions)
colnames(design1) <- levels(regions)
corfit <- duplicateCorrelation(mval,design1,block=myLoad$pd$ID)
corfit$consensus
fit <- lmFit(mval,design1,block=myLoad$pd$ID,correlation=corfit$consensus)
cm <- makeContrasts(
MSAvsCTRL_OCTX = MSA.OCTX-CTRL.OCTX,
MSAvsCTRL_FCTX = MSA.FCTX-CTRL.FCTX,
MSAvsCTRL_CRBL = MSA.CRBL-CTRL.CRBL,
MSA_FCTXvsOCTX = MSA.FCTX-MSA.OCTX,
MSA_CRBLvsOCTX = MSA.CRBL-MSA.OCTX,
MSA_CRBLvsFCTX = MSA.CRBL-MSA.FCTX,
MSA_FvsO_CRTL_FvsO = (MSA.FCTX-MSA.OCTX)-(CTRL.FCTX-CTRL.OCTX),
MSA_CvsO_CRTL_CvsO = (MSA.CRBL-MSA.OCTX)-(CTRL.CRBL-CTRL.OCTX),
MSA_CvsF_CRTL_CvsF = (MSA.CRBL-MSA.FCTX)-(CTRL.CRBL-CTRL.FCTX),
levels=design1)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
Approach 3
regions <- factor(paste(myLoad$pd$Group,myLoad$pd$Region,sep="."))
design2 <- model.matrix(~0+regions+Age+PMD, myLoad$pd)
colnames(design2) <- c("CTRL.CRBL", "CTRL.FCTX", "CTRL.OCTX", "MSA.CRBL", "MSA.FCTX", "MSA.OCTX", "Age", "PMD")
corfit <- duplicateCorrelation(mval,design2,block=myLoad$pd$ID)
corfit$consensus
fit <- lmFit(mval,design2,block=myLoad$pd$ID,correlation=corfit$consensus)
cm <- makeContrasts(
MSAvsCTRL_OCTX = MSA.OCTX-CTRL.OCTX,
MSAvsCTRL_FCTX = MSA.FCTX-CTRL.FCTX,
MSAvsCTRL_CRBL = MSA.CRBL-CTRL.CRBL,
MSA_FCTXvsOCTX = MSA.FCTX-MSA.OCTX,
MSA_CRBLvsOCTX = MSA.CRBL-MSA.OCTX,
MSA_CRBLvsFCTX = MSA.CRBL-MSA.FCTX,
MSA_FvsO_CRTL_FvsO = (MSA.FCTX-MSA.OCTX)-(CTRL.FCTX-CTRL.OCTX),
MSA_CvsO_CRTL_CvsO = (MSA.CRBL-MSA.OCTX)-(CTRL.CRBL-CTRL.OCTX),
MSA_CvsF_CRTL_CvsF = (MSA.CRBL-MSA.FCTX)-(CTRL.CRBL-CTRL.FCTX),
levels=design2)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
Also, do you think my last 3 contrasts are answering my question 3?
I would be very grateful if someone could give me some reassurance or some suggestions to correct and/or improve my analyses.
Many thanks!!!!
You need to give more information about your experimental design. What's "affected", "less affected", "unaffected"? What is "MSA"? What is "FCTX", "CRBL", "OCTX"? You mention a disease - what is it? You should also provide the smallest subset of your sample table that captures the complexity of your experiment; this will make it much easier to communicate.
Many thanks Aaron for getting back to me.
MSA (multiple system atrophy) is the disease group (CTRL are the controls), and there are 3 brain tissues per individual (CRBL, FCTX, and OCTX), not equally affected in this disease. OCTX is unaffected, FCTX is mildly affected and CRBL is largely affected in MSA, so in my contrasts within MSA I always compare a more severely affected with a less affected tissue.
Please see below a subset of my sample sheet.
I hope this answers your questions, if not please let me know.
Many thanks!!!