Hi, I have mostly a conceptual question and I wanted to know if limma was the right bioconductor software to use for my study questions. I have paired data (only two timepoints) of array-based DNA methylation data. Every subject has both timepoints and there is no missing data. A toxicant exposure was measured at each timepoint. We have three hypotheses we're interested in testing. Model #1 we are interested in is the impact of overall toxicant (averaged between the two timepoints - "toxi") on DNA methylation change between the two timepoints. Model #2 we are interested in the impact of change in toxicant (difference between the two timepoints - "toxij") on DNA methylation change between the two timepoints. Model #3 we are interested in is the impact of overall toxicant (averaged between the two timepoints - toxi) on average DNA methylation that is persistent across the timepoints. I want to ensure I'm setting up my model matrices correctly.
#Model 1
design<-model.matrix(~toxi+time+toxi:time,data=pheno)
fit <- lmFit(getM(GRset),design,correlation=corfit,block=ID)
fit2 <- eBayes(fit)
fitfinal <- topTable(fit2, coef="toxi:time")
#Model 2
design<- model.matrix(~toxij+time+toxij:time)
fit <- lmFit(getM(GRset),design,correlation=corfit,block=ID)
fit2 <- eBayes(fit)
fitfinal <- topTable(fit2, coef="toxij:time")
#Model 3
design<-model.matrix(~toxi + time,data=pheno)
fit <- lmFit(getM(GRset),design,correlation=corfit,block=ID)
fit2 <- eBayes(fit)
fitfinal <- topTable(fit2, coef="toxi")
Yes, hypothesis testing is formulated separately. I apologize for the discrepancy - I am not applying three separate models to the same dataset or even to any datasets. This is more to conceptually understand if the model matrix is being fitted correctly via the code based on the time variant or invariant nature of the exposure and outcome, and that the subject specific effects are being accounted for.