Dear all,
I am testing limma
for multiple-group analysis. I have followed the recommendations in the user guide creating the design object and to establish the contrasts. In general, I would like to be able to run an ANOVA-like test, to look for proteins that are affected in general by the grouping variable and maybe do some pair-wise comparisons afterwards.
I understand that the topTableF
function gives me this information by following the instruction in the user guide (page 43), but I am curious on why it is necessary to define the contrasts a priori after fitting the model if I want to check only for the general grouping effect. I am referring to this step in the analysis:
fit <- lmFit(mat, design = design)
fitcontrasts <- contrasts.fit(fit,constrasts)
Based on this doubt, I have tested different settings for creating the design
object and the fitting of the model with different results. I'll try to outline my tests here in the best possible way.
Experimental groups
groups <- c(rep("A",6), rep("B",5), rep("C",6))
groupf <- factor(groups, levels = c("A","B","C"))
Two types of design
object
design <- model.matrix(~0+groupf) # without intercept
design2 <- model.matrix(~groupf) # with intercept
colnames(design) <- c("A","B","C")
Contrast matrix
constrasts <- makeContrasts(B-A, C-A, C-B,levels = design)
Different approaches for model fitting
1. Fit with intercept and no contrasts
fitintr <- lmFit(tomat, design = design2)
fitintr <- eBayes(fitintr)
2. Fit with no intercept and no contrasts
fit <- lmFit(tomat, design = design)
fitnocontr <- eBayes(fit)
3. Fit with no intercept and contrasts
fitcontrasts <- contrasts.fit(fit,constrasts)
fitcontrasts2 <- eBayes(fitcontrasts)
I notice that using topTableF
on the outputs of the approaches 1 and 2 gives the same F and P-values. In my case, this approach gives me a result that I would interpret as '~90% proteins are affected by the grouping variable', since most proteins show a p-value lower than 0.01.
Does this means that establishing no contrasts is equivalent to fitting a model with the intercept? Is this a problem? Why?
The topTableF
result from the approach 3 gives very different results. With a much smaller number (25) of proteins appearing affected by the grouping.
I would like to better understand these differences and why using one approach would be better than the other one.
Many thanks in advance for taking the time to read and for any comment you might offer.
Many thanks for the thorough explanation. Really appreciated!