Question

[limma] Define contrasts for model with covariates

3

Entering edit mode

enricoferrero ▴ 680

@enricoferrero-6037

Last seen 3.4 years ago

Switzerland

I'm having a hard time understanding how to pull out the relevant contrasts when using a model with covariates in limma.

Let's say I have the following gene expression matrix and experimental variables:

library(limma)
set.seed(16)

mat <- matrix(rnorm(600), 100, 6)

age <- c(45, 55, 47, 56, 43, 56)
gender <- factor(c("M", "F", "F", "F", "M", "F"))
condition <- factor(c(rep("healthy", 3), rep("disease", 3)), levels = c("healthy", "disease"))

If I were to use a very simple design (~ condition) then I could easily identify genes differentially expressed in disease vs healthy both using a treatment-contrast parametrization (case A below) or using a group-means parametrization and manually defining the contrast (case B below):

# case A: simple design; treatment-contrast parametrization
design_A <- model.matrix(~ condition)
fit_A <- lmFit(mat, design_A)
fit_A <- eBayes(fit_A)
tt_A <- topTable(fit_A, coef = "conditiondisease")

# case B: simple design; group-means parametrization
design_B <- model.matrix(~ 0 + condition)
fit_B <- lmFit(mat, design_B)
cont.matrix_B <- makeContrasts(diseaseVShealthy = conditiondisease - conditionhealthy, levels = design_B)
fit_B <- contrasts.fit(fit_B, cont.matrix_B)
fit_B <- eBayes(fit_B)
tt_B <- topTable(fit_B, coef = "diseaseVShealthy")

In fact, in this case the tt_A and tt_B tables contain the same results.

However, if I want to use a more complex design (~ age + gender + condition), I can still do the treatment-contrast parametrization (case C below) but I can no longer do the group-means parametrization because I'm missing the conditionhealthy column in the design matrix (case D below):

# case C: complex design; treatment-contrast parametrization
design_C <- model.matrix(~ age + gender + condition)
fit_C <- lmFit(mat, design_C)
fit_C <- eBayes(fit_C)
tt_C <- topTable(fit_C, coef = "conditiondisease")

# case D: simple design; group-means parametrization
design_D <- model.matrix(~ 0 + age + gender + condition)
fit_D <- lmFit(mat, design_D)
cont.matrix_D <- makeContrasts(diseaseVShealthy = conditiondisease - conditionhealthy, levels = design_D)

Error in eval(ej, envir = levelsenv) : 
  object 'conditionhealthy' not found

#fit_D <- contrasts.fit(fit_D, cont.matrix_D)
#fit_D <- eBayes(fit_D)
#tt_D <- topTable(fit_D, coef = "diseaseVShealthy")

I would like to use the group-means parametrization with the more complex design because I think it's much clearer what's going on.

How can I do that?

Thank you!

P.S.: Note that I also don't completely understand the expressions "treatment-contrasts" and "group-means" parametrization, but this is what they are called on the limma user guide (section 9.2).

limma model.matrix makecontrasts • 6.0k views

ADD COMMENT • link updated 6.8 years ago by Gordon Smyth 52k • written 6.8 years ago by enricoferrero ▴ 680

score 3 · Accepted Answer · 2018-06-24

3

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 7 hours ago

WEHI, Melbourne, Australia

Just be sure to put condition first in the linear model:

sign_D <- model.matrix(~ 0 + condition + age + gender)

Then you can use contrasts just as you did in Case B.

ADD COMMENT • link 6.8 years ago Gordon Smyth 52k

0

Entering edit mode

Thanks Gordon!

Can you please explain why that's the case? I.e.: why only the first categorical variable in the formula gets both levels as coefficients?

ADD REPLY • link 6.8 years ago enricoferrero ▴ 680

3

Entering edit mode

Because every following term is interpreted as an additive effect. In Gordon's model above:

There is no intercept because of the ~0.
condition has healthy and disease levels. There's no intercept, so the function creates a term for healthy, and another term for disease, representing the average log-expression in each group.
age is real valued, so this just gets added as an extra column.
gender (which is a social construct, I suppose you really mean sex) has male and female. With the default level ordering, i.e., alphanumeric, Female is treated as the reference. This means that the sum of the existing terms, with no other additions, describe the expected log-expression in female samples. For males, the function adds another term for the male vs. female log-fold change, which is presumed to be additive.

Hence, you get terms for all levels in condition, but by the time you get to gender, you don't get a term for the first reference level. Sometimes you can trick model.matrix into giving you a term, but this usually means that your matrix is not of full rank and that you need to remove the term in order to actually fit the linear model.

ADD REPLY • link 6.8 years ago Aaron Lun ★ 28k