Question

Calculating contrasts for a continuous variable and an interaction effect in limma

jaison75 ▴ 20

@jaison75-9008

Last seen 7.2 years ago

United States

Hi,

I am a little confused about how to construct the appropriate contrast matrix in limma when one of my predictors is a continuous variable and the model includes an interaction term. I am posting a minimal example below.

library(limma)
age <- sample(20:75, 80, replace = TRUE)
gender <- sample(c("M", "F"), 80, replace = TRUE)
gender <- as.factor(gender)
ageGrp <- ifelse(age > 35, "old", "young")

# Using age as a factor, I create the design matrix and calculate
# contrasts for old vs young in females and males, respectively
design <- model.matrix(~gender*ageGrp)
unique(design)
colnames(design) <- c("intercept", "gMale", "aYoung", "MxY")
cont.matrix <- makeContrasts(
  F_OvY = -aYoung,
  M_OvY = -aYoung - MxY, levels = design
)

How would I do the same analysis if age were continuous, i.e., if I used age instead of ageGrp in the code above?

Thanks

-Jaison

updated 9.5 years ago by Aaron Lun ★ 28k • written 9.5 years ago by jaison75 ▴ 20

score 4 · Answer 1 · 2015-10-19

You can put in age as a continuous covariate (I'll use a model without a common intercept here, so that each sex gets its own intercept, as it's easier to explain):

design <- model.matrix(~0 + gender + gender:age)

Running colnames(design) gives us:

[1] "genderF"     "genderM"     "genderF:age" "genderM:age"

To understand what's going on here, imagine fitting a line to the expression values of all female samples against age. The first and third coefficients represent the intercept and gradient, respectively, of this fitted line. The same reasoning applies to the second and fourth coefficients for the male samples. If you want to test for an age effect in either sex, drop the coefficient corresponding to the gradient term for that sex, i.e., test the null hypothesis that this gradient is zero. You can also do more complex comparisons, e.g., compare the two gradient terms to each other to identify genes that exhibit sex-specific age effects.
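As a rough sketch of how those tests could be set up: the expression matrix y below is simulated purely for illustration, and the shortened column names (F, M, F_age, M_age) are my own choice, made because makeContrasts() needs syntactically valid names and the default ones contain a colon.

```r
library(limma)

set.seed(1)
n <- 80
age <- sample(20:75, n, replace = TRUE)
gender <- factor(sample(c("F", "M"), n, replace = TRUE))

# Simulated log-expression values (100 genes x 80 samples), illustration only.
y <- matrix(rnorm(100 * n), nrow = 100)

design <- model.matrix(~0 + gender + gender:age)
# Rename the columns (hypothetical names) so makeContrasts() can parse them.
colnames(design) <- c("F", "M", "F_age", "M_age")

fit <- eBayes(lmFit(y, design))

# Age effect within each sex: test whether that sex's gradient is zero.
res_F <- topTable(fit, coef = "F_age", number = Inf)
res_M <- topTable(fit, coef = "M_age", number = Inf)

# Sex-specific age effect: test the difference between the two gradients.
cont.matrix <- makeContrasts(M_vs_F_age = M_age - F_age, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont.matrix))
res_diff <- topTable(fit2, coef = "M_vs_F_age", number = Inf)
```

With age in years, the logFC reported for F_age or M_age is the change in log-expression per year of age in that sex.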

Now, the linear approach I've described above is somewhat inflexible, as it doesn't allow for non-linear trends in expression with respect to age. If you have enough samples, you can improve upon the model by using splines:

library(splines)
X <- ns(age, df = 5) # Any df between 3 and 5 usually works well.
design <- model.matrix(~0 + gender + gender:X)

This will fit a spline with 5 d.f. to the expression values against age for each sex, which allows the model to handle non-linear trends. You'll end up with 5 spline coefficients for each sex, in comparison to the single gradient term you had before for the linear approach. To test for an age effect in a given sex, you should drop all of the spline coefficients for that sex at once. However, be warned that the log-fold changes you get from this approach won't have a great amount of meaning, as the values of the spline coefficients don't have an obvious interpretation.
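A minimal sketch of the spline version, again assuming a simulated expression matrix y that stands in for real data. Passing several coefficients to topTable() tests them jointly with a moderated F-statistic, which is one way of dropping all spline terms for a sex together.

```r
library(limma)
library(splines)

set.seed(1)
n <- 80
age <- sample(20:75, n, replace = TRUE)
gender <- factor(sample(c("F", "M"), n, replace = TRUE))

# Simulated log-expression values (100 genes x 80 samples), illustration only.
y <- matrix(rnorm(100 * n), nrow = 100)

X <- ns(age, df = 5)
design <- model.matrix(~0 + gender + gender:X)

fit <- eBayes(lmFit(y, design))

# Locate the five spline columns belonging to each sex.
f_cols <- grep("^genderF:X", colnames(design))
m_cols <- grep("^genderM:X", colnames(design))

# Joint test of all spline coefficients for one sex (moderated F-test).
res_F <- topTable(fit, coef = f_cols, number = Inf)
res_M <- topTable(fit, coef = m_cols, number = Inf)
```

The F and P.Value columns are the ones to look at here; the per-coefficient columns are the spline coefficients themselves, which is why they are hard to interpret.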