You can put in age
as a continuous covariate (I'll use an intercept-only model here, as it's easier to explain):
design <- model.matrix(~0 + gender + gender:age)
Running colnames(design)
gives us:
[1] "genderF" "genderM" "genderF:age" "genderM:age"
To understand what's going on here, imagine fitting a line to the expression values of all female samples against the age. The first and third coefficients represent the intercept and gradient, respectively, of this fitted line. The same reasoning applies to the second and fourth coefficients for the male samples. If you want to test for an age effect in either sex, drop the coefficient corresponding to the gradient term for that sex. You can also do more complex comparisons, e.g., compare the two gradient terms to each other to identify genes that exhibit sex-specific age effects.
Now, the linear approach I've described above is somewhat inflexible, as it doesn't allow for non-linear trends in expression with respect to age. If you have enough samples, you can improve upon the model by using splines:
require(splines)
X <- ns(age, df=5) # Any df between 3 - 5 usually works well.
design <- model.matrix(~0 + gender + gender:X)
This will fit a spline with 5 d.f. to the expression values against the age for each sex, which allows the model to handle non-linear trends. You'll end up with 5 spline coefficients for each sex, in comparison to the single gradient term you had before for the linear approach. To test for an age effect, you should drop all of the spline coefficients for each sex. However, be warned that the log-fold changes you get from this approach won't have a great amount of meaning, as the values of the spline coefficients don't have an obvious interpretation.
Hi Aaron,
I am also dealing with continuous covariate in my design matrix as well. I am using edgeR, I have created a following design matrix:
design1 <- model.matrix(~groups+groups:age)
where there are four different groups.
So, colname(design1) would give:
(Intercept) group2 group3 group4 group1:age group2:age group3:age group4:age.
If I were to do a comparison such as: lrt <- glmLRT(fit, coef=2), what am I comparing? Am I merely comparing group1 and group2? It sounds like (from what you wrote above), this comparison is affected by the age covariate included in the design matrix. If I were to create a design matrix: design2 <- model.matrix(~group), and perform the same comparison between group1 and group2, would that give a same result?
Also, using design1 again, if I were to compare group2:age with intercept (coef=6), would that be testing for age effect on group2? And comparing group1:age and group2:age would be testing for the differences in age effect between group1 and group2?
My apologies for asking a lot of questions. I am quite confused by the usage of design matrix, especially with continuous variables interaction. Thank you.
Post this as a new question with the appropriate tags.