Hi all,
When I run limma compare disease vs control, the design object has two columns. However, when I adjust for covariates: sex (factor with 2 levels 0 and 1) and age, I have 4 columns: vector_age, vector_sex1, vector_sex2, vector_groupdisease
design <- model.matrix(~ 0 + group_vector)
design <- model.matrix(~ 0 + vector_age + vector_sex + group_vector)
So when I run:
contrast_matrix <- makeContrasts(diseasevsNC=group_vectordisease-group_vectorNC, levels=design)
Error in eval(ej, envir = levelsenv) : object 'group_vectorCN' not found
Is that limma automatically set NC as reference so I can run:
contrast_matrix <- makeContrasts(diseasevsCN = groupdisease, levels = design)
I appreciate your help.
Thanks James! May I know why the sex covariate has two columns? Meanwhile, the groups also has 2 levels as sex but only has one column in the design matrix?
There's a couple of things to unpack here. By default, if you specify no intercept (
~ 0 + ...
), then R will try to generate a cell means model, where you compute the mean of each group. But if the resulting design matrix won't be full rank, it will automatically drop coefficients. That's one point, but is primarily of interest for ANOVA models where all your coefficients are factors.Normally, if you have a continuous variable in your model (you are fitting an ANCOVA model), you want an intercept (otherwise you are forcing the regression line for Age to go through zero, which is probably not what you want, because there is no reason to think that any gene expression is zero at birth). And if you include an intercept, given that the default for R is a treatments contrast parameterization, that means you will be setting one group to be the baseline. If you are fitting an ANOVA, the difference between having an intercept or not is only relevant for coefficient interpretation. But if you are fitting an ANCOVA model, then you should include an intercept.
If this is confusing, you need to do some reading about linear regression in general, and how R does it in particular. Or if that isn't of particular interest, you should find someone local who understands linear regression to help you. Just doing stuff without understanding what you are doing is a recipe for disaster.
Anyway, as an example,
In the second parameterization, the intercept computes the mean of the female control subjects (which we know because all other coefficients in rows 9-10 are zero, except for age), and the SexM coefficient estimates the difference between control males and control females, and the DiseaseDisease estimates the difference between disease females and control females. But since you have already 'adjusted out' the difference between males and females, you use all the data to estimate disease vs control.
Thank you James for the detailed follow up! I great appreciate it. The first model.matrix is similar to mine with 4 columns. Try your suggestion now!