I am experiencing problems with adding covariates to my paired samples design in DESeq2. We have 39 paired samples of human subjects (39 untreated samples; 39 treated samples). So in summary, it is a ‘time series’ experiment with paired samples. We are interested in the effects of treatment on gene expression.
Based on the information available in the DESeq2 vignette, I created the following design:
~ patientID + treatment
(where patientID is a factor with 39 levels indicating the paires and treatment is a factor with 2 levels: baseline/post treatment).
This design works perfectly. However, when including additional covariates to this formula I am experiencing problems.
We also want to adjust our data for covariates which are known to influence gene expression in humans, i.e. age (numeric), smoking status (factor with 2 levels: current smoking/ non smoking) and gender (factor with 2 levels). The design I have created is as follows:
~ patientID + age + currentsmoking + gender + treatment
However, when including these covariates I get the following error:
Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.
I realized that this error occurs because the variable patientID is 100% correlated with the additional covariates, and therefore it is not possible to run this analysis. Is there a way to include covariates in a paired sample design in DESeq2? I hope you can help me out.
Thank you for your quick reply. However, it could be the case that for example, smokers or male subjects respond differently to treatment than non-smokers or females. To adjust for this effect, we would like to add smoking status and gender as covariates. Is there a possibility to include covariates in a DESeq2 design of paired samples?
This is possible, and explained very well in Michael's vignette on DESeq2, under the section "Group-specific condition effects, individuals nested within groups". If you change the word "condition" to "Treatment", and "Group" to "smoking status", and follow what's written, you should be fine.
Earlier when you asked about how to adjust for covariates which are known to influence gene expression, I assumed that you meant control for these covariates while estimating a general treatment effect. The approach for adjustment is to add covariates to the design, and then put the covariate of interest at the end, as you had above.
You hadn't mention previously wanting to fit different treatment effects for various levels of covariate. This is possible, but it requires you to specify exactly what you want to test. You would need to pre-specify all the covariate-specific treatment effects before you run the model, because adjusting this afterward would lead to loss of error control.
Do you want to test for all combinations of smoking and sex, or smoking-specific and sex-specific separately? Is this the exhaustive list of treatment effects you want to test?
There is a specific section of the vignette which the error above should prompt you to read (are you using the latest version of DESeq2?), and it includes how to fit group-specific condition effects, when the individuals are nested within groups:
https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups