I just wanted to check my understanding of the linear model used to build the design matrix for DESeq2 and was wondering if someone could tell me if all of this is correct. I have made a schematic of my understanding and what I think is going on below.
design ~ cell_type + treatment + cell_type:treatment
Line 1: intercept of model (baseline expression)
Line 2: effect of treatment in cell type A
Line 3: intrinsic difference in expression in cell type B vs. A
Line 4: effect of treatment in cell type B
Line 5: Interaction term (Line 4 - Line 2) is the difference in effectiveness of treatment in cell type B vs cell type A
So the linear model will be expression = beta1 + beta2*cell_type + beta3*treatment + beta4*interaction where cell_type, treatment, and interaction can take either 0 or 1.
If we first focus on cell type A, cell_type=0, interaction=0. For A control: expression = beta1 + beta2*0 + beta3*0 + beta4*0 = beta1 so in this case line 1. For A treatment: expression = beta1 + beta2*0 + beta3*1 + beta4*0 = beta1 + beta3 so in this case line 1 + line 2.
Looking at cell type B, cell_type=1. For B control: expression = beta1 + beta2*1 + beta3*0 + beta4*0 = beta1 + beta2 so in this case line 1 + line 3. For B treatment: expression = beta1 + beta2*1 + beta3*1 + beta4*1 = beta1 + beta2 + beta3 + beta4 so in this case line 1 + line 3 + line 4 (which is composed of line 2 + the extra interaction effect).
In what instances would you not include an interaction term and just have design ~ cell_type + treatment? When you think the treatment has the same effect in each cell type or when you don't care about cell type specific effects and are just interested in does treatment have any effect vs control?
Thanks
hi Jake,
James has good answers below. Just a note that I'd recommend using DESeq(dds, betaPrior=FALSE) for designs with interactions, so you get the standard terms as you would in a linear model with interactions. I'm in the process of having DESeq() by default turn off the log fold change prior for designs with interactions, to make things easier.
Hi,
Can you expand on this briefly? I believe this is the relevant section from the help file: "When a log2 fold change prior is used (betaPrior=TRUE), then
nbinomWaldTest
will by default use expanded model matrices, as described in themodelMatrixType
argument, unless this argument is used to override the default behavior or unless the design contains 2 level factors and an interaction term. This ensures that log2 fold changes will be independent of the choice of reference level. In this case, the beta prior variance for each factor is calculated as the average of the mean squared maximum likelihood estimates for each level and every possible contrast. Theresults
function without any arguments will automatically perform a contrast of the last level of the last variable in the design formula over the first level."So what exactly does setting betaPrior to false do and why is that a good idea for interaction terms?
Sure, betaPrior=FALSE means that we do not put a prior on log2 fold changes in the model. For more background on the log2 fold change prior, you quickly scan the relevant section of our paper. But the important thing to know is that the inference (the results tables) are accurate with betaPrior=TRUE or betaPrior=FALSE. It's just an additional option which makes the log2 fold change estimates more directly interpretable. With betaPrior=FALSE, the log2 fold change for a simple comparison is simply the average of normalized counts in group B / average of normalized counts in group A.
When we do use a log2 fold change prior (so if betaPrior=TRUE), we need to make sure that the shrinkage effect is symmetric relative to the "reference level" of the factors. To accomplish this even when there are many factor levels, we implemented expanded model matrices, which include terms for all the levels as well as the intercept. We also discuss this in the Methods of the paper. But this means that the terms of an interaction design are slightly different than what you would get with
model.matrix
. I'm planning to just set betaPrior to FALSE by default if the design contains interaction terms, because users seem to have enough difficulty as it is interpreting the standard terms of an experimental design with interactions.