Hi everyone,
For DESeq2, I am trying to understand the interpretation of with/without an interaction term in the design formula. Assume we have two conditions (Treatment vs Control) and two genotypes (Mutant vs WT). Here is a simulation study and my interpretation.
> library("DESeq2")
# make example dataset and fix the reference level
> dds <- makeExampleDESeqDataSet(n=10000,m=12)
> dds$condition <- as.factor(c(rep("Control",6), rep("Treatment",6)))
> dds$genotype <- as.factor(rep(c(rep("WT", 3), rep("Mutant",3)),2))
> dds$genotype <- relevel(dds$genotype, "WT")
> as.data.frame(colData(dds))
> head(assay(dds))
Case1: With interaction
I could include the interaction term in the design, as I showed in the following code. Then I would retrieve the result of conditionTreatmentvs_Control. From what I understand, this is comparing Treatment and Control under WT, or comparing sample 7-9 with sample 1-3.
> design(dds) <- ~ condition + genotype + condition:genotype
> dds <- DESeq(dds)
> results(dds, name = "condition_Treatment_vs_Control")[1:5,]
Case2: Without Interaction
I could write the design formula without the interaction term and still, I want to retrieve the result of conditionTreatmentvs_Control. This is technically the same as model the batch effect. DESeq2 will regress out the effect of genotype. In other words, DESeq2 will take all samples into consideration.
> design(dds) <- ~ condition + genotype
> dds <- DESeq(dds)
> results(dds, name = "condition_Treatment_vs_Control")[1:5,]
But if you look at the results, they are very different. I want to validate my point by manually calculate the log2FoldChange column.
Here I try to calculate the log2foldchange for gene 1, with the interaction term included in the design formula
> log2(mean(counts(dds, normalized = T)[1,7:9]) / mean(counts(dds, normalized = T)[1,1:3]))
This is almost the same as the result of DESeq2 (with interaction).
So here is my question: 1. Is my interpretation of the design formula correct? 2. How to manually calculate the log2foldchange for case2 (without interaction)?
Thanks all.
Thanks Michael. This helped with my understanding to DESeq2. I have two additional questions regarding shrinkage estimators. It will be great if you can give me some suggestions.
"apeglm"
and"ashr"
are both mentioned in the vignette. Do you have suggestions on which method to choose in which specific case? Or both are general methods that could apply to every situation.You mentioned that shrinkage estimators are not recommended for design with the interaction term. If I have condition (control vs treat) and genotype (mutant and WT) in my design matrix and I specify design formula as
~ condition * genotype
. Do you mean estimator shouldn’t be use in allresultsNames(dds)
include"Intercept", "condition_Treat_vs_Control", "genotype_Mutant_vs_WT", "conditionTreat.genotypeMutant"
? Or do you mean estimator shouldn't be use in"conditionTreat.genotypeMutant"
?Thanks very much for replying.
(1) Both are general methods, but there are some cases where only ashr can be used. In the extended section there is a table summarizing this.
(2) Shrinkage estimators can now be used with interaction terms with apeglm and ashr (that's actually a row in the table I'm referring to). We had found that we could not reliably use them with the 2014 estimator, but we resolved that issue with the new methods.
Hello, Michael. If I have confirmed the existence of interaction term effect by results(dds, name = "conditionTreat.genotypeMutant''), whether can I set design(dds) <- ~ genotype+condition to reduce the effect of genotype? Or it can just be used when interaction term effect doesn't exist.
Yeah technically the reduced model is mis-specified if the interaction exists so you should then always include it.