I have been using the DESeq2 package for differential expression and have really liked the flexibility that the package provides for complex models including interaction effects. In one of the recent updates some of the interaction term functionality that I had been using has been disabled. Specifically, the ability to test one interaction term versus all others within the expanded model matrix framework. I was wondering if there is a workaround to accomplish the comparison of one interaction term to all of the other interaction terms after this update.
My current workaround is to run the last version of DESeq2 that allowed testing of interaction effects within the expanded model matrix, which I think is 1.9.34, but I would obviously prefer to stay updated with the newest version of the software.
Using the current vignette example with conditions {A, B} and the genotypes {I, II, III}, I would like to fit the full interaction model (~ condition + genotype + condition:genotype) and test the condition specific effect of one genotype as compared to the rest of the genotypes. I believe that the correct way to do this is with this contrast: contrast=list("genotypeI.conditionB", c("genotypeII.conditionB", "genotypeIII.conditionB")).
I think that originally this was the correct usage of the interaction term comparison within the expanded model matrix. I understand that this has caused confusion for users, but I was wondering if it is possible to complete such a comparison under the current implementation, or if reverting to a previous version is the best way to go for this type of analysis. Thank for any help!
Hi Michael,
I just tested out the model you suggested and the lack of a beta prior on the interaction terms appears to be having a huge effect on the results. The top of the p-value rank lists agree quite well, but the fold change values are obviously very different due to the lack of a beta prior. I have attached a figure showing the log2 fold change obtained using DESeq2 v1.9 with the expanded model matrix (oldLogFC) versus those obtained using DESeq2 v1.10 and a standard model matrix (newLogFC). I have scaled the color according to the log base mean of each gene. You can see that the lower expression genes show inflated FC values in the newer run and drastically change the gene rank list.
Are there any plans to allow interaction terms within the expanded model matrix in DESeq2 at a later time or do you think that using the most recent version which allows these models would be the best course of action? Or is there some other solution that would suffice in this situation? I know that using a model such as "~ genotype + group" would not be full rank, but there may be another solution I am not thinking of.
Thank you again for supporting such an amazing package and I really appreciate your quick response! Look forward to your thoughts.
Thanks! I think I will probably move forward with this analysis using the v1.9 extended model matrix interaction setup as I really like that I can use a single statistic for reliable rank lists.
To be clear on the "testing against a higher value of LFC" point; this will not resolve the low expression genes producing lower p-values. That is solely resolved by the application of a beta prior. Thus for larger effects (and testing against a higher LFC value) I should still either use a p-value ranking or FC ranking within an FDR controlled set. Is that correct?
The p-values are not very much changed by the beta prior. The beta prior mostly just has an impact on the LFC we report.
By "testing against a higher value of LFC", I was referring to testing against a threshold greater than zero, in case you wanted to get a set of genes with large fold changes.
We have a section of the vignette on this and a longer discussion in the DESeq2 paper.
Or you can test against an LFC of 0 and sort that list by fold change. The difference is discussed in the paper.