I have RNA-seq data for various combinations of three different factors (let's call them X, Y, and Z). Some of these combinations represent controls, and the rest represent treatments.
I want to perform differential gene expression analysis for various pairs of treatment and control combinations.
I can't figure out how to specify the design parameter for this sort of analysis.
If I specify the design parameter as
design = ~ X + Y + Z
...then resultsNames(dds) includes names like "X0", "X1", ..., "Y0", "Y1", etc., whereas for the contrasts I'm interested in, I'd need resultsNames(dds) to return names corresponding to the different combinations I want to compare, i.e. something like "X0.Y0.Z0", ..., "X3.Y4.Z5", etc.
One "solution" is to stratify the data into subsets corresponding to combinations of the levels of two of the factors (say, X and Y), and then perform the DE analysis on each subset using design = ~ Z.
Is there a simpler way to achieve this?
Thanks for your comment. I have not been able, however, to locate in the DESeq2 vignette the suggestion you mentioned. I looked in subsection 1.6, titled "Multi-factor designs". There the design used is ~ type + condition, but, as I mentioned in my question, designs that combine various factors using + result in names that are not suitable for the contrasts I want to analyze. Did you have some other part of the vignette in mind?
It's section 3.3 of the vignette, which says: "If the comparisons of interest are, for example, the effect of a condition for different sets of samples, a simpler approach than adding interaction terms explicitly to the design formula is to perform the following steps: 1. combine the factors of interest into a single factor with all combinations of the original factors 2. change the design to include just this factor, e.g. ~ group"
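For what it's worth, the two steps quoted from the vignette might look roughly like the sketch below. The sample table, the level names ("X0", "Y1", etc.), the replicate count, and the simulated counts are all made up for illustration; only the combined group factor and the design = ~ group idea come from the vignette text. The DESeq2 part is skipped if the package isn't installed.

```r
# Hypothetical sample table: 2 levels each of X, Y and Z, 3 replicates per combination.
coldata <- expand.grid(
  X = c("X0", "X1"),
  Y = c("Y0", "Y1"),
  Z = c("Z0", "Z1"),
  replicate = 1:3,
  stringsAsFactors = TRUE
)

# Step 1: combine the factors of interest into a single factor.
coldata$group <- factor(paste(coldata$X, coldata$Y, coldata$Z, sep = "."))
group_levels <- levels(coldata$group)  # "X0.Y0.Z0", "X0.Y0.Z1", ...

# Step 2: use only the combined factor in the design.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  # Placeholder counts from DESeq2's own simulator; real data would come
  # from DESeqDataSetFromMatrix() or similar.
  dds <- makeExampleDESeqDataSet(n = 200, m = nrow(coldata))
  colData(dds)$group <- coldata$group
  design(dds) <- ~ group
  dds <- DESeq(dds)

  # Compare one treatment combination against one control combination:
  # contrast = c(factor name, numerator level, denominator level).
  res <- results(dds, contrast = c("group", "X1.Y1.Z1", "X0.Y0.Z0"))
}
```

Any pair of combinations can be compared the same way by changing the last two elements of contrast, so there is no need to subset the data and refit for each comparison.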
That's definitely much easier and more elegant than what I had in mind. Thanks!