Hi all,
Despite reading the DESeq2 vignette (and other useful references), I'm still a little uncertain about how to best test for differential expression in the presence of a group by condition interaction. I have two conditions (for simplicity here cond1 and cond2), and eight species (for simplicity here species1, species2, ..., species8). What I hope to do is look for genes that are differentially expressed genes between the two conditions in all species, avoiding genes where there is a species*condition interaction.
To explain the confusion, I'll start by talking about what I'd do when running GLMs with interactions using the glm.nb
function (from MASS). In this situation, for each gene I would (i) run glm.nb
; (ii) run anova
on the results. The design for glm.nb with an interaction is Count ~ Species + Condition + Species:Condition
.
Therefore, for each gene I would have results like this:
Any gene for which the Species:Condition interaction is significant (like in the screenshot above) cannot be included in my results, so I remove them from the results. I'm left with a results table for genes where there is a significant difference between cond1 and cond2, and where there is no species*condition interaction.
In DESeq2, I believe that if I had just two species, I could run an analysis including an interaction term like so:
dds <- DESeqDataSetFromMatrix(countData = counts, colData = samples, design = ~Species+Condition+Species:Condition)
dds <- DESeq(dds)
results_interaction <- results(dds, name="Speciesspecies2.Conditioncond2")
Note: reference level of "Species" is species1; reference level of "Condition" is cond1.
Question 1: Am I interpreting the output correctly that any gene with a significant pvalue in the results_interaction
object I create above are those genes for which there is a significant interaction between Species and Condition? If so, are these genes analagous to those that have a significant interaction using functions like glm.nb
, as in the screenshot example above?
Question 2: To get the genes differentially expressed between cond1 and cond2 in species1 (but without a Species:Condition interaction), do I do the following?
res1 <- results(dds, name="Condition_cond2_vs_cond1")
Question 3: Is there a way to directly test for genes that are differentially expressed in all species (i.e., don't have a Species:Condition interaction)? If not, do I have to do something like:
res1 <- results(dds, name="Condition_cond2_vs_cond1")
#testing for DE without Species*Group interaction
res1 <- as.data.frame(res1)
res1 %>% filter(padj < 0.05) %>% arrange(padj) -> res1_DE
res1_DE <- rownames(res1_DE)
res2 <- results(dds, list(c("Condition_cond2_vs_cond1", "Speciesspecies2.Conditioncond2")))
#testing for DE without Species*Group interaction
res2 <- as.data.frame(res2)
res2 %>% filter(padj < 0.05) %>% arrange(padj) -> res2_DE
res2_DE <- rownames(res2_DE)
both_DE <- union(res1_DE, res2_DE)
Question 4: How do the answers to these questions scale up to situations where I am looking for a difference between two conditions in >2 species? For example, for one component of my actual study I need to look for genes that are differentially expressed between cond1 and cond2 in all eight species (or in a certain 4/8 species etc.).
In my actual study, the possible resultsNames look like so:
resultsNames(dds)
[1] "Intercept"
[2] "Speciesspecies2vsspecies1"
[3] "Speciesspecies3vsspecies1"
[4] "Speciesspecies4vsspecies1"
[5] "Speciesspecies5vsspecies1"
[6] "Speciesspecies6vsspecies1"
[7] "Speciesspecies7vsspecies1"
[8] "Speciesspecies8vsspecies1"
[9] "Conditioncond2vscond1"
[10] "Speciesspecies2.Conditioncond2" [11] "Speciesspecies3.Conditioncond2" [12] "Speciesspecies4.Conditioncond2" [13] "Speciesspecies5.Conditioncond2" [14] "Speciesspecies6.Conditioncond2" [15] "Speciesspecies7.Conditioncond2" [16] "Speciesspecies8.Conditioncond2"
Thanks to anyone who can help with these questions!