Hi all,
We have gene expression data for 12 cell types in both human and mouse, and we would like to use DESeq2 to find genes that are differentially expressed between human and mouse within the same cell type. As a simplified example, suppose that for a particular gene we have these expression levels:
celltype | human | mouse |
---|---|---|
A | 100 | 200 |
B | 80 | 160 |
C | 120 | 240 |
D | 100 | 200 |
E | 60 | 120 |
F | 40 | 400 |
G | 0 | 0 |
H | 200 | 400 |
I | 120 | 240 |
J | 150 | 300 |
K | 180 | 360 |
L | 20 | 40 |
That is, the expression level is twice as high in mouse as in human in every expressed cell type except F, where the expression in mouse is ten times higher than in human. We are primarily interested in the interaction effect: we want to identify this gene as differentially expressed in cell type F, but we do not care that its expression in mouse is, in general, twice its expression in human (which we consider a batch effect).
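To make the arithmetic concrete, here is a minimal base R sketch of what we mean by the interaction effect for this gene. It only illustrates the toy numbers from the table and is not how DESeq2 estimates anything; using the median log2 ratio as the "general" species effect is our own assumption for illustration, and the names `expr`, `species_effect` and `deviation` are made up.

```r
# Expression levels for the example gene, copied from the table above.
expr <- data.frame(
  celltype = LETTERS[1:12],
  human = c(100, 80, 120, 100, 60, 40, 0, 200, 120, 150, 180, 20),
  mouse = c(200, 160, 240, 200, 120, 400, 0, 400, 240, 300, 360, 40)
)

# Per-cell-type log2 fold change, mouse over human.
# Cell type G is 0 in both species, so its ratio is 0/0 and comes out as NaN.
expr$log2fc <- log2(expr$mouse / expr$human)

# Assumed summary of the general species effect: the median across
# cell types, ignoring the undefined value for G.
species_effect <- median(expr$log2fc, na.rm = TRUE)

# What is left after removing the general effect is the
# cell-type-specific part we are interested in.
expr$deviation <- expr$log2fc - species_effect

print(expr[, c("celltype", "log2fc", "deviation")])
```

In this toy example the general effect is a log2 fold change of 1 (twofold), and only cell type F keeps a non-zero deviation, namely log2(10) - 1 = log2(5).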
When calling DESeqDataSetFromMatrix, should the design then include the main effects only (i.e. "~ organism + celltype") or should we also include the interaction effect (i.e. "~ organism + celltype + organism:celltype")? And when calling the results function, how should we specify the interaction effect (I guess it should be something like c("organism:celltype", "human:celltypeX", "mouse:celltypeX"), where celltypeX loops from A to L)?
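For reference, this sketch shows how the two candidate formulas differ when expanded with base R's `model.matrix`. The sample table is hypothetical (one row per organism and cell type, mirroring the example table, with no replicates), the variable names are ours, and the coefficient names that DESeq2 itself reports may be formatted differently.

```r
# Hypothetical sample table: one row per organism/cell-type combination.
samples <- expand.grid(
  celltype = LETTERS[1:12],
  organism = c("human", "mouse")
)

# The two candidate designs from the question.
main_only <- model.matrix(~ organism + celltype, data = samples)
with_interaction <- model.matrix(
  ~ organism + celltype + organism:celltype,
  data = samples
)

# Number of coefficients under each design.
ncol(main_only)
ncol(with_interaction)

# Columns that exist only in the interaction design:
# one per non-reference cell type.
setdiff(colnames(with_interaction), colnames(main_only))
```

With human and cell type A as the reference levels, the interaction design adds one column per remaining cell type (for example `organismmouse:celltypeF`), which is presumably the kind of term we would need to name when calling the results function.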
Thank you,
Michiel