Dear DESeq2 community,
I am working with RNA-seq data from pre- and post-treatment samples from 3 patients across 4 cell types (A, B, C, and D). The experimental design is as follows:
> my_metadata
patient celltype_treatment
1 P1 A_pre
2 P2 A_pre
3 P3 A_pre
4 P1 A_post
5 P2 A_post
6 P3 A_post
7 P1 B_pre
8 P2 B_pre
9 P3 B_pre
10 P1 B_post
11 P2 B_post
12 P3 B_post
13 P1 C_pre
14 P2 C_pre
15 P3 C_pre
16 P1 C_post
17 P2 C_post
18 P3 C_post
19 P1 D_pre
20 P2 D_pre
21 P3 D_pre
22 P1 D_post
23 P2 D_post
24 P3 D_post
For now, I am only interested in identifying genes that are differentially expressed within each cell type, i.e., the effect of treatment on gene expression within a given cell type. That is, I only want the A_pre vs. A_post, B_pre vs. B_post, C_pre vs. C_post, and D_pre vs. D_post comparisons (4 comparisons in total).
Given that, is it best practice to fit one DESeq2 model using all samples, or to fit four separate DESeq2 models, one per cell type? I am aware of the vignette section titled "If I have multiple groups, should I run all together or split into pairs of groups?", which recommends running the samples from all groups together and then specifying contrasts. However, my understanding was that this approach applies when one wants to make further comparisons, such as A_pre vs. B_pre, which I am not interested in.
I have tried both approaches. With the one-model approach, I obtain 35 differentially expressed genes (padj <= 0.05) between A_pre and A_post. With the four-model approach, I obtain 39 differentially expressed genes for the same comparison, of which only 8 overlap with the results of the first approach.
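For concreteness, here is a minimal R sketch of how the two approaches are commonly set up in DESeq2. It is an illustration under assumptions, not the exact code behind the numbers above: the counts are simulated with `makeExampleDESeqDataSet()` (so there is no real treatment effect), the design formulas `~ patient + celltype_treatment` (one model) and `~ patient + treatment` (per cell type) are assumed because the designs used are not specified, and the helper columns `celltype` and `treatment` as well as the names `counts_mat`, `res_one`, and `res_A` are hypothetical.

```r
# Sketch only: simulated counts stand in for the real data.
library(DESeq2)

set.seed(1)
# Simulated negative-binomial counts: 500 genes x 24 samples, no true DE.
counts_mat <- counts(makeExampleDESeqDataSet(n = 500, m = 24))

# Rebuild the sample table above: 8 groups x 3 patients, in the same order.
groups <- c("A_pre", "A_post", "B_pre", "B_post",
            "C_pre", "C_post", "D_pre", "D_post")
my_metadata <- data.frame(
  patient = factor(rep(c("P1", "P2", "P3"), times = 8)),
  celltype_treatment = factor(rep(groups, each = 3), levels = groups),
  row.names = colnames(counts_mat)
)
# Hypothetical helper columns splitting the combined factor into its parts.
my_metadata$celltype  <- factor(sub("_.*", "", my_metadata$celltype_treatment))
my_metadata$treatment <- factor(sub(".*_", "", my_metadata$celltype_treatment),
                                levels = c("pre", "post"))

# Approach 1: one model on all 24 samples, then one contrast per cell type.
dds_all <- DESeqDataSetFromMatrix(countData = counts_mat,
                                  colData = my_metadata,
                                  design = ~ patient + celltype_treatment)
dds_all <- DESeq(dds_all)
res_one <- results(dds_all,
                   contrast = c("celltype_treatment", "A_post", "A_pre"),
                   alpha = 0.05)

# Approach 2: a separate model fitted only to the 6 samples of cell type A.
keep_A <- my_metadata$celltype == "A"
dds_A <- DESeqDataSetFromMatrix(countData = counts_mat[, keep_A],
                                colData = droplevels(my_metadata[keep_A, ]),
                                design = ~ patient + treatment)
dds_A <- DESeq(dds_A)
res_A <- results(dds_A, contrast = c("treatment", "post", "pre"), alpha = 0.05)

# Overlap of the significant gene sets (which() drops NA padj values).
sig_one <- rownames(res_one)[which(res_one$padj <= 0.05)]
sig_A   <- rownames(res_A)[which(res_A$padj <= 0.05)]
length(intersect(sig_one, sig_A))
```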
Thank you in advance!
Got it, thank you for the prompt response!