Hi,
I am reanalyzing some old bulk RNA-seq data from cell lines with the following groups:
- 3 treatments:
  - treatment A
  - treatment B
  - combination of treatments A+B
- 2 timepoints:
  - Day 1
  - Day 3
We also have control samples but, unfortunately, only for Day 1, not Day 3.
I've run DESeq2 with a simple `design = ~ group`. I am noticing that many (not all) of the highly significant DEGs in treatmentX_day3 vs control_day1 are similarly changed in all 3 treatments at day 3 vs all day 1 groups (when looking directly at the normalized counts). I expect many of those to be a consequence of the handling and incubation, and that they would also be present in a control_day3 sample that we, unfortunately, don't have.
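To make the layout concrete, here is a minimal base R sketch of the sample table. The number of replicates (3 per group) and the column names `treatment`, `time` and `group` are assumptions for illustration, not the real metadata.

```r
# Hypothetical sample sheet; 3 replicates per group is an assumption.
treated <- expand.grid(
  rep       = 1:3,
  treatment = c("treatA", "treatB", "combiAB"),
  time      = c("day1", "day3"),
  stringsAsFactors = FALSE
)
control <- data.frame(rep = 1:3, treatment = "control", time = "day1",
                      stringsAsFactors = FALSE)
coldata <- rbind(control, treated)

# One combined factor, as used with design = ~ group.
coldata$group <- factor(paste(coldata$treatment, coldata$time, sep = "_"))

# Cross-tabulation of treatment by time: the control/day3 cell is empty.
print(table(coldata$treatment, coldata$time))

# Number of groups in the combined factor.
print(nlevels(coldata$group))
```

Because the control/day3 cell is empty, any treatmentX_day3 vs control_day1 contrast contains both the treatment effect and whatever changes between day 1 and day 3.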
Also, treatment B has a stronger response than treatment A (expected from the literature, since they act through different pathways), as seen in the comparisons vs control_day1:
- treatA_day3 vs control_day1: 835 up / 416 down
- treatB_day3 vs control_day1: 1,704 up / 1,633 down
- combiA+B_day3 vs control_day1: 1,486 up / 1,570 down
I was wondering if, by using a smarter design, DESeq2 could help compensate for this issue.
If I am not mistaken, based on my experience with a different experiment in the past, if I removed the control samples from the dataset and re-ran DESeq2 with `design = ~ time * treatment`, it would simply pick treatA (the first level of treatment) as the reference and calculate the DEGs over time there. It would also give the DEGs of treatB_day1 vs treatA_day1 (and likewise for combiAB_day1), plus an "interaction component" for treatB (and for combiAB), i.e. how the day 3 vs day 1 change in treatB differs from the one in treatA, detecting the genes with different temporal dynamics between the 2 treatments.
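To check my reading of the coefficients, here is a small base R sketch of the design matrix for the treated samples only. The level names and the log2 means in `mu` are made up for illustration; DESeq2 reports the same terms under its own names in `resultsNames(dds)`.

```r
# Treated samples only (controls removed); one row per group is enough
# to inspect the design columns. Level names are assumptions.
coldata <- expand.grid(
  treatment = c("treatA", "treatB", "combiAB"),
  time      = c("day1", "day3")
)
coldata$treatment <- factor(coldata$treatment,
                            levels = c("treatA", "treatB", "combiAB"))
coldata$time <- factor(coldata$time, levels = c("day1", "day3"))

mm <- model.matrix(~ time * treatment, data = coldata)
print(colnames(mm))

# Made-up log2 means for one gene, in the row order of coldata:
# A_d1, B_d1, AB_d1, A_d3, B_d3, AB_d3
mu <- c(5, 5, 5, 7, 9, 9)

# One row per group and a full-rank design, so the coefficients
# follow directly from the group means.
beta <- setNames(solve(mm, mu), colnames(mm))
print(round(beta, 3))
```

With these made-up means, `timeday3` is the day 3 vs day 1 change within treatA (2), the main effects `treatmenttreatB` and `treatmentcombiAB` are the day 1 differences vs treatA (0), and each interaction term is a difference of differences, e.g. (B_d3 - B_d1) - (A_d3 - A_d1) = 2.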
Would that approach be enough to compensate for the genes that change simply with the passage of time? Would it work better with treatment B as the baseline (because its response dominates the combination treatment)?
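For reference, this is how I would switch the baseline, sketched in base R with assumed level names. As far as I understand, changing the reference level changes which comparisons the coefficients represent rather than the fitted group means, but I may be wrong about that.

```r
# Assumed level names; treatA is the reference because it comes first.
treatment <- factor(c("treatA", "treatB", "combiAB"),
                    levels = c("treatA", "treatB", "combiAB"))
print(levels(treatment))

# Move treatB to the first position so it becomes the reference level.
treatment_b <- relevel(treatment, ref = "treatB")
print(levels(treatment_b))
```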
Would using just `design = ~ time + treatment` (just `+`, no interaction `*`) be correct, or would the effect of time be dominated by the DEGs shared by treatB and combiAB (1,244 up + 1,217 down in both), affecting the assessment of treatment A? Or would I be better off with a simpler handmade approach: use the basic pairwise comparisons, find the DEGs common to treatA_day3 vs control_day1 and treatB_day3 vs control_day1 (437 up + 241 down in both), and remove those from the results for downstream analyses?
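A quick base R check of which designs are even estimable when the controls are kept in. This is only a sketch with one row per group and assumed level names; it compares the number of columns of each design matrix with its rank.

```r
# One row per existing group; there is no control/day3 row.
coldata <- data.frame(
  treatment = c("control", "treatA", "treatB", "combiAB",
                "treatA", "treatB", "combiAB"),
  time      = c("day1", "day1", "day1", "day1",
                "day3", "day3", "day3")
)
coldata$treatment <- factor(coldata$treatment,
                            levels = c("control", "treatA", "treatB", "combiAB"))
coldata$time <- factor(coldata$time, levels = c("day1", "day3"))

additive <- model.matrix(~ time + treatment, data = coldata)
full     <- model.matrix(~ time * treatment, data = coldata)

# Columns vs rank: a design is estimable only when the two are equal.
print(c(columns = ncol(additive), rank = qr(additive)$rank))
print(c(columns = ncol(full),     rank = qr(full)$rank))
```

If I read this correctly, the additive design is full rank with the controls included, but only because the day effect is estimated from the treated groups and assumed to apply equally to the control, which is exactly the assumption I am unsure about. The interaction design with the controls included is not full rank, since every day 3 sample is a treated one.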
Thanks in advance