Question

Can DESeq2's design compensate for sequencing experimental design shortcomings?

txema.heredia • 0

Hi,

I am reanalyzing some old bulk RNA-seq data from cell lines with the following groups:

3 treatments:
- treatment A
- treatment B
- combination of treatments A+B

2 timepoints:
- Day 1
- Day 3

We also have control samples but, unfortunately, only for Day 1, not Day 3.

I've run DESeq2 with a simple design = ~ group. Looking directly at the normalized counts, I notice that many (not all) of the highly significant DEGs in treatmentX_day3 vs Control_day1 change similarly in all 3 treatments at day 3 relative to all day 1 groups. I expect many of those changes to be a consequence of handling and incubation, so they would also appear in a control_day3 sample that we, unfortunately, don't have.

Also, treatment B has a stronger response than treatment A (expected from the literature, since they act through different pathways), as seen in the comparisons vs Control_day1:

treatA_day3 vs control_day1: 835 up / 416 down
treatB_day3 vs control_day1: 1,704 up / 1,633 down
combiA+B_day3 vs control_day1: 1,486 up / 1,570 down

I was wondering whether a smarter design in DESeq2 could help compensate for this issue.

If I am not mistaken, based on past experience with a different experiment, if I removed the control samples from the dataset and re-ran DESeq2 with design = ~ time * treatment, it would simply pick treatA (the 1st level of treatment) and calculate the DEGs through time there. It would also give the DEGs of treatB_day1 vs treatA_day1 (and likewise for combiAB_day1), plus an "interaction component" of "treatB_day3 vs treatA_day1" that detects genes with different temporal dynamics between the 2 treatments.

Would that approach be enough to compensate for the genes that change simply with the passage of time? Would it work better with treatment B as the baseline (because its response dominates the combination treatment)?
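One way to check what each coefficient of an interaction design means is to build the model matrix in base R and solve for the coefficients on made-up cell means. The sketch below is only an illustration under stated assumptions: the log2 means in mu are invented, fit_coefs is a hypothetical helper, and no DESeq2 call is involved. With R's default treatment contrasts, the timeD3 term is the day 3 vs day 1 change within the reference treatment only, and each interaction term is a difference of differences relative to that reference.

```r
# Toy illustration in base R; the log2 cell means below are invented.
cells <- data.frame(
  treatment = factor(rep(c("A", "B", "AB"), times = 2),
                     levels = c("A", "B", "AB")),
  time = factor(rep(c("D1", "D3"), each = 3), levels = c("D1", "D3"))
)
mu <- c(5, 6, 6.5, 7, 9, 9)
names(mu) <- paste(cells$treatment, cells$time, sep = "_")

# Hypothetical helper: exact coefficients for a saturated design
# (one row per cell, so the model matrix is square and invertible).
fit_coefs <- function(cells, mu) {
  X <- model.matrix(~ time * treatment, data = cells)
  setNames(as.vector(solve(X, mu)), colnames(X))
}

# Reference level A:
#   timeD3            = A_D3 - A_D1
#   timeD3:treatmentB = (B_D3 - B_D1) - (A_D3 - A_D1)
beta_A <- fit_coefs(cells, mu)

# Reference level B: the same model, re-parameterized so that
#   timeD3 = B_D3 - B_D1
cells_B <- cells
cells_B$treatment <- relevel(cells_B$treatment, ref = "B")
beta_B <- fit_coefs(cells_B, mu)

print(round(beta_A, 2))
print(round(beta_B, 2))
```

Changing the reference level changes which treatment the timeD3 term describes, but the fitted cell means stay the same. Without day 3 controls, none of these terms separates a handling or incubation effect from a treatment-specific time effect.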

Would using just design = ~ time + treatment (just +, no interaction *) be correct, or would the effect of time be dominated by the DEGs shared by treatB and combiAB (1244 up + 1217 down in both), affecting the assessment of treatment A? Or would I be better off with a simpler handmade approach: run the basic pairwise comparisons, find the DEGs common to treatA_day3 vs control_day1 and treatB_day3 vs control_day1 (437 up + 241 down in both), and remove those from the results for downstream analyses?
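Whether a given formula is even estimable with the day 1 controls kept in the dataset can be checked in base R before fitting anything. The sketch below assumes 3 replicates per group (the post does not state the replicate count) and uses a hypothetical helper, is_full_rank. DESeq2 requires a full-rank design matrix, so a formula that fails this check would be rejected.

```r
# Hypothetical layout: 7 groups, with controls at day 1 only.
# The replicate count (3 per group) is an assumption.
groups <- data.frame(
  treatment = c("control", "A", "B", "AB", "A", "B", "AB"),
  time      = c("D1", "D1", "D1", "D1", "D3", "D3", "D3")
)
coldata <- groups[rep(seq_len(nrow(groups)), each = 3), ]
coldata$treatment <- factor(coldata$treatment,
                            levels = c("control", "A", "B", "AB"))
coldata$time <- factor(coldata$time, levels = c("D1", "D3"))

# Hypothetical helper: TRUE if no column of the model matrix is a
# linear combination of the others.
is_full_rank <- function(formula, data) {
  X <- model.matrix(formula, data = data)
  qr(X)$rank == ncol(X)
}

# Additive model: 5 coefficients for 7 observed groups.
additive_ok <- is_full_rank(~ time + treatment, coldata)

# Interaction model: 8 coefficients for 7 observed groups, because
# the control_day3 cell is missing.
interaction_ok <- is_full_rank(~ time * treatment, coldata)

print(c(additive = additive_ok, interaction = interaction_ok))
```

The additive formula is estimable here only because it assumes each treatment has the same effect at day 1 and day 3. Its time term is then estimated from the treated groups alone, since no control exists at day 3. Whether that assumption is acceptable is a modelling decision rather than a software one.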

Thanks in advance

DESeq2 • 122 views

ADD COMMENT • link updated 6 hours ago by Michael Love 43k • written 9 days ago by txema.heredia • 0

score 0 · Answer 1 · 2024-12-20

Michael Love 43k

For statistical analysis plans, I recommend working with a local statistician or someone familiar with linear models in R. I have to reserve my time on the support site for software-related issues.

ADD COMMENT • link 6 hours ago Michael Love 43k