Question

Time series and interaction in DESeq2

0

Entering edit mode

Francesco ▴ 10

@410b3403

Last seen 2.8 years ago

Italy

Hi everyone, I have theoretical questions about the workflow that would better fit my RNAseq data. I am working on a dataset with a total of 15 samples. The experiment consists of a single strain of a fungus exposed (treatment) or not (control) to a fungicide. Two sets of "control" and "treatment" samples were picked, at 1 day and 4 days, respectively. Below is the coldata object.

       condition day
T1A13    Control   1
T1B13    Control   1
T1C13    Control   1
T1D13    Control   1
T1VA13 Treatment   1
T1VB13 Treatment   1
T1VC13 Treatment   1
T1VD13 Treatment   1
T4A13    Control   4
T4B13    Control   4
T4C13    Control   4
T4VA13 Treatment   4
T4VB13 Treatment   4
T4VC13 Treatment   4
T4VD13 Treatment   4

I went through the DESeq2 vignette, as well as this guide, but I'm almost sure I'm getting something wrong. I would like to specify that the samples picked at day 4 are not the same as those sampled at day 1: in other words, all the samples in the dataset are different individuals. My main biological question is: are some genes impacted by the treatment initially, but then up-regulated as a consequence of prolonged exposure? The part of the vignette explaining the "overall" (I have difficulties in getting the meaning of "overall" here) effect of condition over genotype goes with:

~genotype + condition

while the "main" effect of condition on the reference genotype is summarized by:

~genotype + condition + genotype:condition

Therefore, in my case I would use

~day + condition + day:condition

to answer my question, right? My doubt is: with this formula, am I assuming a priori that the effect of "condition" depends on "day"?

After this, should I just run:

results(dds, contrast=c("condition", "treatment", "control"))

since I am already considering "day" to have an influence on the data? Or should I still contrast, separately, the different groups:

 Levels: Control1 Control4 Treatment1 Treatment4

?

I saw that most guides on time series report >=3 time points, so another question is: can I use a "time series" workflow with just two time points?

Thank you all in advance!

DESeq2 • 2.4k views

ADD COMMENT • link 2.8 years ago Francesco ▴ 10

score 1 · Answer 1 · 2022-04-02

1

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 16 hours ago

San Diego

You need to reread the vignette very carefully, because the results command you specified above with the design you specified above is almost certainly not returning what you think it is.

If your goal is to see if the fold change caused by treatment is different on Day4 versus Day1, you've got the right design, but you need to use resultsNames to get the right nomenclature for what you want (it will be the last one, it will look something like conditionTreatment.Day1) and use that as a name when calling results (not that contrast =c() thing)

ADD COMMENT • link 2.8 years ago swbarnes2 ★ 1.4k

1

Entering edit mode

Many thanks for the reply.

I still have a doubt: in the ?results examples, I see that results(dds, contrast=c("condition","B","A")) gives the effect of condition for genotype I, which would be day 1 in my case. You suggest something similar to results(dds, name="genotypeII.conditionB"), which would be results(dds, name="day4.conditionTreatment") in my case. This gives me 678 DEGs. Instead if, with the same design, I use:

dds <- DESeq(dds, test="LRT", reduced = ~ day + condition)
res<- results(dds)

I get 667 DEGs. Should I call DESeq with the LRT test, and then get the padj values with results(dds, name="day4.conditionTreatment")? Or should I use "Wald", and then call the results in the same way? I understand that the 678 vs 667 is not that different, but I would like to understand which is the best way to proceed. Many thanks again for your help.

ADD REPLY • link 2.8 years ago Francesco ▴ 10