Question

DEseq2 time-series design.

2

Entering edit mode

shao ▴ 100

@shao-6241

Last seen 7.3 years ago

Germany

Hi,

I am working with time-series RNA-seq data, and would like to make sure I have the right design for my purpose.

The experiments like:

Time = rep(c("0h", "2h", "4h", "6h", "8h"), each = 2)
Treat = rep(c("Control", "Treat"), each = 10)
nameD <- paste(Treat, Time, c(rep(LETTERS[1:2], 5), rep(LETTERS[3:4], 5)), sep = "_")
coldata <- data.frame(row.names = nameD, Time = Time, Treat = Treat)

> coldata
             Time   Treat
Control_0h_A   0h Control
Control_0h_B   0h Control
Control_2h_A   2h Control
Control_2h_B   2h Control
Control_4h_A   4h Control
Control_4h_B   4h Control
Control_6h_A   6h Control
Control_6h_B   6h Control
Control_8h_A   8h Control
Control_8h_B   8h Control
Treat_0h_C     0h   Treat
Treat_0h_D     0h   Treat
Treat_2h_C     2h   Treat
Treat_2h_D     2h   Treat
Treat_4h_C     4h   Treat
Treat_4h_D     4h   Treat
Treat_6h_C     6h   Treat
Treat_6h_D     6h   Treat
Treat_8h_C     8h   Treat
Treat_8h_D     8h   Treat

This is longitudinal experiments, two biological replicates for treat and control respectively, and each sample is repeated sequenced in different time. Besides, treatments started before 0 hour, 0 hour only mean when the samples were sequenced.

I would like to find genes showing difference in any time point between Treat and control. I think the likelihood test is right choice.

The full model I used as

design(dds) <- formula(~ Time + Treat + Time:Treat)

the reduced model I used is different from Vignettes of DESeq2, I used

reduced_1 = formula(~ Time)

rather than

reduced_2 = formula(~ Time + Treat)

I think likelihood test with reduced_2 could not give small pvalue to genes showing consistent difference between treat and control, i.e., genes expression are parallel between two condition.

A few questions:

- Am I right to use reduced_1? (From the discussion and comments I think it fits my purpose).

- Because it is repeated measurements for each sample, the genes expression also correlate. Would it be a problem for the test?

- I have tried two test so far:

- Setting one (do not consider interaction): full = formula(~ Time + Treat), reduced = formula(~ Time)

- Setting two (consider interaction): full = formula(~ Time + Treat + Time:Treat), reduced = formula(~ Time)

- To my surprise, setting one actually find more DE genes than setting two basing on FDR <= 0.01. I noticed that fold change is small between treat and control, is it connected to the model complexity and lead to less DE genes? or better reasons?

Thanks in advance.

deseq2 timecourse design • 26k views

ADD COMMENT • link updated 5.8 years ago by simon baldwin • 0 • written 10.5 years ago by shao ▴ 100

0

Entering edit mode

Simon Anders ★ 3.8k

@simon-anders-3855

Last seen 4.7 years ago

Zentrum für Molekularbiologie, Universi…

Before getting to the actual question, could you please clarify: Are these 16 independent samples, such taht each sample underwent an experiment that was allowed to run until the given time? Or are these four samples, on which intermediate measurements were made (e.g., by taking aliquots, in case it is some liquid cell culture)?

ADD COMMENT • link 10.5 years ago Simon Anders ★ 3.8k

0

Entering edit mode

Thanks for the good questions. There are four samples. I missed some important info and have edited my post.

ADD REPLY • link 10.5 years ago shao ▴ 100

0

Entering edit mode

lmolla ▴ 10

@lmolla-7662

Last seen 6.4 years ago

United States

Hi cshao ,

I am doing a similar analysis. Can you write the final code that you decided to use here.

I am specifically interested in the design.

From what I understand the better model to use is :

full = formula(~ Time + Treat + Time:Treat), reduced = formula(~ Time)

From this test the genes with small pvalue are those that show a treatment dependent change at any timepoint ?

ADD COMMENT • link 9.9 years ago lmolla ▴ 10

0

Entering edit mode

The proper design depends on your experimental design.

Your full and reduced designs above are probably not what you want, as this would include genes which are DE at time 0. Typically, we would suggest:

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time + treat)

Here is an example:

http://www.bioconductor.org/help/workflows/rnaseqGene/#time

ADD REPLY • link 9.9 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Thank you very much for your reply and for being so helpful. My experiment is a bit more complicated but I am indeed interested in changes at time0 as well.

I have a time course experiment and I am also interested in differences at time0. My variables are :

condition=wt and knockdown of protein X (in triplicates)

time=day0, day1,day2

clone=A,B,C,D,E,F [wt=cloneA,B,C; knockdown = clone D,E,F) -> so this means that for clone A I have RNAseq data for time=day0,1,2

time0 is the moment when the cells are induced to differentiate. BUT the knockdown happened before day0. Thus I am interested in finding protein X dependent changes in any time point including time0 [I expect differences at time0] . I also expect a lot of gene expression changes due to differentiation so I want to be able to determine which gene expression changes are specifically dependent on protein X.

To my understanding the design I need is :

design(dds) <- ~ clone + time + treat + time:treat

dds <- DESeq(dds, test="LRT", reduced = ~  clone + time)

I am not sure if I understand the deseq2 design properly but any advice/suggestion would be greatly appreciated.

Thank you!

My sample table:

clone	condition	time
A	KD	d0
B	KD	d0
C	KD	d0
A	KD	d1
B	KD	d1
C	KD	d1
A	KD	d2
B	KD	d2
C	KD	d2
D	control	d0
E	control	d0
F	control	d0
D	control	d1
E	control	d1
F	control	d1
D	control	d2
E	control	d2
F	control	d2

Linda

ADD REPLY • link 9.9 years ago lmolla ▴ 10

0

Entering edit mode

hi Linda,

In that case, you can use

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time)

...to generate p-values which test for any difference due to treatment including time=0 (and which allows for differences in treatment at different times).

You can use another design if you want to test treatment effects at each time. See the suggestion here:

A: DESEq2 comparison with mulitple cell types under 2 conditions

With your particular experimental design, you can't control for the clone differences and test for effects due to treatment because they are confounded. In other words, there is no way to disentangle A+B+C effect from the KD effect, because these are exactly the same samples. So you have to leave clone out of the DESeq2 model for this experimental design. You could incorporate the clone information into limma using the duplicateCorrelation function (see the limma package for details).

ADD REPLY • link 9.9 years ago Michael Love 43k

0

Entering edit mode

Thank you very much for your reply!

ADD REPLY • link 9.9 years ago lmolla ▴ 10

0

Entering edit mode

I am sorry to bother you again but when I run the design above I get :

> resultsNames(dds)
[1] "Intercept" "time_d1_vs_d0" "time_d2_vs_d0"
[4] "condition_KD_vs_control" "timed1.condition.KD" "timed2.condition.KD"

I want to be able to produce a table where I have all the genes that are differentially expressed due to the KD ( for all time points) and thus I am trying to understand what each of the resultNames(dds) means :

timed1.condition.KD = the affect of KD on timepointd1 ?
timed2.condition.KD = the affect of KD on timepointd2 ?
time_d1_vs_d0 = differentiatially expressed genes between d0 and d1 ?
time_d2_vs_d0 = differentially expressed genes between d0 and d2 ?
"Intercept" = ???? what does this mean
"condition_KD_vs_control" = differentially expressed genes between KD and control at any time point or at day0??

I am not exactly sure if there is a way to make a table for all KD dependent changes regardless of the time point. If my assumptions above are correct I would do the following...

results(dds, name="condition_KD_vs_control", test="Wald")

Linda

ADD REPLY • link 9.9 years ago lmolla ▴ 10

0

Entering edit mode

See my answer in another recent post:

A: DESeq2 LRT output help

ADD REPLY • link 9.9 years ago Michael Love 43k

0

Entering edit mode

I know this is an old post, but I have a question regarding the comment "to generate p-values which test for any difference due to treatment including time=0 "

If you use the formula

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time)

I understand that you are seeing if treatment causes a change in gene expression at any time point, but for the treated term in the formula, aren't you finding differences between all treated vs all untreated? Not specifically treated vs untreated at time 0?

It seems like this could cause genes to come back as significantly different because it would look like the sample size is tripled when its actually just the same sample on a different day.

For example, with my own data, I have 3 time points and 2 conditions. If I just compare treated vs untreated using Wald test at time 0, I get ~10 significantly different genes.

If I use the time series analysis to see which genes change over time due to the treatment,

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time + treat)

I get 0 genes as different.

When I try to see which genes are different at any time point due to treatment,

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time)

I now get almost 400 genes as significantly different.

Could this simply be because I've tripled my data set comparing treated to untreated, or is something else going on?

Thanks for your help on this!!

ADD REPLY • link 6.7 years ago CE ▴ 20

1

Entering edit mode

Can you post this as a new question with some example colData? That will help me give the best advice.

ADD REPLY • link 6.7 years ago Michael Love 43k

0

Entering edit mode

I use the same design as you posted.

ADD REPLY • link 9.9 years ago shao ▴ 100

score 1 · Accepted Answer · 2014-11-06

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 19 hours ago

United States

Hi, Using a reduced formula of ~ time + treat means that genes which show a consistent difference from time 0 onward will not have a small p value. This is what we think you want, because differences between groups at time 0 should be controlled for, e.g. random differences between the individuals chosen for each treatment group which are observable before the treatment takes effect.

ADD COMMENT • link 10.5 years ago Michael Love 43k

0

Entering edit mode

But cshao has no sample groups in his design table, hence this won't work that way, I think. Hence my question below.

ADD REPLY • link 10.5 years ago Simon Anders ★ 3.8k

0

Entering edit mode

True, it depends on the experiment. Even if not the same samples, another reason to control for time 0 would be: individuals in control group kept in container 1, individuals in treatment group kept in container 2. Context can be more or less important depending on the details of experiment.

ADD REPLY • link 10.5 years ago Michael Love 43k

0

Entering edit mode

Good point, I edited my posts.

ADD REPLY • link 10.5 years ago shao ▴ 100

0

Entering edit mode

I don't think one can control for the sample effect here, because the comparison of interest is across treatment group, which contain different samples. Maybe Simon sees a way though. Given that at time 0, you expect a treatment effect already, I would use your "setting 2" above, with the interaction term in the full model to allow for varying treatment effect at different times, and then a reduced of ~ time.

"I noticed that fold change is small between treat and control"

Read the section in ?results which talks about results tables for the likelihood ratio test. Briefly, there is a single likelihood ratio statistic and p-value, but there are many possible log fold changes to show. The results table prints the last one by default, and you need to use the name argument to pull out others. See resultsNames(dds) for all of the log fold changes fit by the DESeq() function.

I don't typically like to focus on the total number of genes with a small FDR; the important thing is to build the correct set for your question of interest.

ADD REPLY • link 10.5 years ago Michael Love 43k

0

Entering edit mode

Thanks. The fold change I talked about is the largest fold change in any time points for each genes, I did the calculation myself. About the second question, i.e., the gene expression correlation among time points, how do you think about it?

ADD REPLY • link 10.5 years ago shao ▴ 100