Question

DESeq2 Model Design

2

Entering edit mode

andrew.j.skelton73 ▴ 370

@andrewjskelton73-7074

Last seen 11 months ago

United Kingdom

Hi,

I've done a model design and I hope someone can help out with my understanding of it!

I have an experimental setup that looks something like this:

3 Time points (0hrs, 6hrs, 12hrs)

3 Different Conditions (Treatments A, B and C)

So that makes 9 different combinations of time points and treatments, each is in triplicate. There are therefore27 Samples.

My design formula is : ~ Treatment + TimePoint + Treatment:TimePoint

My current understanding is this will give me small pValues of Treatment-specific effects over time?

I wanted to further refine this and look at a specific treatment and how it differs between two time points. So I used the following line:

foo <- list(c(“TreatmentA.TimePoint12hrs"), c(“TreatmentA.TimePoint0hrs"))

resMFType <- results(dds, contrast=foo)

EDIT::

as.data.frame(colData(dds))

    TimePoint Treatment
X1        0hr         A
X2        0hr         B
X3        0hr         C
X4        0hr         A
X5        0hr         B
X6        0hr         C
X7        0hr         A
X8        0hr         B
X9        0hr         C
X10       6hr         A
X11       6hr         B
X12       6hr         C
X13       6hr         A
X14       6hr         B
X15       6hr         C
X16       6hr         A
X17       6hr         B
X18       6hr         C
X19      12hr         A
X20      12hr         B
X21      12hr         C
X22      12hr         A
X23      12hr         B
X24      12hr         C
X25      12hr         A
X26      12hr         B
X27      12hr         C

sessionInfo()

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DESeq2_1.6.2            RcppArmadillo_0.4.500.0 Rcpp_0.11.3             GenomicRanges_1.18.3   
[5] GenomeInfoDb_1.2.3      IRanges_2.0.0           S4Vectors_0.4.0         BiocGenerics_0.12.1    

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1 base64enc_0.1-2     
 [5] BatchJobs_1.5        BBmisc_1.8           Biobase_2.26.0       BiocParallel_1.0.0  
 [9] brew_1.0-6           checkmate_1.5.0      cluster_1.15.3       codetools_0.2-9     
[13] colorspace_1.2-4     DBI_0.3.1            digest_0.6.4         fail_1.2            
[17] foreach_1.4.2        foreign_0.8-61       Formula_1.1-2        genefilter_1.48.1   
[21] geneplotter_1.44.0   ggplot2_1.0.0        grid_3.1.2           gtable_0.1.2        
[25] Hmisc_3.14-5         iterators_1.0.7      lattice_0.20-29      latticeExtra_0.6-26 
[29] locfit_1.5-9.1       MASS_7.3-35          munsell_0.4.2        nnet_7.3-8          
[33] plyr_1.8.1           proto_0.3-10         RColorBrewer_1.0-5   reshape2_1.4        
[37] rpart_4.1-8          RSQLite_1.0.0        scales_0.2.4         sendmailR_1.2-1     
[41] splines_3.1.2        stringr_0.6.2        survival_2.37-7      tools_3.1.2         
[45] XML_3.98-1.1         xtable_1.7-4         XVector_0.6.0

Is this correct?

Thanks,

DESeq2 linear model model • 9.7k views

ADD COMMENT • link updated 3.8 years ago by Michael Love 43k • written 10.3 years ago by andrew.j.skelton73 ▴ 370

0

Entering edit mode

hi Andrew,

Just to make sure we give the correct response, can you paste the output of as.data.frame(colData(dds))

and also paste the output of sessionInfo()

ADD REPLY • link 10.3 years ago Michael Love 43k

0

Entering edit mode

Hi Michael, See my edit above, Thanks,

ADD REPLY • link 10.3 years ago andrew.j.skelton73 ▴ 370

score 8 · Accepted Answer · 2014-11-25

8

Entering edit mode

Michael Love 43k

@mikelove

Last seen 5 days ago

United States

"My current understanding is this will give me small pValues of Treatment-specific effects over time?"

The way to get a single p-value for each gene for any treatment-specific effects over time is to use the likelihood ratio test:

dds = DESeq(dds, test="LRT", reduced=~Treatment + TimePoint)
results(dds)

Where the results table will give a LRT statistic and p-value. Note that there are many LFCs involved in this likelihood ratio test (all the interaction effects between treatment and time), but the software has to pick just one to print in the table. It will just print the last one based on the ordering of levels. You can change this column with the 'name' argument. But you will notice that the statistic and p-value do not change, because these are based on all the interaction terms.

"I wanted to further refine this and look at a specific treatment and how it differs between two time points. So I used the following line...list(c(“TreatmentA.TimePoint12hrs"), c(“TreatmentA.TimePoint0hrs"))"

That line is correct if you used just DESeq(dds).

If you used the LRT instead, then the interaction terms for base levels are absorbed by the intercept. This means the Treatment A comparison would be:

results(dds, name="TimePoint_12hr_vs_0hr", test="Wald")

...where specifying Wald means that we want p-values for this LFC only.

and for Treatment B this would be:

results(dds, contrast=list(c("TimePoint_12hr_vs_0hr","TreatmentB.TimePoint12hr")), test="Wald")

...which is the sum of the 12 vs 0 effect for the base level (treatment A) and the interaction effect of 12 vs 0 in treatment B.

Using DESeq(...,test="LRT") or just DESeq() are equally valid. The first is better to run if you want to test for any effect, and then make individual comparisons using results(). The second is perhaps easier syntax for comparing individual timepoints.

ADD COMMENT • link 10.3 years ago Michael Love 43k

0

Entering edit mode

That is a very comprehensive response. Thank you very much for such a detailed explanation, that helps clear things up for me!

ADD REPLY • link 10.3 years ago andrew.j.skelton73 ▴ 370

0

Entering edit mode

Hi Michael,

A follow up question - if you don't mind

What would be the best way to design a contrast (or LRT), to look at the differences between Treatment A and Treatment B at a specific time point?

i.e. TreatmentB.0hrs-TreatmentA.0hrs

thanks,

ADD REPLY • link 10.1 years ago andrew.j.skelton73 ▴ 370

2

Entering edit mode

The treatment B vs A effect at a specific time, say 12 hr, is the main effect (the treatment B vs A effect for the base level of time) and the additional effect at that specific time:

results(dds, contrast=list(c("Treatment_B_vs_A","TreatmentB.TimePoint12hr")), test="Wald")

except for time = 0, which is the base level, and so the treatment B vs A effect is just:

results(dds, name="Treatment_B_vs_A", test="Wald")

ADD REPLY • link 10.1 years ago Michael Love 43k

0

Entering edit mode

That's great, cheers Michael. Slight followup to all of this - is there a reason I can't create a contrast to look between 6hrs and 12hrs? - I can create contrasts to look at 12hrs vs 0hrs and 6hrs vs 0hrs, but not 12hrs vs 6hrs...

ADD REPLY • link 10.1 years ago andrew.j.skelton73 ▴ 370

1

Entering edit mode

If you want to contrast two non-base levels, you need to put one level in the numerator, and one level in the denominator (see the contrast argument description in ?results for more information and the Examples section of ?results).

For treatment A, you compare 12 vs 6 like so:

results(dds, contrast=list("TimePoint_12hr_vs_0hr","TimePoint_6hr_vs_0hr"), test="Wald")

For treatment B, you need to also add the interaction effect for B at time 12 and for B at time 6:

results(dds, contrast=list(c("TimePoint_12hr_vs_0hr","TreatmentB.TimePoint12hr"),
c("TimePoint_6hr_vs_0hr","TreatmentB.TimePoint6hr")), test="Wald")

ADD REPLY • link 10.1 years ago Michael Love 43k

0

Entering edit mode

Hi Michael, yet another question if you've got time! Is there a way to explore effects crossing between both time and treatment. For example, if I wanted to see the effects of 0hr TreatmentA Vs 12hr TreatmentC

I thought it'd be something similar to:

results(dds, contrast=list(c("TimePoint_12hr_vs_0hr"),
c("TimePoint_12hr_vs_0hr","TreatmentC.TimePoint12hr")), test="Wald")

This doesn't appear to play ball, and I'm not 100% sure I'm doing the right thing. My thinking is that this should show the interaction effect of TreatmentA 0Hrs Vs TreatmentA 12Hrs Against TreatmentC 0Hrs Vs TreatmentC 12Hrs

ADD REPLY • link 9.9 years ago andrew.j.skelton73 ▴ 370

1

Entering edit mode

It doesn't make sense to have an effect in the numerator and the denominator of the fold change, as it will cancel, e.g.

A * B / A

Comparing 12hr C vs 0hr A is to simply add the log fold changes: 12hr vs 0hr, C vs A, and the interaction of C and 12hr.

So something like:

results(dds, contrast=list(c("time_12_vs_0","treat_C_vs_A","treat_C.time_12")))

Note that i switched the order from your question (you had "0hr TreatmentA Vs 12hr TreatmentC "), because the fold changes are relative to the reference levels, so it makes more sense usually to compare 12 vs 0, C vs A, etc.

ADD REPLY • link 9.9 years ago Michael Love 43k

0

Entering edit mode

Ah that makes a lot more sense.

Out of curiosity, going back to the beginning of the experiment. If I had paired samples, what would I have to add to the model design to cover the pairings? Or would I have to change the nature of the analysis from scratch?

I've tried:

~ Pairing + Treatment + TimePoint + Treatment:TimePoint

Does that makes sense?

ADD REPLY • link 9.9 years ago andrew.j.skelton73 ▴ 370

0

Entering edit mode

yes, that's the recommended way to include pair information

ADD REPLY • link 9.9 years ago Michael Love 43k

0

Entering edit mode

Hi Michael and Andrew, I have a question that I was puzzling. To look at differences between two different time points for a specific treatment, using the reduced model as: reduced=~Treatment + TimePoint. To look at the differences between Treatment A and Treatment B at a specific time point, does the reduced model need to change into: reduced=~ TimePoint+ Treatment? And how about the design formula, is it still : ~ Treatment + TimePoint + Treatment:TimePoint

Thank you very much for your reply.

Emily

ADD REPLY • link 9.7 years ago Emily • 0

0

Entering edit mode

Remember, the likelihood ratio test is testing *all* time points. This is not the right way to compare specific effects in DESeq2.

If you want to compare specific time points within a treatment or the treatment differences within a single time point you can use, e.g.:

results(dds, test="Wald", contrast=list("treatmentC.time6hr", "treatmentB.time6hr"))

Note that you can run this Wald test on the output of DESeq(dds, test="LRT"). So you can build and LRT results table for all time points, and then test specific comparisons using the code above. The names used in the contrast should come from resultsNames(dds)

ADD REPLY • link 9.7 years ago Michael Love 43k

0

Entering edit mode

Thanks Michael,

So you mean that I do not need to change the full modle or reduced modle when I try to compare specific time points within a treatment or the treatment differences within a single time point, and the precedence order of the "Treatment" and "TimePoint" in the full or reduced modle will not affact the final results?

ADD REPLY • link 9.7 years ago Emily • 0

0

Entering edit mode

Yes.

ADD REPLY • link 9.7 years ago Michael Love 43k

0

Entering edit mode

Thank you very much, Michael.

Cheers!

ADD REPLY • link 9.7 years ago Emily • 0

0

Entering edit mode

Hello,

This particular thread has been incredibly useful. Thanks so much. I am slightly confused about comparing treatments. I have a dataset very similar to the coldata used at the beginning of this thread. I would like to compare my treatment groups to one another (e.g. treatment B vs. C) across time points. I would like to compare these treatments at a specific time point once both the treatment A and time 0 controls (both base levels) have been accounted for in the reference. In this thread the example for time 6hr was given as "results(dds, test="Wald", contrast=list("treatmentC.time6hr", "treatmentB.time6hr"))". Does this account for changes over both the control (treatment A) and time 0? My understanding was the LRT interaction term, treatment:timepoint accounts for differences from time 0. So, "results(dds, test="Wald", contrast=list("treatmentC.time6hr", "treatmentB.time6hr"))" might account for the difference from time 0 in each treatment, but what about difference from treatment A and time 0 at that timepoint? Which interaction terms would you use to account for both differences? I believe you would also include "treat_C_vs_A", "treat_B_vs_A" and the code would be "results(dds, test="Wald", contrast=list("treatmentC.time6hr", "treatmentB.time6hr", "treat_C_vs_A", "treat_B_vs_A))". Any clarification on this would be helpful. Thank you in advance.

ADD REPLY • link 3.8 years ago drives • 0

0

Entering edit mode

Hi,

I'm sorry but I don't have time at the moment to provide guidance on setting up the statistical analysis for user's datasets. I recommend to work with a local statistician or someone familiar with linear models in R.

ADD REPLY • link 3.8 years ago Michael Love 43k