Question

deseq2 multifactor timeseries help

1

Entering edit mode

sukhi17 ▴ 10

@sukhi17-12965

Last seen 7.7 years ago

Hi,

I have been trying to learn deseq2 and managed to work on a two factorial design (followed a similar post DESeq2 Model Design and worked through my analysis, thanks to Michael). I'm now into a three factor time-series design and would need some help please in designing my model and describing contrasts. I have been looking into lots of forum questions but still confused, have also read the vignette.

Here is my design: Group: A and B, Treatment: control and treated, time: 0,1,2,4 - 3 replicates each

group treatment timepoints
1 A control T0
2 A control T0
3 A control T0
4 B control T0
5 B control T0
6 B control T0
7 A control T1
8 A control T1
9 A control T1
10 B control T1
11 B control T1
12 B control T1
13 A treated T1
14 A treated T1
15 A treated T1
16 B treated T1
17 B treated T1
18 B treated T1
19 A control T2
20 A control T2
21 A control T2
22 B control T2
23 B control T2
24 B control T2
25 A treated T2
26 A treated T2
27 A treated T2
28 B treated T2
29 B treated T2
30 B treated T2
31 A control T4
32 A control T4
33 A control T4
34 B control T4
35 B control T4
36 B control T4
37 A treated T4
38 A treated T4
39 A treated T4
40 B treated T4
41 B treated T4
42 B treated T4

key questions for DE:

1) time-series DE within a specific group, 2) time-series DE within specific treatment 3) group vs treatment time series combinations

Different model designs tried:

~group+treatment+timepoints+treatment:group

~group+treatment+timepoints+group:timepoints

~group+group:treatment+group:timepoints - test="LRT", reduced = ~group+group:timepoints

I am having a hard time to understand what design should I use for a complete analysis (I should add in: newbie in R, so learning as I go).

> sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_New Zealand.1252 LC_CTYPE=English_New Zealand.1252
[3] LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C
[5] LC_TIME=English_New Zealand.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_2.2.0 DESeq2_1.10.1 RcppArmadillo_0.7.500.0.0
[4] Rcpp_0.12.8 SummarizedExperiment_1.0.2 Biobase_2.30.0
[7] GenomicRanges_1.22.4 GenomeInfoDb_1.6.3 IRanges_2.4.8
[10] S4Vectors_0.8.11 BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] genefilter_1.52.1 locfit_1.5-9.1 splines_3.2.2 lattice_0.20-34
[5] colorspace_1.3-1 htmltools_0.3.5 survival_2.40-1 XML_3.98-1.5
[9] foreign_0.8-67 DBI_0.5-1 BiocParallel_1.4.3 RColorBrewer_1.1-2
[13] lambda.r_1.1.9 plyr_1.8.4 stringr_1.1.0 zlibbioc_1.16.0
[17] munsell_0.4.3 gtable_0.2.0 futile.logger_1.4.3 memoise_1.0.0
[21] latticeExtra_0.6-28 knitr_1.15.1 geneplotter_1.48.0 AnnotationDbi_1.32.3
[25] htmlTable_1.7 acepack_1.4.1 xtable_1.8-2 openssl_0.9.5
[29] scales_0.4.1 base64_2.0 Hmisc_4.0-1 annotate_1.48.0
[33] XVector_0.10.0 gridExtra_2.2.1 digest_0.6.10 stringi_1.1.2
[37] grid_3.2.2 tools_3.2.2 magrittr_1.5 lazyeval_0.2.0
[41] tibble_1.2 RSQLite_1.1 Formula_1.2-1 cluster_2.0.5
[45] futile.options_1.0.0 Matrix_1.2-7.1 data.table_1.10.0 assertthat_0.1
[49] rpart_4.1-10 nnet_7.3-12

Thanks everyone for your time!!!

deseq2 multifactorial design time course • 3.6k views

ADD COMMENT • link updated 7.8 years ago by ana • 0 • written 7.8 years ago by sukhi17 ▴ 10

0

Entering edit mode

Dear Michael,

I'm analysing a RNA-seq dataset with a very similar design to the one described by Sukhi, except that I do not have control and infected samples at time 0 (only have control and infected samples at T1, T2, T3).

The approach I took was combining all 3 factors (Group, Treatment, Time) into a single factor and using design=~ Donor + Group_Treatment_Time. Would this be a good strategy?

I'm interested in time-specific differences:
i) between control and infection for each group
ii) between groups for control (or infection)

and I'm interested in finding genes with a difference in baseline expression (a main effect), ie lines moving in parallel:
iii) between control and infection for each group
iv) between groups for control (or infection)

Thank you for your time.

ADD REPLY • link 7.8 years ago ana • 0

0

Entering edit mode

I'd recommend you partner with a local statistician as well. You have donor, group, treatment and time variables, and there are many ways to do the modeling. Note that it's not DESeq2 specific, any modeling you can do with linear models in R you can use DESeq2 to do.

ADD REPLY • link 7.8 years ago Michael Love 43k

score 1 · Answer 1 · 2017-05-05

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 32 minutes ago

United States

hi Sukhi,

You can take a look at the time series example in the workflow:

http://www.bioconductor.org/help/workflows/rnaseqGene/#time

Supposing you didn't have the "group" variable, a difference from that time series example and your data is that you have no treated samples at time 0. So you cannot estimate a "treatment" effect (this is the difference at time 0 between treated and control). Instead you can create a model matrix with just the interaction terms but no "treatment" term:

mm <- model.matrix(~timepoint + treatment:timepoint)

Then you will have to remove the column of 0's from 'mm' because there are no treated samples at time=0. Let's call that cleaned up matrix 'mm.full'. Then, the test for differences in the treated samples across time would be to subset the interaction terms out of 'mm.full', and provide the smaller matrix to the 'reduced' argument. If we call that smaller matrix 'mm.reduced' it would look like:

dds <- DESeq(dds, full=mm.full, reduced=mm.reduced, test="LRT")
res <- results(dds)

Things are complicated by having the "group" variable. In fact there are a couple of possible designs here, and therefore I would recommend you partner with a local statistician who can help you formulate specific designs and contrasts, and interpret results.

ADD COMMENT • link 7.8 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

I have relabeled the samples as group_treatment instead of keeping them as two separate factors (group and treatment separately) to answer important biological questions.

> sampleTable
condition timepoints
1 control_A T0
2 control_A T0
3 control_A T0
4 control_B T0
5 control_B T0
6 control_B T0
7 control_A T1
8 control_A T1
9 control_A T1
10 control_B T1
11 control_B T1
12 control_B T1
13 treated_A T1
14 treated_A T1
15 treated_A T1
16 treated_B T1
17 treated_B T1
18 treated_B T1
19 control_A T2
20 control_A T2
21 control_A T2
22 control_B T2
23 control_B T2
24 control_B T2
25 treated_A T2
26 treated_A T2
27 treated_A T2
28 treated_B T2
29 treated_B T2
30 treated_B T2
31 control_A T4
32 control_A T4
33 control_A T4
34 control_B T4
35 control_B T4
36 control_B T4
37 treated_A T4
38 treated_A T4
39 treated_A T4
40 treated_B T4
41 treated_B T4
42 treated_B T4

Because of missing samples/timepoints, I have removed the column of 0's as suggested and here is the code below:

mm <- model.matrix(~timepoints+condition:timepoints)

all.zero <- apply(mm, 2, function(x) all(x == 0))
mm.full <- mm[, !all.zero]

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = sampleTable, design = ~ timepoints+condition)

mm.reduced <- model.matrix(~timepoints+condition)

ddsTC <- DESeq(dds, full = mm.full, reduced = mm.reduced, test="LRT")

> resultsNames(ddsTC)
[1] "Intercept" "timepointsT1"
[3] "timepointsT2" "timepointsT4"
[5] "timepointsT0.conditioncontrol_B" "timepointsT1.conditioncontrol_B"
[7] "timepointsT2.conditioncontrol_B" "timepointsT4.conditioncontrol_B"
[9] "timepointsT1.conditiontreated_A" "timepointsT2.conditiontreated_A"
[11] "timepointsT4.conditiontreated_A" "timepointsT1.conditiontreated_B"
[13] "timepointsT2.conditiontreated_B" "timepointsT4.conditiontreated_B"

Does this look OK? I am particularly interested in what design to specify at the line

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = sampleTable, design = ~ timepoints+condition).

For looking at effect in control_B sample between T1 and T0:

results(ddsTC, contrast=list(c("timepointsT1.conditioncontrol_B","timepointsT0.conditioncontrol_B")), test="Wald")

How do I create a contrast (LRT) to look at differences between control_B vs control_A samples at T0 given that timepointsT0 and timepontsT0.conditioncontrol_A are used as base levels.

Please let me know if I am on the right track. Thanks heaps for looking at my queries.

ADD REPLY • link 7.7 years ago sukhi17 ▴ 10

0

Entering edit mode

This design doesn't work because the samples at time 0 for treatment B are not being used for comparisons of the later time points. I'm sorry but it's not a simple software question how to approach this analysis, and there are a number of statistical choices to make. I'm going to stick with my recommendation that you approach a biostatistician to work with on this analysis.

ADD REPLY • link 7.7 years ago Michael Love 43k