Hello, I'm trying to analyze RNASeq data from an multifactorial experimental setup with DESeq2. In brief, the experiment is about multiple stressor effects in stream organisms, whereby different stressors (here: added sediment, increased salinity or reduced flow velocity) are applied to the study organisms, either as single stressors or in each possible combination.
I would like to know, which gene expression patterns are the response to the single stressors (e.g. increased sediment load vs. control) and how these expression pattern change, if multiple stressors are present (due to interaction of stressors).
I read the vignette and tried to find help here, but with this complex design, it is still not clear to me, which coefficients/contrasts to use in order to investigate stressor interactions.
As recommended, I summarized all treatment combinations in one 'group' variable and used the following design:
dds2 <- DESeqDataSetFromMatrix(countData=round(rawCounts2), colData=sampleData2, design= ~ group)
with my samplefile looking like this:
Sample group
5 control
14 control
28 control
31 control
3 F
8 F
17 F
18 F
7 F_Sal
13 F_Sal
23 F_Sal
32 F_Sal
10 F_Sal_Sed
11 F_Sal_Sed
26 F_Sal_Sed
29 F_Sal_Sed
2 F_Sed
12 F_Sed
24 F_Sed
30 F_Sed
16 Sal
19 Sal
25 Sal
33 Sal
4 Sal_Sed
20 Sal_Sed
22 Sal_Sed
43 Sal_Sed
15 Sed
21 Sed
27 Sed
...
and these are my result names:
> resultsNames(dds2)
[1] "Intercept" "group_F_vs_control" "group_F_Sal_vs_control"
[4] "group_F_Sal_Sed_vs_control" "group_F_Sed_vs_control" "group_Sal_vs_control"
[7] "group_Sal_Sed_vs_control" "group_Sed_vs_control"
I understood that for my single effects (e.g. gene expression due to sediment addition), I just have to compare Sed vs control, like so:
sed <- results(dds2, contrast=c("group", "Sed", "control"))
The tricky part comes, when I aim to understand how the stressors interact.
If I would like to know, how the gene expression in response to sediment addition is shaped also by another stressor (let's say increased salinity), which would be the correct comparison for that?
Would it be
results(dds2, contrast = c("group", "Sal_Sed", "Sed"))
and is this equivalent to
results(dds2, contrast = list(c("group_Sal_Sed_vs_control"), c("group_Sed_vs_control")))
?
Or am I completely wrong and I would have to combine the single effects for sediment and salinity (i.e. weighing the coefficients) and compare that to the interaction term?
I also tried the 'normal' interaction term design
dds1 <- DESeqDataSetFromMatrix(countData=round(rawCounts), colData=sampleData, design= ~ Salinity + Sediment + Flow.Velocity + Salinity:Sediment + Salinity:Flow.Velocity + Sediment:Flow.Velocity + Salinity:Sediment:Flow.Velocity)
with 3 factors (each has 2 levels) instead of one grouping factor.
> sampleData
Salinity Sediment Flow.Velocity
5 ambient natural normal
14 ambient natural normal
28 ambient natural normal
31 ambient natural normal
40 ambient natural reduced
41 ambient natural reduced
55 ambient natural reduced
57 ambient natural reduced
...
And I extracted the single effects e.g. like this:
sed <- results(DE_analysis, contrast=c("Sediment", "added", "natural"))
I expected, that at least the single stressor effects would be the same, but this is not the case. Did I misunderstood that the factor grouping should have resulted in the very same results or is it just wrong what I did?
Any help is highly appreciated! Thanks in advance
sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.32.0 SummarizedExperiment_1.22.0 Biobase_2.52.0
[4] MatrixGenerics_1.4.3 matrixStats_0.61.0 GenomicRanges_1.44.0
[7] GenomeInfoDb_1.28.4 IRanges_2.26.0 S4Vectors_0.30.2
[10] BiocGenerics_0.38.0 ggplot2_3.3.5
loaded via a namespace (and not attached):
[1] KEGGREST_1.32.0 genefilter_1.74.1 locfit_1.5-9.4 splines_4.1.1
[5] lattice_0.20-44 colorspace_2.0-2 vctrs_0.3.8 utf8_1.2.2
[9] blob_1.2.2 XML_3.99-0.8 survival_3.2-11 rlang_0.4.11
[13] pillar_1.6.4 withr_2.4.2 glue_1.4.2 DBI_1.1.1
[17] BiocParallel_1.26.2 bit64_4.0.5 RColorBrewer_1.1-2 GenomeInfoDbData_1.2.6
[21] lifecycle_1.0.1 zlibbioc_1.38.0 Biostrings_2.60.2 munsell_0.5.0
[25] gtable_0.3.0 memoise_2.0.0 geneplotter_1.70.0 fastmap_1.1.0
[29] fansi_0.5.0 AnnotationDbi_1.54.1 Rcpp_1.0.7 xtable_1.8-4
[33] scales_1.1.1 cachem_1.0.6 DelayedArray_0.18.0 annotate_1.70.0
[37] XVector_0.32.0 bit_4.0.4 png_0.1-7 grid_4.1.1
[41] tools_4.1.1 bitops_1.0-7 magrittr_2.0.1 RCurl_1.98-1.5
[45] RSQLite_2.2.8 tibble_3.1.5 pkgconfig_2.0.3 crayon_1.4.1
[49] ellipsis_0.3.2 Matrix_1.3-4 httr_1.4.2 R6_2.5.1
[53] compiler_4.1.1