DESeq2 interaction terms in 2x2x2 factorial design
2
0
Entering edit mode
Marie • 0
@9405877e
Last seen 2.0 years ago
Germany

Hello, I'm trying to analyze RNASeq data from an multifactorial experimental setup with DESeq2. In brief, the experiment is about multiple stressor effects in stream organisms, whereby different stressors (here: added sediment, increased salinity or reduced flow velocity) are applied to the study organisms, either as single stressors or in each possible combination.

I would like to know, which gene expression patterns are the response to the single stressors (e.g. increased sediment load vs. control) and how these expression pattern change, if multiple stressors are present (due to interaction of stressors).

I read the vignette and tried to find help here, but with this complex design, it is still not clear to me, which coefficients/contrasts to use in order to investigate stressor interactions.

As recommended, I summarized all treatment combinations in one 'group' variable and used the following design:

dds2 <- DESeqDataSetFromMatrix(countData=round(rawCounts2), colData=sampleData2, design= ~ group)

with my samplefile looking like this:

Sample  group
5   control
14  control
28  control
31  control
3   F
8   F
17  F
18  F
7   F_Sal
13  F_Sal
23  F_Sal
32  F_Sal
10  F_Sal_Sed
11  F_Sal_Sed
26  F_Sal_Sed
29  F_Sal_Sed
2   F_Sed
12  F_Sed
24  F_Sed
30  F_Sed
16  Sal
19  Sal
25  Sal
33  Sal
4   Sal_Sed
20  Sal_Sed
22  Sal_Sed
43  Sal_Sed
15  Sed
21  Sed
27  Sed
 ...

and these are my result names:

> resultsNames(dds2)
[1] "Intercept"                  "group_F_vs_control"         "group_F_Sal_vs_control"    
[4] "group_F_Sal_Sed_vs_control" "group_F_Sed_vs_control"     "group_Sal_vs_control"      
[7] "group_Sal_Sed_vs_control"   "group_Sed_vs_control"

I understood that for my single effects (e.g. gene expression due to sediment addition), I just have to compare Sed vs control, like so:

sed <- results(dds2, contrast=c("group", "Sed", "control"))

The tricky part comes, when I aim to understand how the stressors interact.

If I would like to know, how the gene expression in response to sediment addition is shaped also by another stressor (let's say increased salinity), which would be the correct comparison for that?

Would it be

results(dds2, contrast = c("group", "Sal_Sed", "Sed"))

and is this equivalent to

results(dds2, contrast = list(c("group_Sal_Sed_vs_control"), c("group_Sed_vs_control")))

?

Or am I completely wrong and I would have to combine the single effects for sediment and salinity (i.e. weighing the coefficients) and compare that to the interaction term?

I also tried the 'normal' interaction term design

 dds1 <- DESeqDataSetFromMatrix(countData=round(rawCounts), colData=sampleData, design= ~ Salinity + Sediment + Flow.Velocity + Salinity:Sediment + Salinity:Flow.Velocity + Sediment:Flow.Velocity + Salinity:Sediment:Flow.Velocity)

with 3 factors (each has 2 levels) instead of one grouping factor.

> sampleData
    Salinity Sediment Flow.Velocity
5    ambient  natural        normal
14   ambient  natural        normal
28   ambient  natural        normal
31   ambient  natural        normal
40   ambient  natural       reduced
41   ambient  natural       reduced
55   ambient  natural       reduced
57   ambient  natural       reduced
 ...

And I extracted the single effects e.g. like this:

sed <- results(DE_analysis, contrast=c("Sediment", "added", "natural"))

I expected, that at least the single stressor effects would be the same, but this is not the case. Did I misunderstood that the factor grouping should have resulted in the very same results or is it just wrong what I did?

Any help is highly appreciated! Thanks in advance

sessionInfo( )

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.32.0               SummarizedExperiment_1.22.0 Biobase_2.52.0             
 [4] MatrixGenerics_1.4.3        matrixStats_0.61.0          GenomicRanges_1.44.0       
 [7] GenomeInfoDb_1.28.4         IRanges_2.26.0              S4Vectors_0.30.2           
[10] BiocGenerics_0.38.0         ggplot2_3.3.5              

loaded via a namespace (and not attached):
 [1] KEGGREST_1.32.0        genefilter_1.74.1      locfit_1.5-9.4         splines_4.1.1         
 [5] lattice_0.20-44        colorspace_2.0-2       vctrs_0.3.8            utf8_1.2.2            
 [9] blob_1.2.2             XML_3.99-0.8           survival_3.2-11        rlang_0.4.11          
[13] pillar_1.6.4           withr_2.4.2            glue_1.4.2             DBI_1.1.1             
[17] BiocParallel_1.26.2    bit64_4.0.5            RColorBrewer_1.1-2     GenomeInfoDbData_1.2.6
[21] lifecycle_1.0.1        zlibbioc_1.38.0        Biostrings_2.60.2      munsell_0.5.0         
[25] gtable_0.3.0           memoise_2.0.0          geneplotter_1.70.0     fastmap_1.1.0         
[29] fansi_0.5.0            AnnotationDbi_1.54.1   Rcpp_1.0.7             xtable_1.8-4          
[33] scales_1.1.1           cachem_1.0.6           DelayedArray_0.18.0    annotate_1.70.0       
[37] XVector_0.32.0         bit_4.0.4              png_0.1-7              grid_4.1.1            
[41] tools_4.1.1            bitops_1.0-7           magrittr_2.0.1         RCurl_1.98-1.5        
[45] RSQLite_2.2.8          tibble_3.1.5           pkgconfig_2.0.3        crayon_1.4.1          
[49] ellipsis_0.3.2         Matrix_1.3-4           httr_1.4.2             R6_2.5.1              
[53] compiler_4.1.1
Interactions design Multifactorial DESeq2 • 1.2k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 hour ago
United States

For a consultation on the right statistical analysis and design I recommend collaborating with a local statistician or someone familiar with linear models in R. I have to limit my time on the support site for software related questions.

ADD COMMENT
0
Entering edit mode
swbarnes2 ★ 1.4k
@swbarnes2-14086
Last seen 20 hours ago
San Diego

If you want to compare two subset of samples to each other, just contrast them to each other. Adding in the variance of a third control type of sample is not going to help.

Did you do a PCA to see if all three variables are all strongly affecting your results? It might be that one of them is slight enough that it won't have much interaction with anything.

Note that doing contrasts with interactions terms will actually change the samples included in your contrasts. Doing contrast("Genotype", "I", "II") means something very different if the design is ~Genotype + Treatment versus ~ Genotype + Treatment + Genotype:Treatment.

ADD COMMENT

Login before adding your answer.

Traffic: 864 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6