Question

DESeq2 Formula Design with two variables and individual normalization

0

Entering edit mode

polyptwo • 0

@polyptwo-19338

Last seen 5.3 years ago

Formula design

We are struggling in understanding the correct design formula for my rnaseq analysis. We have 16 samples, 4 biological replicates with 2 different variables: if they are treated with a drug or not, and if they are stressed or not.

We want to know: a) Which is the effect of the treatment under basal conditions. b) Which is the effect of the treatment under stressed conditions.

During the exploratory analysis, we realized that the individual differences between biological replicates were bigger than than the ones due to treatment, or infection (i.e. the samples cluster together by biological replicate and not by treatment). We are not interested in quantifying these differences, but on normalizing for them (kinda like a paired analysis?). My question is the following:

Given that the 'genotype' (or 'family') effect will be inherent in all the samples (see colData below), should I take it into account when doing the design matrix?

i.e., my colData would look like this:

   genotype treatment    stress
1         A untreated  stressed
2         A   treated  stressed
3         A untreated        No
4         A   treated        No
5         B untreated  stressed
6         B   treated  stressed
7         B untreated        No
8         B   treated        No
9         C untreated  stressed
10        C   treated  stressed
11        C untreated        No
12        C   treated        No
13        D untreated  stressed
14        D   treated  stressed
15        D untreated        No
16        D   treated        No

If I'm understanding previous answers right, I should normalize by 'genotype', and the technical process would be similar of dealing with having a 'batch' effect, being every genotype a 'batch'. In this case, should our design look like this?

design = ~ treatment + treatment:genotype + treatment:stress

(This would be similar to this case: https://support.bioconductor.org/p/92108/)

Or should I collapse 'treatment' and 'stress' into a grouped 'condition' variable as suggested here (https://support.bioconductor.org/p/67600/#67612), and input the design:

design = ~ genotype + condition

I would then take out the results that we are interested with

## Which is the effect of the treatment under basal conditions.
results(dds, contrast=c("condition","treatedNo", "untreatedNo")) 

## Which is the effect of the treatment under stressed conditions.
results(dds, contrast=c("condition","treatedstressed", "untreatedstressed"))

But I am not able to see how I would see the "extra" effect of the treatment under stressed conditions 'relative' to being stressed in this case.

Thank you so much for your help

deseq2 • 5.6k views

ADD COMMENT • link updated 6.1 years ago by Michael Love 43k • written 6.1 years ago by polyptwo • 0

score 2 · Accepted Answer · 2019-01-11

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

If you additionally want to see the "extra" effect, that is often called an interaction. You can use the design ~genotype + treatment + treatment:stress, where you will have two stress effects which you can pull out individually with name or contrast with contrast=list(..., ...).

ADD COMMENT • link 6.1 years ago Michael Love 43k

0

Entering edit mode

Thanks for your response Michael.

However, the design ~genotype + stress:treatment is not possible as the model matrix is not full rank . I guess that I am not fully understanding the linearity here because with my colData (in the original post) I cannot relate to any of the examples on the DESeq2 vignette. There is one copy of each 'genotype' in each group, and they are not nested within groups.

The designs ~genotype + stress + treatment + stress:treatment, or ~genotype + treatment + stress:treatment work, though. Maybe one of those was the one that you meant?

Thank you so much for your help, and have a nice weekend,

PS: I am running

ddsTxi <- DESeqDataSetFromTximport(txi,
                                    colData = coldata,
                                    design = ~ genotype + stress:treatment)

and my sessionInfo() is:

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tximport_1.10.1             ggplot2_3.1.0               gplots_3.0.1               
 [4] RColorBrewer_1.1-2          DESeq2_1.22.2               EDASeq_2.16.2              
 [7] ShortRead_1.40.0            GenomicAlignments_1.18.1    SummarizedExperiment_1.12.0
[10] DelayedArray_0.8.0          matrixStats_0.54.0          Rsamtools_1.34.0           
[13] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1         Biostrings_2.50.2          
[16] XVector_0.22.0              BiocParallel_1.16.5         geneplotter_1.60.0         
[19] annotate_1.60.0             XML_3.98-1.16               AnnotationDbi_1.44.0       
[22] IRanges_2.16.0              S4Vectors_0.20.1            lattice_0.20-38            
[25] Biobase_2.42.0              BiocGenerics_0.28.0         biomaRt_2.38.0             

loaded via a namespace (and not attached):
 [1] bitops_1.0-6           bit64_0.9-7            progress_1.2.0        
 [4] httr_1.4.0             tools_3.5.2            backports_1.1.3       
 [7] R6_2.3.0               KernSmooth_2.23-15     rpart_4.1-13          
[10] Hmisc_4.1-1            DBI_1.0.0              lazyeval_0.2.1        
[13] colorspace_1.3-2       nnet_7.3-12            withr_2.1.2           
[16] tidyselect_0.2.5       gridExtra_2.3          prettyunits_1.0.2     
[19] bit_1.1-14             compiler_3.5.2         htmlTable_1.13.1      
[22] labeling_0.3           rtracklayer_1.42.1     caTools_1.17.1.1      
[25] scales_1.0.0           checkmate_1.9.0        readr_1.3.1           
[28] genefilter_1.64.0      DESeq_1.34.1           stringr_1.3.1         
[31] digest_0.6.18          foreign_0.8-71         R.utils_2.7.0         
[34] base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6       
[37] htmlwidgets_1.3        rlang_0.3.1            rstudioapi_0.9.0      
[40] RSQLite_2.1.1          bindr_0.1.1            jsonlite_1.6          
[43] hwriter_1.3.2          gtools_3.8.1           acepack_1.4.1         
[46] dplyr_0.7.8            R.oo_1.22.0            RCurl_1.95-4.11       
[49] magrittr_1.5           GenomeInfoDbData_1.2.0 Formula_1.2-3         
[52] Matrix_1.2-15          Rcpp_1.0.0             munsell_0.5.0         
[55] R.methodsS3_1.7.1      stringi_1.2.4          yaml_2.2.0            
[58] zlibbioc_1.28.0        plyr_1.8.4             grid_3.5.2            
[61] blob_1.1.1             gdata_2.18.0           crayon_1.3.4          
[64] splines_3.5.2          GenomicFeatures_1.34.1 hms_0.4.2             
[67] locfit_1.5-9.1         knitr_1.21             pillar_1.3.1          
[70] glue_1.3.0             latticeExtra_0.6-28    data.table_1.11.8     
[73] BiocManager_1.30.4     gtable_0.2.0           purrr_0.2.5           
[76] assertthat_0.2.0       xfun_0.4               aroma.light_3.12.0    
[79] xtable_1.8-3           survival_2.43-3        tibble_2.0.0          
[82] memoise_1.1.0          bindrcpp_0.2.2         cluster_2.0.7-1

ADD REPLY • link 6.1 years ago polyptwo • 0

1

Entering edit mode

Apologies, I had a typo above and left out the base level.

ADD REPLY • link 6.1 years ago Michael Love 43k

0

Entering edit mode

Thanks for your help!

ADD REPLY • link 6.1 years ago polyptwo • 0