Ribo-seq/RNA-seq analysis -- DESeq2 multi-factor design
1
1
Entering edit mode
kotcha.m ▴ 10
@kotcham-16778
Last seen 3.8 years ago

Dear Bioconductor support community,

I have ribosome profiling/RNA-seq data that I want to analyze using DESeq2, but I am having some trouble with the design and how to compare selected samples. I'd really appreciate your help!

I have 4 different conditions: 2 strains grown at 2 temperatures, each condition has ribo-seq and RNA-seq data, and two replicates for each. From what I've read so far on this site, it seems like it's more sensible for me to keep the strain and temperature combined as one condition, instead of having them in separate columns? Here is what I have:

> sample_type
         condition type
MA25_RPF       M25  RPF
MA37_RPF       M37  RPF
MB25_RPF       M25  RPF
MB37_RPF       M37  RPF
WA25_RPF       W25  RPF
WA37_RPF       W37  RPF
WB25_RPF       W25  RPF
WB37_RPF       W37  RPF
MA25_RNA       M25  RNA
MA37_RNA       M37  RNA
MB25_RNA       M25  RNA
MB37_RNA       M37  RNA
WA25_RNA       W25  RNA
WA37_RNA       W37  RNA
WB25_RNA       W25  RNA
WB37_RNA       W37  RNA
> all.ds <- DESeqDataSetFromMatrix(countData = count_table, 
                                   colData = sample_type, 
                                   design = ~type + condition + type:condition)​
> all.DGE <- DESeq(all.ds)​

I'd like to compare Mutant vs WT for each temperature, and also 25C vs 37C for each strain, after normalization of ribo- to RNA-seq. I set "W25" as the reference. Is this design ok for what I want to do?

If this design makes sense, then I have a follow-up question. From what I understand of the results based on the resultsNames (please correct me if I'm wrong), "typeRPF.conditionM25" would mean comparing RPF/RNA of M25 to RPF/RNA of W25. But for "condition_M25_vs_W25", it does not include the interaction term and is only comparing the RPF. If I want to compare the RPF/RNA of one sample to another sample that is not the reference, I would need to use list() in contrast, is that right? I'm not sure which of these terms I should use to go into list() for, say, if I want to compare RPF/RNA of M37 to RPF/RNA of W37?

> resultsNames(all.DGE)
[1] "Intercept"            "type_RPF_vs_RNA"      "condition_W37_vs_W25" "condition_M25_vs_W25" "condition_M37_vs_W25"
[6] "typeRPF.conditionW37" "typeRPF.conditionM25" "typeRPF.conditionM37"

Thank you in advance for your help!

Kotcha

Session info:

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.20.0               SummarizedExperiment_1.10.1 DelayedArray_0.6.1          BiocParallel_1.14.1        
 [5] matrixStats_0.53.1          Biobase_2.40.0              GenomicRanges_1.32.3        GenomeInfoDb_1.16.0        
 [9] IRanges_2.14.10             S4Vectors_0.18.3            BiocGenerics_0.26.0        

loaded via a namespace (and not attached):
 [1] genefilter_1.62.0      locfit_1.5-9.1         splines_3.5.0          lattice_0.20-35        colorspace_1.3-2      
 [6] htmltools_0.3.6        base64enc_0.1-3        blob_1.1.1             survival_2.42-4        XML_3.98-1.11         
[11] rlang_0.2.1            pillar_1.2.3           DBI_1.0.0              foreign_0.8-70         bit64_0.9-7           
[16] RColorBrewer_1.1-2     GenomeInfoDbData_1.1.0 plyr_1.8.4             stringr_1.3.1          zlibbioc_1.26.0       
[21] munsell_0.5.0          gtable_0.2.0           htmlwidgets_1.2        memoise_1.1.0          latticeExtra_0.6-28   
[26] knitr_1.20             geneplotter_1.58.0     AnnotationDbi_1.42.1   htmlTable_1.12         Rcpp_0.12.17          
[31] acepack_1.4.1          xtable_1.8-2           scales_0.5.0           backports_1.1.2        checkmate_1.8.5       
[36] Hmisc_4.1-1            annotate_1.58.0        XVector_0.20.0         bit_1.1-14             gridExtra_2.3         
[41] ggplot2_2.2.1          digest_0.6.15          stringi_1.2.3          grid_3.5.0             tools_3.5.0           
[46] bitops_1.0-6           magrittr_1.5           RSQLite_2.1.1          lazyeval_0.2.1         RCurl_1.95-4.10       
[51] tibble_1.4.2           Formula_1.2-3          cluster_2.0.7-1        Matrix_1.2-14          data.table_1.11.4     
[56] rstudioapi_0.7         rpart_4.1-13           nnet_7.3-12            compiler_3.5.0 
deseq2 sequencing ribosome profiling riboseq multiple factor design • 2.4k views
ADD COMMENT
0
Entering edit mode

Update:

I just found another post that addressed almost the same questions: DESeq2 for Ribosomal profiling analysis [two-factor designs]

It seems that I got the design backwards? So, I changed it to design = ~condition + condition:type. (Side question, is ~condition + condition:type the same as ~condition + type + condition:type?)

And the results become:

> resultsNames(all.DGE)
[1] "Intercept"            "condition_M25_vs_W25" "condition_M37_vs_W25" "condition_W37_vs_W25" "conditionW25.typeRPF"
[6] "conditionM25.typeRPF" "conditionM37.typeRPF" "conditionW37.typeRPF"

At this point, can I directly use these terms in contrast to compare them? For example, to compare M37 to W37:

> results(all.DGE, contrast = list("conditionM37.typeRPF", "conditionW37.typeRPF"))

Thank you!

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 16 hours ago
United States

Looks like you found the answer. 

This is a common question (but I don't know if its in the documentation actually):

Is ~condition + condition:type the same as ~condition + type + condition:type

Yes these are equivalent in that they have the same number of coefficients and the coefficients from one can be calculated from the other. In the first design, you have baseline for the reference condition and the difference between the conditions, and then a different type effect for each condition. The second design, you have baseline for the reference condition and the difference between the conditions, a type effect for the reference level of condition, and the difference in the type effect between the conditions. Loosely, with respect to the type effect, the first design gives you A and B, while the second design gives you A and (B - A). The first design is actually easier for most users for computing results tables with DESeq2, but the second is commonly seen in interaction models in R.

ADD COMMENT
0
Entering edit mode

Hi Michael,

Thank you for your explanation. Let me see if I get this correctly. 

design_1 = ~condition + condition:type
> resultsNames(all.DGE_1)
[1] "Intercept"            "condition_M25_vs_W25" "condition_M37_vs_W25" "condition_W37_vs_W25" "conditionW25.typeRPF"
[6] "conditionM25.typeRPF" "conditionM37.typeRPF" "conditionW37.typeRPF"

design_2 = ~condition + type + condition:type
> resultsNames(all.DGE_2)
[1] "Intercept"            "condition_M25_vs_W25" "condition_M37_vs_W25" "condition_W37_vs_W25" "type_RPF_vs_RNA"    
[6] "conditionM25.typeRPF" "conditionM37.typeRPF" "conditionW37.typeRPF"​

In design_1, the term condition(sample).typeRPF is only RPF/RNA of that sample. But in design_2, it means RPF/RNA of that sample compared against RPF/RNA of W25, which is set as reference. So, results(all.DGE_1, contrast = list("conditionW37.typeRPF", "conditionW25.typeRPF") using design_1 is the same as results(all.DGE_2, name = "conditionW37.typeRPF") using design_2? And if I do contrast for other samples using design_2, the denominator RPF/RNA of W25 would cancel out, so it's essentially the same as design_1 in that case?

ADD REPLY
0
Entering edit mode

Is there a copy paste mistake? There should be "conditionW25.typeRPF" in the first resultsNames.

ADD REPLY
0
Entering edit mode

For design_1? It is there in the resultsNames. But in the text when I wrote out the list(), I made a mistake and dropped "type". Sorry about that! I'll edit it.

ADD REPLY
0
Entering edit mode

Sorry, my confusion, I didn't see the horizontal scroll bar.

Yes to all your questions, you've got it.

ADD REPLY
0
Entering edit mode

Alright. Thank you so much for your help!

ADD REPLY
0
Entering edit mode

Hi Michael,

I have another question. I want to compare the expression of M37 to M25, but both samples are first normalized to their WT counterparts. In other words, it'd be (M37 RPF/RNA)/(W37 RPF/RNA) / (M25 RPF/RNA)/(W25 RPF/RNA). Can I directly do this from design_1 like this?:

results(all.DGE_1, contrast = list(c("conditionM37.typeRPF", "conditionW37.typeRPF"), 
                                   c("conditionM25.typeRPF", "conditionW25.typeRPF")))

It didn't give me an error, but because contrast = list(a, b) and contrast = list(c(a, b)) usually mean different things, I feel like using c() to group them is not right and I might have to redo the design?

Thank you for your patience!

Kotcha

ADD REPLY
0
Entering edit mode

You can do this, but you need to re-arrange the terms so it's a simple numerator and denominator:

(M37/W37) / (M25/W25) = (M37 * W25) / (W37 * M25)

So the contrast should be:

list(c("conditionM37.typeRPF", "conditionW25.typeRPF"), # numerator
     c("conditionW37.typeRPF", "conditionM25.typeRPF")) # denominator

When using results() with a list-style contrast, the first element of the list is the numerator of the contrast, the second element is the denominator.

ADD REPLY
0
Entering edit mode

I see. Thank you so much!

ADD REPLY

Login before adding your answer.

Traffic: 869 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6