Hello,
I was hoping some of you could help me with the following question about the design formula as input for DEseq2. I am analysing a knock-down experiment with of 3 biological replicates. For each experiment, samples have been taken after selection at 5 different timepoints of cells that were transduced with either control shRNA, or shRNA 1147 or shRNA 1150. The important point here is that the first sample is taken after about 5 days after the cells first started expressing the shRNA: at timepoint 24hrs, there should therefore already be an interesting difference between the samples transduced with control versus the samples transduced with the SH1147 or SH1150.
In addition, for each biological replicate of the experiment, the samples are linked: RNA is taken from the same cells 24hrs, 48hrs, up to 120 hrs. To account for the repeated measures, I have information about which run of the experiment it was (the rep column below). rep1 at 24hrs with SH1147 is derived the same cells as rep1 at 48hrs with SH1147 (but 24hrs later).However, rep1 of SH1150 are obviously different cells. So the rep-number is only within the treatment group.
I am struggling how I should incorporate these features in my design. I have been reading the Bioconductor support for RNAseq and the example analysis and some other troubleshooting I found here but I did not get a full understanding of how to incorporate those two features of my dataset into my design. I am finding that whatever I am doing at the moment, the analysis picks a baseline for each variable I include in the formula. Whereas in my experiment rep1 is not a baseline for rep 2 and 3; and 24 hrs is not a baseline for 48 or 72 hrs.
The design is thus as follows (I've edited the original because the filenames are super long):
sample_name condition timepoint rep
A SH1147 24 rep1
B SH1147 24 rep2
C SH1147 24 rep3
D SH1147 48 rep1
E SH1147 48 rep2
F SH1147 48 rep3
H SH1150 24 rep1
I SH1150 24 rep2
J SH1150 24 rep3
K SH1150 48 rep1
L SH1150 48 rep2
M SH1150 48 rep3
N SHcontrol 24 rep1
O SHcontrol 24 rep2
P SHcontrol 24 rep3
Q SHcontrol 48 rep1
R SHcontrol 48 rep2
S SHcontrol 48 rep3
So, what I would like to be able to compare in the end is the following:
- Compare, for each timepoint, SH1147 to SHcontrol and SH1150 to SHcontrol
- I would like to compare the entire expression profile across the timepoints for SH1147 versus control and SH1150 versus control.
- I would also like to investigate the differences between the two SHrnas versus the control (they target the same gene).
Based on what I have found on a range of help pages and the DEseq2 package help I have given the following design formulas a try but neither of them are giving me the comparisons I need and all of them use rep1 and timepoint24 as a baseline. So I am really missing something here. I am also not sure which formula correctly captures the linkage between my samples within my groups: because this is within groups, I am hesitant to use the first formulas where rep is a seperate variable in the design formula.
According to what I have read the minimum is for the design without the linkage is this
design = ~ timepoint + condition + timepoint:condition
reduced = ~ timepoint
Based on this example in the DEseq2 support pages I think the initial design without the timepoints is this (just looking at the linkage)
design = ~ condition + condition:rep + condition:timepoint
I have tried combining these things in various ways to get a formula that will give me the comparisons I am looking for but I am just not getting it right. Here is an overview of a number of designs I have tried (#4 would be the most direct combination of the two I imagine but also does not work):
#1------------------------------------------------------------------------------------------------------------
dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ timepoint + condition + rep + timepoint:condition + rep:condition + rep:timepoint)
dds <- DESeq(dds, test = "LRT", reduced = ~ rep + rep:timepoint)
> resultsNames(dds)
[1] "Intercept" "timepoint_48_vs_24" "timepoint_72_vs_24"
[4] "timepoint_96_vs_24" "timepoint_120_vs_24" "condition_SH1147_vs_SHcontrol378"
[7] "condition_SH1150_vs_SHcontrol378" "rep_2_vs_1" "rep_3_vs_1"
[10] "timepoint48.conditionSH1147" "timepoint72.conditionSH1147" "timepoint96.conditionSH1147"
[13] "timepoint120.conditionSH1147" "timepoint48.conditionSH1150" "timepoint72.conditionSH1150"
[16] "timepoint96.conditionSH1150" "timepoint120.conditionSH1150" "conditionSH1147.rep2"
[19] "conditionSH1150.rep2" "conditionSH1147.rep3" "conditionSH1150.rep3"
[22] "timepoint48.rep2" "timepoint72.rep2" "timepoint96.rep2"
[25] "timepoint120.rep2" "timepoint48.rep3" "timepoint72.rep3"
[28] "timepoint96.rep3" "timepoint120.rep3"
#2------------------------------------------------------------------------------------------------------------
dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ timepoint + condition + rep + timepoint:condition + rep:condition + rep:timepoint)
dds <- DESeq(dds, test = "LRT", reduced = ~ rep + timepoint + rep:timepoint)
> resultsNames(dds)
[1] "Intercept" "timepoint_48_vs_24" "timepoint_72_vs_24"
[4] "timepoint_96_vs_24" "timepoint_120_vs_24" "condition_SH1147_vs_SHcontrol378"
[7] "condition_SH1150_vs_SHcontrol378" "rep_2_vs_1" "rep_3_vs_1"
[10] "timepoint48.conditionSH1147" "timepoint72.conditionSH1147" "timepoint96.conditionSH1147"
[13] "timepoint120.conditionSH1147" "timepoint48.conditionSH1150" "timepoint72.conditionSH1150"
[16] "timepoint96.conditionSH1150" "timepoint120.conditionSH1150" "conditionSH1147.rep2"
[19] "conditionSH1150.rep2" "conditionSH1147.rep3" "conditionSH1150.rep3"
[22] "timepoint48.rep2" "timepoint72.rep2" "timepoint96.rep2"
[25] "timepoint120.rep2" "timepoint48.rep3" "timepoint72.rep3"
[28] "timepoint96.rep3" "timepoint120.rep3"
#3------------------------------------------------------------------------------------------------------------
dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ condition + condition:rep + condition:timepoint)
dds <- DESeq(dds, test = "LRT", reduced = ~ timepoint)
> resultsNames(dds)
[1] "Intercept" "condition_SH1147_vs_SHcontrol378" "condition_SH1150_vs_SHcontrol378"
[4] "conditionSHcontrol378.rep2" "conditionSH1147.rep2" "conditionSH1150.rep2"
[7] "conditionSHcontrol378.rep3" "conditionSH1147.rep3" "conditionSH1150.rep3"
[10] "conditionSHcontrol378.timepoint48" "conditionSH1147.timepoint48" "conditionSH1150.timepoint48"
[13] "conditionSHcontrol378.timepoint72" "conditionSH1147.timepoint72" "conditionSH1150.timepoint72"
[16] "conditionSHcontrol378.timepoint96" "conditionSH1147.timepoint96" "conditionSH1150.timepoint96"
[19] "conditionSHcontrol378.timepoint120" "conditionSH1147.timepoint120" "conditionSH1150.timepoint120"
#4------------------------------------------------------------------------------------------------------------
dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ condition + timepoint + condition:rep + condition:timepoint)
dds <- DESeq(dds, test = "LRT", reduced = ~ timepoint)
> resultsNames(dds)
[1] "Intercept" "condition_SH1147_vs_SHcontrol378" "condition_SH1150_vs_SHcontrol378"
[4] "timepoint_48_vs_24" "timepoint_72_vs_24" "timepoint_96_vs_24"
[7] "timepoint_120_vs_24" "conditionSHcontrol378.rep2" "conditionSH1147.rep2"
[10] "conditionSH1150.rep2" "conditionSHcontrol378.rep3" "conditionSH1147.rep3"
[13] "conditionSH1150.rep3" "conditionSH1147.timepoint48" "conditionSH1150.timepoint48"
[16] "conditionSH1147.timepoint72" "conditionSH1150.timepoint72" "conditionSH1147.timepoint96"
[19] "conditionSH1150.timepoint96" "conditionSH1147.timepoint120" "conditionSH1150.timepoint120"
#5------------------------------------------------------------------------------------------------------------
dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ condition + timepoint + condition:rep + condition:timepoint)
dds <- DESeq(dds, test = "LRT", reduced = ~ condition)
> resultsNames(dds)
[1] "Intercept" "condition_SH1147_vs_SHcontrol378" "condition_SH1150_vs_SHcontrol378"
[4] "timepoint_48_vs_24" "timepoint_72_vs_24" "timepoint_96_vs_24"
[7] "timepoint_120_vs_24" "conditionSHcontrol378.rep2" "conditionSH1147.rep2"
[10] "conditionSH1150.rep2" "conditionSHcontrol378.rep3" "conditionSH1147.rep3"
[13] "conditionSH1150.rep3" "conditionSH1147.timepoint48" "conditionSH1150.timepoint48"
[16] "conditionSH1147.timepoint72" "conditionSH1150.timepoint72" "conditionSH1147.timepoint96"
[19] "conditionSH1150.timepoint96" "conditionSH1147.timepoint120" "conditionSH1150.timepoint120"
I have also considered analysing the data per condition or per timepoint but I feel like that is also not the way and also prevents me from doing everything I want to do as I described above.
It would be really great if you could help me with this.
Thanks!
Dorien
Thanks a lot for your reply and suggestion! I'll have a look at that!! Does explain why I couldn't make it work.