Dear DESeq2 developer,
Thank your very much for this great tool! I have a design that is similar to the one mentioned in deseq2 user guide "Group-specific condition effects, individuals nested within groups" http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups.
The colData is shown below. Basically, this randomized clinical trial has three treatment groups, patients are nested within each group, while each patient has two measurements: pre and post treatment (time).
treatment patient.n time
1 Placebo 1 pre
2 Placebo 1 post
3 Placebo 2 pre
4 Placebo 2 post
5 Placebo 3 pre
6 Placebo 3 post
7 LD_Drug 1 pre
8 LD_Drug 1 post
9 LD_Drug 2 pre
10 LD_Drug 2 post
11 LD_Drug 3 pre
12 LD_Drug 3 post
13 HD_Drug 1 pre
14 HD_Drug 1 post
15 HD_Drug 2 pre
16 HD_Drug 2 post
17 HD_Drug 3 pre
18 HD_Drug 3 post
19 HD_Drug 4 pre
20 HD_Drug 4 post
My DESeq2 workflow is as following:
dds = DESeqDataSetFromMatrix(countData=count, colData=colinfo, design=~patient.n+time)
dds <- dds[rowSums(counts(dds)) >= 10,]
mm = model.matrix(~treatment + treatment:patient.n + treatment:time, colinfo)
mm <- mm[,colSums(mm) > 0]
dds = DESeq(dds, full = mm, betaPrior = F)
> resultsNames(dds)
"Intercept" "treatmentLD_Drug" "treatmentHD_Drug" "treatmentPlacebo.patient.n2"
"treatmentLD_Drug.patient.n2" "treatmentHD_Drug.patient.n2" "treatmentPlacebo.patient.n3" "treatmentLD_Drug.patient.n3" "treatmentHD_Drug.patient.n3" "treatmentHD_Drug.patient.n4" "treatmentPlacebo.timepost" "treatmentLD_Drug.timepost" "treatmentHD_Drug.timepost"
Regarding this design, the comparisons that I am interested are:
1. time effect (post vs pre) within each treatment group
I think results(dds, name="treatmentPlacebo.timepost"), results(dds, name="treatmentLD_Drug.timepost") and results(dds, name="treatmentHD_Drug.timepost") will extract time effect results for Placebo, LD_Drug and HD_Drug respectively. If this way is correct, when I compare to a simple design(~patient.n+time) for each group separately, the results are different, I have more significant genes in the simple design. Can you explain why and which way do you think is better?
2. more importantly, I am interested in the different post effect between treatment groups while adjusting for baseline (pre) level, which is similar to either of the ANCOVA models: post ~ treatment + pre, post-pre ~ treatment + pre, (post-pre)/pre ~ treatment + pre. I am wondering how to set up contrast to extract such information for each two treatment groups?
3. this model contains a interaction term "treatment:time", which also allows us to examine the genes that have different post vs pre time effect between treatment groups. Using results(dds, contrast=list("treatmentLD_Drug.timepost","treatmentPlacebo.timepost")), results(dds, contrast=list("treatmentHD_Drug.timepost","treatmentPlacebo.timepost")), results(dds, contrast=list("treatmentHD_Drug.timepost","LD_Drug.timepost")) will extract these genes for each two groups. Together with question #1 (not sure for question #2), all these terms can be extracted from the above model specified, do you think whether it is better to analyze all data using one model or analyze them separately like in question #1?
4. regarding main effect for treatment, with all the interaction terms, how to interpret for example "treatmentLD_Drug"? When should we add "time" main effect?
Sorry for all these questions, this model matrix is too complex and I want to make sure that I am doing the right thing.
Thank you very much for your help and I am looking forward to your response,
Wendy
Dear Dr. Love,
Thank you very much for your quick response. Now I know what to do regarding question #1, 3 , 4 with the information and suggestion you provided.
For question #2, I have tried the LRT you suggested, my workflow is as below:
mm_full = model.matrix(~treatment + treatment:patient.n + treatment:time, colinfo)
mm_full <- mm_full[,colSums(mm_full) > 0]
dds = DESeqDataSetFromMatrix(countData=count, colData=colinfo, design=mm_full)
dds <- dds[rowSums(counts(dds)) >= 10,]
mm_reduced <- model.matrix(~treatment + treatment:patient.n + time, colinfo)
mm_reduced <- mm_reduced[,colSums(mm_reduced) > 0]
dds <- DESeq(dds, test="LRT", reduced=mm_reduced)
> resultsNames(dds)
"Intercept" "treatmentLD_Drug" "treatmentHD_Drug" "treatmentPlacebo.patient.n2" "treatmentLD_Drug.patient.n2" "treatmentHD_Drug.patient.n2" "treatmentPlacebo.patient.n3" "treatmentLD_Drug.patient.n3" "treatmentHD_Drug.patient.n3" "treatmentHD_Drug.patient.n4" "treatmentPlacebo.timepost" "treatmentLD_Drug.timepost"
"treatmentHD_Drug.timepost"
The result columns here are the same as the full model output listed in the original question. Like you said, when controlling for the patient effects, we are not supposed to interpret the main effects. So I have following questions:
1. how to interpret for example "treatmentLD_Drug.timepost" here?
2. as my interest is to extract pairwise post treatment effect while controlling for baseline level (pre), then how do I set up the contrast to extract such information based on this LRT output?
Thank you very much as always, I am looking forward to your response,
Wendy
That coefficient has the same interpretation as before, "treatmentLD_Drug.timepost" is the post vs pre effect for the LD treated patients.
If you want to do pairwise comparisons, we only have Wald tests for this. The LRT is a different type of test. See the LRT section in the vignette for more details.
Hi Dr. Love,
Thank you very much! I guess LRT couldn't answer my question. Sorry to bother you again. But considering a simpler pre-post treatment design, and I want to implement a ANCOVA model like "post ~ treatment + pre" into DESeq2 work frame, what is the best way to do it ? Because I know DESeq2 doesn't allow different covariate values for each gene which would be "pre" in this case.
treatment patient.n time
1 Placebo 1 pre
2 Placebo 1 post
3 Placebo 2 pre
4 Placebo 2 post
5 Placebo 3 pre
6 Placebo 3 post
7 LD_Drug 1 pre
8 LD_Drug 1 post
9 LD_Drug 2 pre
10 LD_Drug 2 post
11 LD_Drug 3 pre
12 LD_Drug 3 post
Thank you very much!
I don't follow your question, and I think we've gotten past the point where there is a confusion about how to use the software, and now I think we are more discussing whether to try various statistical approaches. Unfortunately I have limited time to answer questions on the support site, and have to prioritize software related questions, so I'd recommend you try to work with a statistician who can guide you to what kind of statistical models to use. Note that DESeq2 uses the same coefficients as base R, as we leverage base R's model.matrix and design formula. So any statistician familiar with R and linear models may be useful to consult with.
Thanks, Dr. Love. I may not be clear, I will consult a statistician with alternative statistical approach.
Regarding the "pairwise comparisons with Wald tests " you mentioned earlier, do I have to perform a separate model or I can extract results using contrast from full model? If latter, can you please specify? Thanks!
> resultsNames(dds)
"Intercept" "treatmentLD_Drug" "treatmentHD_Drug" "treatmentPlacebo.patient.n2" "treatmentLD_Drug.patient.n2" "treatmentHD_Drug.patient.n2" "treatmentPlacebo.patient.n3" "treatmentLD_Drug.patient.n3" "treatmentHD_Drug.patient.n3" "treatmentHD_Drug.patient.n4" "treatmentPlacebo.timepost" "treatmentLD_Drug.timepost" "treatmentHD_Drug.timepost"
How to do this: see (3) in your original post.
Thanks, Dr. Love. I think 3) is pairwise compare the time effect, but what I want is pairwise compare post treatment effect (with / without adjust baseline level).
Sorry, I understand you are very busy and I may be confused at some point. I really appreciate your time and help!
"3) Yes, you can pairwise compare the time effect like that. Unlike in (1), you do not have a choice, here you have to put all the samples together to perform these comparisons. Another reason to use this approach."
You can not compare the post treatment level alone with this design, where you control for patient baselines, only the post vs pre effect.
Understanding why is a bit complex and you need to have more background perhaps in linear modeling. I strongly recommend consulting with a statistician if these coefficients and designs are confusing or not clear.
Thank you very much for all your answers, Dr. Love. They are really helpful!