Question

questions regarding design that is similar to "Group-specific condition effects, individuals nested within groups"

0

Entering edit mode

wwu222 • 0

@wwu222-19179

Last seen 5.8 years ago

Dear DESeq2 developer,

Thank your very much for this great tool! I have a design that is similar to the one mentioned in deseq2 user guide "Group-specific condition effects, individuals nested within groups" http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups.

The colData is shown below. Basically, this randomized clinical trial has three treatment groups, patients are nested within each group, while each patient has two measurements: pre and post treatment (time).

treatment patient.n time
1 Placebo 1 pre
2 Placebo 1 post
3 Placebo 2 pre
4 Placebo 2 post
5 Placebo 3 pre
6 Placebo 3 post
7 LD_Drug 1 pre
8 LD_Drug 1 post
9 LD_Drug 2 pre
10 LD_Drug 2 post
11 LD_Drug 3 pre
12 LD_Drug 3 post
13 HD_Drug 1 pre
14 HD_Drug 1 post
15 HD_Drug 2 pre
16 HD_Drug 2 post
17 HD_Drug 3 pre
18 HD_Drug 3 post
19 HD_Drug 4 pre
20 HD_Drug 4 post

My DESeq2 workflow is as following:

dds = DESeqDataSetFromMatrix(countData=count, colData=colinfo, design=~patient.n+time)
dds <- dds[rowSums(counts(dds)) >= 10,]
mm = model.matrix(~treatment + treatment:patient.n + treatment:time, colinfo)
mm <- mm[,colSums(mm) > 0]
dds = DESeq(dds, full = mm, betaPrior = F)

> resultsNames(dds)
"Intercept" "treatmentLD_Drug" "treatmentHD_Drug" "treatmentPlacebo.patient.n2"
"treatmentLD_Drug.patient.n2" "treatmentHD_Drug.patient.n2" "treatmentPlacebo.patient.n3" "treatmentLD_Drug.patient.n3" "treatmentHD_Drug.patient.n3" "treatmentHD_Drug.patient.n4" "treatmentPlacebo.timepost" "treatmentLD_Drug.timepost" "treatmentHD_Drug.timepost"

Regarding this design, the comparisons that I am interested are:

1. time effect (post vs pre) within each treatment group

I think results(dds, name="treatmentPlacebo.timepost"), results(dds, name="treatmentLD_Drug.timepost") and results(dds, name="treatmentHD_Drug.timepost") will extract time effect results for Placebo, LD_Drug and HD_Drug respectively. If this way is correct, when I compare to a simple design(~patient.n+time) for each group separately, the results are different, I have more significant genes in the simple design. Can you explain why and which way do you think is better?

2. more importantly, I am interested in the different post effect between treatment groups while adjusting for baseline (pre) level, which is similar to either of the ANCOVA models: post ~ treatment + pre, post-pre ~ treatment + pre, (post-pre)/pre ~ treatment + pre. I am wondering how to set up contrast to extract such information for each two treatment groups?

3. this model contains a interaction term "treatment:time", which also allows us to examine the genes that have different post vs pre time effect between treatment groups. Using results(dds, contrast=list("treatmentLD_Drug.timepost","treatmentPlacebo.timepost")), results(dds, contrast=list("treatmentHD_Drug.timepost","treatmentPlacebo.timepost")), results(dds, contrast=list("treatmentHD_Drug.timepost","LD_Drug.timepost")) will extract these genes for each two groups. Together with question #1 (not sure for question #2), all these terms can be extracted from the above model specified, do you think whether it is better to analyze all data using one model or analyze them separately like in question #1?

4. regarding main effect for treatment, with all the interaction terms, how to interpret for example "treatmentLD_Drug"? When should we add "time" main effect?

Sorry for all these questions, this model matrix is too complex and I want to make sure that I am doing the right thing.

Thank you very much for your help and I am looking forward to your response,

Wendy

deseq2 • 1.9k views

ADD COMMENT • link updated 6.0 years ago by Michael Love 43k • written 6.0 years ago by wwu222 • 0

score 1 · Answer 1 · 2019-01-02

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

1) Yes, those are the correct coefficients. Not sure why you have more DEG when you break into groups, but that's not a good justification to do the smaller group analysis. It's typically recommended to put the samples together and run the full analysis.

2) If you want to test for whether there are differences across treatment group, use a LRT. The full design is as you have it, and the reduced design would have "time" in place of "treatment:time"

3) Yes, you can pairwise compare the time effect like that. Unlike in (1), you do not have a choice, here you have to put all the samples together to perform these comparisons. Another reason to use this approach.

4) When controlling for the patient effects, you should not interpret the main effects. These are actually just the treatment effect for the reference patient in each group.

ADD COMMENT • link 6.0 years ago Michael Love 43k

0

Entering edit mode

Dear Dr. Love,

Thank you very much for your quick response. Now I know what to do regarding question #1, 3 , 4 with the information and suggestion you provided.

For question #2, I have tried the LRT you suggested, my workflow is as below:

mm_full = model.matrix(~treatment + treatment:patient.n + treatment:time, colinfo)
mm_full <- mm_full[,colSums(mm_full) > 0]
dds = DESeqDataSetFromMatrix(countData=count, colData=colinfo, design=mm_full)
dds <- dds[rowSums(counts(dds)) >= 10,]

mm_reduced <- model.matrix(~treatment + treatment:patient.n + time, colinfo)
mm_reduced <- mm_reduced[,colSums(mm_reduced) > 0]
dds <- DESeq(dds, test="LRT", reduced=mm_reduced)

> resultsNames(dds)
"Intercept" "treatmentLD_Drug" "treatmentHD_Drug" "treatmentPlacebo.patient.n2" "treatmentLD_Drug.patient.n2" "treatmentHD_Drug.patient.n2" "treatmentPlacebo.patient.n3" "treatmentLD_Drug.patient.n3" "treatmentHD_Drug.patient.n3" "treatmentHD_Drug.patient.n4" "treatmentPlacebo.timepost" "treatmentLD_Drug.timepost"
"treatmentHD_Drug.timepost"

The result columns here are the same as the full model output listed in the original question. Like you said, when controlling for the patient effects, we are not supposed to interpret the main effects. So I have following questions:

1. how to interpret for example "treatmentLD_Drug.timepost" here?

2. as my interest is to extract pairwise post treatment effect while controlling for baseline level (pre), then how do I set up the contrast to extract such information based on this LRT output?

Thank you very much as always, I am looking forward to your response,

Wendy

ADD REPLY • link 6.0 years ago wwu222 • 0

0

Entering edit mode

That coefficient has the same interpretation as before, "treatmentLD_Drug.timepost" is the post vs pre effect for the LD treated patients.

If you want to do pairwise comparisons, we only have Wald tests for this. The LRT is a different type of test. See the LRT section in the vignette for more details.

ADD REPLY • link 6.0 years ago Michael Love 43k

0

Entering edit mode

Hi Dr. Love,

Thank you very much! I guess LRT couldn't answer my question. Sorry to bother you again. But considering a simpler pre-post treatment design, and I want to implement a ANCOVA model like "post ~ treatment + pre" into DESeq2 work frame, what is the best way to do it ? Because I know DESeq2 doesn't allow different covariate values for each gene which would be "pre" in this case.

treatment patient.n time
1 Placebo 1 pre
2 Placebo 1 post
3 Placebo 2 pre
4 Placebo 2 post
5 Placebo 3 pre
6 Placebo 3 post
7 LD_Drug 1 pre
8 LD_Drug 1 post
9 LD_Drug 2 pre
10 LD_Drug 2 post
11 LD_Drug 3 pre
12 LD_Drug 3 post

Thank you very much!

ADD REPLY • link 6.0 years ago wwu222 • 0

0

Entering edit mode

I don't follow your question, and I think we've gotten past the point where there is a confusion about how to use the software, and now I think we are more discussing whether to try various statistical approaches. Unfortunately I have limited time to answer questions on the support site, and have to prioritize software related questions, so I'd recommend you try to work with a statistician who can guide you to what kind of statistical models to use. Note that DESeq2 uses the same coefficients as base R, as we leverage base R's model.matrix and design formula. So any statistician familiar with R and linear models may be useful to consult with.

ADD REPLY • link 6.0 years ago Michael Love 43k

0

Entering edit mode

Thanks, Dr. Love. I may not be clear, I will consult a statistician with alternative statistical approach.

Regarding the "pairwise comparisons with Wald tests " you mentioned earlier, do I have to perform a separate model or I can extract results using contrast from full model? If latter, can you please specify? Thanks!

> resultsNames(dds)
"Intercept" "treatmentLD_Drug" "treatmentHD_Drug" "treatmentPlacebo.patient.n2" "treatmentLD_Drug.patient.n2" "treatmentHD_Drug.patient.n2" "treatmentPlacebo.patient.n3" "treatmentLD_Drug.patient.n3" "treatmentHD_Drug.patient.n3" "treatmentHD_Drug.patient.n4" "treatmentPlacebo.timepost" "treatmentLD_Drug.timepost" "treatmentHD_Drug.timepost"

ADD REPLY • link 6.0 years ago wwu222 • 0

0

Entering edit mode

How to do this: see (3) in your original post.

ADD REPLY • link 6.0 years ago Michael Love 43k

0

Entering edit mode

Thanks, Dr. Love. I think 3) is pairwise compare the time effect, but what I want is pairwise compare post treatment effect (with / without adjust baseline level).

Sorry, I understand you are very busy and I may be confused at some point. I really appreciate your time and help!

"3) Yes, you can pairwise compare the time effect like that. Unlike in (1), you do not have a choice, here you have to put all the samples together to perform these comparisons. Another reason to use this approach."

ADD REPLY • link 6.0 years ago wwu222 • 0

0

Entering edit mode

You can not compare the post treatment level alone with this design, where you control for patient baselines, only the post vs pre effect.

Understanding why is a bit complex and you need to have more background perhaps in linear modeling. I strongly recommend consulting with a statistician if these coefficients and designs are confusing or not clear.