Hi all,
I had a question about DESeq2, or a little sanity check.
In my case I'm only interested in comparing how expression differs between conditions at specific time-points. I believe I can do comparisons with combined factors e.g. Responder_Week1 v NonResponder_Week1 - but I can also do LRT for full time-course comparisons? Is there a particular method that would be better suited? I believe that I can get similar results from LRT by using the names - however I think I've confused myself further - so I could use a hand.
#Code I've been using
#Using response to treatment and time-point as the design for DESeq
dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~response + time + response:time)
#Using the time-point as the reducement, which means we are testing the effect of the response?
dds <- DESeq(dds, test = "LRT", reduced = ~time)
#Get the results,
results(dds) # However, this will be for all, but we can get a specific comparison by using name and Wald test
results(dds, name = "responseResponder.timeW1", test = "Wald")
#But then, is this comparing a Non-Responder to Responder at Week 1 or Responder at Week 1 against the Base time-point?
#OR, by combined factors, where condition = e.g. Responder_Week1
dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~condition)
#And now results is simply
results(dds, c("condition","Responder_Week1", "NonResponder_Week1"))
Please do correct me, direct me, or criticise me - I'm only hoping to learn more and do this correctly :)
I hope this makes sense, if there's any confusion please feel free to ask (I am confused myself, hence my reaching out!)
Thanks for the reply! Really helpful, just to check on the "reduce" section for the LRT, it says:
So really, my "reduced" in this case should also be
reduced = ~ response + time
if I want to see the genes that differ after point 0 with a response-specific effect?And final question, is there anything wrong with simply combining factors for the design condition, e.g. "Responder_Week1 v NonResponder_Week1", if I am already interested in comparing each time-point? Aren't both methods quite similar, or does one have an advantage over the other?
Right -- if you don't include a variable in the reduced model, you are also rejecting null hypothesis whenever that coefficient is non-zero. If you want to specifically test the interaction, you need to have that term as the only one left out in the reduced model.
The model presented in the workflow has the advantage that you can perform a LRT by leaving out the interaction terms. This wouldn't be easily done with the combined variable.
Awesome, I've ran the code as
reduced = ~ time
, because I'm more interested in the differences between the strains at each time-point.For the results, I get that it goes e.g.
results(dds, name = "responseResponder.timeW1", test = "Wald")
if I'm interested in the differences between responders at week 1. Great.But if I'm interested in the differences between time-points for either the responders or the non-responders, how do I grab that information? I reckon
results(dds, contrast = list(c("responseResponder.timeW1","responseResponder.timeW20")) , test = "Wald
for the responders at weeks 1 and 20, but for non-responders what are the terms?e.g. resultsNames(dds):
"Intercept" "res_status_Response_vs_No_Response" "Time_W_1_vs_W_20" "res_statusResponse.TimeW1"
Thanks for the help so far, apologies but I'm just trying to wrap my head around this - the values between these tests and the combined factors are quite different so I'm interested in what seems more significant.
For in depth questions about statistical analysis, I usually ask that users work with someone at their institute who is familiar with linear models in R, it's just that I don't have spare time outside of software related questions.
Another thing you can do to get a handle on what the terms mean and what contrasts imply is to look at this Bioconductor package and paper:
https://f1000research.com/articles/9-512