Hello,
I'm trying to get my head around some DESeq2 results and I would really appreciate some help.
I have two conditions (control and disease) and one continuous variable (age). Each condition consists of 100 individuals of different ages. I want to see which changes occur in the control group as age increases, and likewise in the disease group (I know this would ideally be done with the same individuals at different time points, but I'm working with post-mortem tissue so this is the best I can do).
I have set my DESeq2 design as ~condition+age+condition:age
From resultsNames(dds) I get:
[1] "Intercept" "condition_disease_vs_ctrl" "age" "conditiondisease.age"
I suppose that "conditiondisease.age" is the name I am interested in for the disease group. But is "age" the one for the control (since it is the reference level)? Or is it age regardless of the condition? (If it is this option, how can I get "conditionctrl.age"?)
I also have a question on how to interpret the log2FC here. According to the vignette, this would be the change per unit of the continuous variable (age). If age is given as integers, is the lowest value (the youngest in my setup) taken as the reference point? Or the highest (the oldest)?
And one more, do I have to sort age before giving it to DESeq2? I am guessing no but it doesn't hurt to ask.
Thank you!!
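For readers following the thread, a minimal base-R sketch of how this design is coded may help. It assumes a hypothetical six-sample table and made-up, noise-free log2 values for one gene, and it uses lm.fit as a stand-in for the DESeq2 GLM, so the numbers are illustrative only. Under R's default treatment coding, the age column is the slope in the reference level (ctrl) and the interaction column is the difference in slopes between disease and ctrl, so the slope within the disease group is the sum of the two.

```r
# Hypothetical sample table: ages are entered as recorded, not re-based.
coldata <- data.frame(
  condition = factor(rep(c("ctrl", "disease"), each = 3),
                     levels = c("ctrl", "disease")),
  age = c(30, 60, 90, 35, 55, 85)
)

# Same design formula as in the post; base R names the columns slightly
# differently from resultsNames(dds), but the coding is the same.
X <- model.matrix(~ condition + age + condition:age, data = coldata)
print(colnames(X))

# Made-up, noise-free log2 expression for one gene:
# slope 0.02 per year in ctrl, 0.05 per year in disease.
y <- ifelse(coldata$condition == "ctrl",
            5 + 0.02 * coldata$age,
            6 + 0.05 * coldata$age)

beta <- lm.fit(X, y)$coefficients

slope_ctrl    <- unname(beta["age"])                   # age effect in ctrl
slope_diff    <- unname(beta["conditiondisease:age"])  # disease minus ctrl
slope_disease <- slope_ctrl + slope_diff               # age effect in disease
print(c(ctrl = slope_ctrl, disease = slope_disease, difference = slope_diff))

# Assumed DESeq2 counterparts, given a fitted object `dds` (not run here):
# results(dds, name = "age")                                       # ctrl slope
# results(dds, contrast = list(c("age", "conditiondisease.age")))  # disease slope
# results(dds, name = "conditiondisease.age")                      # difference
```

With this coding there is no "conditionctrl.age" column to ask for, because the control slope is already the plain age coefficient.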
Thank you Michael! I got exactly what I needed from the first part. However, I'm struggling to understand the LFC explanation. What did you mean by the intercept? I didn't map the ages to values starting from 0 (maybe I should do this?). I input the ages as they were (30-90). Or maybe I didn't understand what you meant by the intercept. Sorry, it's hard for me to grasp a fold change that has no reference.
I'd suggest discussing this with a statistician to get a longer answer to the question about a reference point for the continuous variable. The practical answer is that there is no reference point, but it would be good for you to discuss this with someone to understand why that is the case and how continuous variables work in linear models.
Hi again! Thank you, I talked about it with someone and I think I now understand what you meant. You meant that no matter how you set up the continuous variable (whichever value is set to be 0), the LFC in a linear model is the change per unit step of the continuous variable. This is crystal clear now :-).
But my question was more about how the 0 of the continuous variable is set, or how DESeq2 decides how to order this variable (that's what I was trying to say with the reference point, but it was the wrong term, I'm sorry). Is the 0 set to be the minimum value or something else? Maybe it depends on how I sort it (for example, ordering colData by decreasing age)?
I guess it is the minimum value, but I don't want to risk a wrong interpretation, since this matters a lot with factors.
Sorry for the confusion, and thank you for all the help!
There is no reference point for the continuous variable.
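A small base-R sketch can make this concrete. It assumes a hypothetical six-sample table and made-up, noise-free log2 values, with lm.fit standing in for the DESeq2 fit. It refits the same model after subtracting the minimum age and after sorting the samples by decreasing age. The per-year coefficients are unchanged in both cases. Only the intercept and the condition main effect move when age is shifted, because those two terms describe the fitted values at age = 0.

```r
# Hypothetical sample table and made-up log2 expression for one gene.
coldata <- data.frame(
  condition = factor(rep(c("ctrl", "disease"), each = 3),
                     levels = c("ctrl", "disease")),
  age = c(30, 60, 90, 35, 55, 85)
)
y <- ifelse(coldata$condition == "ctrl",
            5 + 0.02 * coldata$age,
            6 + 0.05 * coldata$age)

# Helper (name is illustrative): fit the interaction model, return coefficients.
fit_coefs <- function(data, y) {
  X <- model.matrix(~ condition + age + condition:age, data = data)
  lm.fit(X, y)$coefficients
}

# 1. Ages as recorded (30 to 90).
b_raw <- fit_coefs(coldata, y)

# 2. Ages shifted so that the youngest sample sits at 0.
shifted <- transform(coldata, age = age - min(age))
b_shift <- fit_coefs(shifted, y)

# 3. Samples sorted by decreasing age (rows and responses reordered together).
ord <- order(coldata$age, decreasing = TRUE)
b_sorted <- fit_coefs(coldata[ord, ], y[ord])

print(rbind(raw = b_raw, shifted = b_shift, sorted = b_sorted))

# The age coefficient is a change per one unit of age, so a change over
# 10 years in the reference group is simply 10 times that coefficient.
lfc_10_years_ctrl <- 10 * unname(b_raw["age"])
print(lfc_10_years_ctrl)
```

In the printed table the age and conditiondisease:age columns agree across all three fits, while the (Intercept) and conditiondisease columns differ only in the shifted row.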
Hi Michael, a somewhat related question: why is it that if I add the interaction term alongside both main effects, e.g. ~ Diagnosis + Sex + Diagnosis:Sex, I only get a term for M vs. F and a sex term for disease, but if I remove the Sex term, as in ~ Diagnosis + Diagnosis:Sex, I then get a sex term for both control and disease? If you can point me to where I can find my own answer that would be nice.
Brian
Ok, never mind. When I pull results(dds, name = "DiagnosisPathologic.SexM") for ~ Diagnosis + Sex + Diagnosis:Sex, it's the same as results(dds, contrast = list("DiagnosisPathologic.SexM", "DiagnosisControl.SexM")) for ~ Diagnosis + Diagnosis:Sex.
I actually tried the reverse, i.e. ~ Sex + Diagnosis + Sex:Diagnosis, and get the same results as above with results(dds, name = "SexM.DiagnosisPathologic"). Does the order not matter? I also get the same for each with results(dds, name="Diagnosis_Pathologic_vs_Control").
Also, is it possible to control for two variables at the same time?
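The equivalence described in this post can be checked in base R. The sketch below assumes a hypothetical balanced 2x2 layout and made-up, noise-free cell means, and it uses lm.fit rather than DESeq2. It fits the three designs mentioned and compares the interaction coefficient of the full design with the difference of the two per-diagnosis sex effects from the design without the Sex main effect.

```r
# Hypothetical 2x2 layout, two samples per cell.
coldata <- data.frame(
  Diagnosis = factor(rep(c("Control", "Pathologic"), each = 4),
                     levels = c("Control", "Pathologic")),
  Sex = factor(rep(c("F", "M"), times = 4), levels = c("F", "M"))
)

# Made-up log2 cell means: M vs. F effect is 1 in Control and 3 in Pathologic.
cell_mean <- c(Control.F = 4, Control.M = 5, Pathologic.F = 6, Pathologic.M = 9)
y <- unname(cell_mean[paste(coldata$Diagnosis, coldata$Sex, sep = ".")])

X_full   <- model.matrix(~ Diagnosis + Sex + Diagnosis:Sex, data = coldata)
X_nested <- model.matrix(~ Diagnosis + Diagnosis:Sex, data = coldata)
X_rev    <- model.matrix(~ Sex + Diagnosis + Sex:Diagnosis, data = coldata)

b_full   <- lm.fit(X_full, y)$coefficients
b_nested <- lm.fit(X_nested, y)$coefficients
b_rev    <- lm.fit(X_rev, y)$coefficients

# Full design: the interaction is the difference between the two sex effects.
interaction_full <- unname(b_full["DiagnosisPathologic:SexM"])

# Design without the Sex main effect: one sex effect per diagnosis level,
# so the same quantity is a contrast between the two coefficients.
interaction_nested <- unname(b_nested["DiagnosisPathologic:SexM"] -
                             b_nested["DiagnosisControl:SexM"])

# Reversed variable order: same coefficient, only the name changes.
interaction_rev <- unname(b_rev["SexM:DiagnosisPathologic"])

print(c(full = interaction_full, nested = interaction_nested,
        reversed = interaction_rev))
```

Note that the coefficient named DiagnosisPathologic:SexM means different things in the two designs: the difference in sex effects in the full design, and the sex effect within the Pathologic group in the design without the Sex main effect.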
See other thread re: order of variables in design.
The statistical design is really up to you: any design matrix can be used as long as its columns are linearly independent.
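As a rough illustration of the linear independence condition, the base-R sketch below builds a design that controls for two variables at once and checks that its model matrix has full column rank, then shows a confounded design that does not. The Batch and Site columns and the is_full_rank helper are hypothetical, introduced only for this example.

```r
# Hypothetical sample table with two nuisance variables to control for.
coldata <- data.frame(
  Diagnosis = factor(rep(c("Control", "Pathologic"), each = 4)),
  Sex       = factor(rep(c("F", "M"), times = 4)),
  Batch     = factor(rep(c("b1", "b1", "b2", "b2"), times = 2))
)

# Illustrative helper: TRUE when the columns are linearly independent.
is_full_rank <- function(X) qr(X)$rank == ncol(X)

# Controlling for Batch and Sex while testing Diagnosis (variable of
# interest placed last, as is conventional in DESeq2 designs).
X_ok <- model.matrix(~ Batch + Sex + Diagnosis, data = coldata)
print(is_full_rank(X_ok))

# A confounded case: Site is identical to Diagnosis, so their columns
# coincide and the two effects cannot be separated.
coldata$Site <- factor(ifelse(coldata$Diagnosis == "Control", "s1", "s2"))
X_bad <- model.matrix(~ Site + Diagnosis, data = coldata)
print(is_full_rank(X_bad))
```

Running the same kind of check on your own colData before fitting is a cheap way to spot a design that cannot be estimated.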