Hello,
I'm having difficulties to set up the best design for my experiment. I have 9 different groups (different metals) and a time-course (1,3,6 and 9 months). One of those metals will be my control, and I'm trying to see what miRNAs are changing among metals in each time-point and also if there is a difference among the time point. All samples are independent.
I have so far:
dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData,design= ~ Condition + Time)
dds<-DESeq(dds, test = "LRT", full = ~ Condition + Time, reduced = ~Time)
head(colData)
Names Condition Time
Ta-3-1 Ta3 Day3
Ta-3-2 Ta3 Day3
Ta-3-3 Ta3 Day3
W-3-1 W3 Day3
W-3-2 W3 Day3
W-3-3 W3 Day3
Ni-3-1 Ni3 Day3
Ni-3-2 Ni3 Day3
Ni-3-3 Ni3 Day3
Co-3-1 Co3 Day3
Co-3-2 Co3 Day3
Co-3-3 Co3 Day3
Fe-3-1 Fe3 Day3
Fe-3-2 Fe3 Day3
Fe-3-3 Fe3 Day3
Cu-3-1 Cu3 Day3
.
.
.
Ta-1-1 Ta1 Day1
Ta-1-2 Ta1 Day1
Ta-1-3 Ta1 Day1
Ta-1-4 Ta1 Day1
Ta-1-5 Ta1 Day1
dds<-DESeq(dds, test = "LRT", full = ~ Condition + Time, reduced = ~Time)
resultsNames(dds)
[1] "Intercept" "Condition_Al_vs_Ta" "Condition_Co_vs_Ta" "Condition_Cu_vs_Ta" "Condition_DU_vs_Ta"
[6] "Condition_Fe_vs_Ta" "Condition_Ni_vs_Ta" "Condition_Pb_vs_Ta" "Condition_W_vs_Ta" "Time_Day12_vs_Day1"
[11] "Time_Day3_vs_Day1" "Time_Day6_vs_Day1"
I understand that the LRT is doing the test in all the conditions. However, I want to be able to compare the metals (e.g. Al vs Ta). But I want to do that for 1, 3, 6 and 12 months. Since is not on my Names() I don't know how to proceed.
Thanks in advance
Michael, thanks for your prompt answer. After taking a look at the workflows, I saw that my data goes in "Model matrix not full rank" (missing some groups). That way I though in fitting all the groups, including different time, together and do a contrast for all the comparisons I want to make. Is that the appropriate way to do it or should I fit only one specific time point? Sorry if this basic, I just want to make sure I understood the workflow.
Can you say more about which groups of samples are missing? This will affect what kinds of analyses you can perform.
So I have 9 metals (one is the control) and 4-time points (1,3, 6 and 12 months). Some samples I have 3 replicates and some I have 8. In the 12 months, I have on group missing, because the animals died. Is this helpful?
So then you need to form the model matrices using
model.matrix
and remove the columns that have 0’s. See the vignette section on the full rank error. Then provide the matrices tofull
andreduced
arguments of DESeq().Michael, sorry to bother you again, but I'm stuck now in how to add the model.matrix in dds and run the LRT. Following the vignette I did:
countData= read.table("file.txt", header=T,row.names=1) colData= read.table ("file.txt", header=T) m1<-model.matrix(~ Condition + Time + Condition:Time, colData) all.zero <- apply(m1, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m1 <- m1[,-idx] unname(m1)
Now I don't know to get this new model matrix in my DESeq to run LRT. Could you tell me how to do that?
Additionally, to run the LRT, how I would relevel my control since I have different time points? All my controls from different time points would have to have the same name to run?
Sorry to bother you, but I really appreciate your time, the software and the vignette that is super helpful.
You provide the matrices to
full
andreduced
as stated in the vignette in the section giving that code you are using.It looks like:
You can put whatever design matrices make sense in
mm
ormm0
above. You may want to discuss with a statistician about what these LRT are representing.Thank you so much. All your answer have helped me to understand better all the commands. I still have some doubts and I was wondering if you could point out what could that be:
countData= read.table("miRNAcountsno-pre.ivan.txt", header=T,row.names=1) colData= read.table ("FactorsNo-pre.txt", header=T) library(DESeq2) dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData,design= ~ Condition)
model.matrix
m1<-model.matrix(~ Condition + Time + Condition:Time, colData) colnames(m1) [1] "(Intercept)" "ConditionAl12" "ConditionAl3"
[4] "ConditionAl6" "ConditionCo1" "ConditionCo12"
[7] "ConditionCo3" "ConditionCo6" "ConditionCu1"
[10] "ConditionCu12" "ConditionCu3" "ConditionCu6"
[13] "ConditionDU1" "ConditionDU12" "ConditionDU3"
[16] "ConditionDU6" "ConditionFe1" "ConditionFe12"
[19] "ConditionFe3" "ConditionFe6" "ConditionNi1"
[22] "ConditionNi3" "ConditionNi6" "ConditionPb1"
[25] "ConditionPb12" "ConditionPb3" "ConditionPb6"
[28] "ConditionTa1" "ConditionTa12" "ConditionTa3"
[31] "ConditionTa6" "ConditionW1" "ConditionW12"
[34] "ConditionW3" "ConditionW6" "TimeDay12"
[37] "TimeDay3" "TimeDay6" "ConditionAl12:TimeDay12" [40] "ConditionCo12:TimeDay12" "ConditionCu12:TimeDay12" "ConditionDU12:TimeDay12" [43] "ConditionFe12:TimeDay12" "ConditionPb12:TimeDay12" "ConditionTa12:TimeDay12" [46] "ConditionW12:TimeDay12" "ConditionAl3:TimeDay3" "ConditionCo3:TimeDay3"
[49] "ConditionCu3:TimeDay3" "ConditionDU3:TimeDay3" "ConditionFe3:TimeDay3"
[52] "ConditionNi3:TimeDay3" "ConditionPb3:TimeDay3" "ConditionTa3:TimeDay3"
[55] "ConditionW3:TimeDay3" "ConditionAl6:TimeDay6" "ConditionCo6:TimeDay6"
[58] "ConditionCu6:TimeDay6" "ConditionDU6:TimeDay6" "ConditionFe6:TimeDay6"
[61] "ConditionNi6:TimeDay6" "ConditionPb6:TimeDay6" "ConditionTa6:TimeDay6"
[64] "ConditionW6:TimeDay6"
1- Why I'm not seeing Day1?
all.zero <- apply(m1, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m1 <- m1[,-idx] unname(m1)
m0<-model.matrix(~ Condition:Time, colData) colnames(m0) all.zero <- apply(m0, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m0 <- m0[,-idx] unname(m0)
dds <- DESeq(dds, full=m1, reduced=m0, test="LRT")
2- If I do like that I still have the "not full rank". What I'm missing?
3- After the LRT test, I want to do the contrast for each different metal and also with different time. Should I do all the contrast one by one? Or should I design all the metals for a specific time and do the contrasts that way?
Thanks once again
I'd recommend at this point that you discuss your particular experiment and your analysis plan with a local statistician. They can help explain, e.g. why Day1 is not in the coefficient list, and how to generate a full and reduced model matrix for LRT.
The support site is mostly for software usage questions, and I just don't have any time in my schedule to go beyond software questions.
Appreciate your time and understand that. I just edited my comment with a third question. Can you tell me about that question?
In the workflow we show how to look at individual time points with a Wald test in addition to a LRT of any differences across time. You can try that example.
Thanks Michael. I think following the workflow, as you suggested, helped me to make my design and do my comparisons. But I still have some doubts in what I'm comparing. Could you give me a hand?
I did:
Design= ~Condition + Time + Condition:Time (full model) vs ~Condition + Time (reduced model). This would give me the changes in the genes that are changing over time, right?
My control is a group named Ta and the time would be 1 month.
resultsNames(dds) [1] "Intercept" "ConditionAl" "ConditionCo" "ConditionCu"
[5] "ConditionDU" "ConditionFe" "ConditionNi" "ConditionPb"
[9] "ConditionW" "Time12month" "Time3month" "Time6month"
[13] "ConditionAl.Time12month" "ConditionCo.Time12month" "ConditionCu.Time12month" "ConditionDU.Time12month" [17] "ConditionFe.Time12month" "ConditionPb.Time12month" "ConditionW.Time12month" "ConditionAl.Time3month" [21] "ConditionCo.Time3month" "ConditionCu.Time3month" "ConditionDU.Time3month" "ConditionFe.Time3month" [25] "ConditionNi.Time3month" "ConditionPb.Time3month" "ConditionW.Time3month" "ConditionAl.Time6month" [29] "ConditionCo.Time6month" "ConditionCu.Time6month" "ConditionDU.Time6month" "ConditionFe.Time6month" [33] "ConditionNi.Time6month" "ConditionPb.Time6month" "ConditionW.Time6month"
For example, when I do ContiditonAl, test=wald, what I’m comparing? The group AL vs my control in month 1? Additionally, if I do ConditionAl.Time12month, it would be Al vs Ta at 12 months?
Thanks in advance
Questions about what the coefficients mean in a statistical model are unfortunately beyond what I have free time to provide on the support site. We have a bit of material in the vignette, but beyond that you should really discuss with a statistical or bioinformatic collaborator.