Hello guys,
I was browsing the site and came across this post: "C: Design and contrast question limma (additive or nested or duplicateCorrelation()". The poster had this experimental design:
Filename | Sexe | Diet | Litter |
sample1 | M | LFD | L37 |
sample2 | M | LFD | L37 |
sample3 | M | LFD | L49 |
sample4 | M | LFD | L49 |
sample5 | M | LFD | L50 |
sample6 | M | LFD | L50 |
sample7 | M | WD | L48 |
sample8 | M | WD | L48 |
sample9 | M | WD | L48 |
sample10 | M | WD | L48 |
sample11 | M | WD | L40 |
sample12 | M | WD | L40 |
sample13 | F | LFD | L49 |
sample14 | F | LFD | L50 |
sample15 | F | LFD | L37 |
sample16 | F | LFD | L37 |
sample17 | F | LFD | L37 |
sample18 | F | LFD | L49 |
sample19 | F | WD | L39 |
sample20 | F | WD | L39 |
sample21 | F | WD | L40 |
sample22 | F | WD | L40 |
sample23 | F | WD | L48 |
sample24 | F | WD | L48 |
Aaron suggested creating the design matrix this way:
design <- model.matrix(~0+Litter+paste(Sexe,Diet,sep="."))
Then he suggested dropping term number seven. I do not understand why, after dropping term 7, the final two terms represent the average log-fold change for male mice over female mice, in the LFD-fed or WD-fed mice. The design matrix would be:
SDM.LFD | SDM.WD | Filename | Sexe | Diet |
0 | 0 | sample13 | F | LFD |
0 | 0 | sample14 | F | LFD |
0 | 0 | sample15 | F | LFD |
0 | 0 | sample16 | F | LFD |
0 | 0 | sample17 | F | LFD |
0 | 0 | sample18 | F | LFD |
0 | 0 | sample19 | F | WD |
0 | 0 | sample20 | F | WD |
0 | 0 | sample21 | F | WD |
0 | 0 | sample22 | F | WD |
0 | 0 | sample23 | F | WD |
0 | 0 | sample24 | F | WD |
1 | 0 | sample1 | M | LFD |
1 | 0 | sample2 | M | LFD |
1 | 0 | sample3 | M | LFD |
1 | 0 | sample4 | M | LFD |
1 | 0 | sample5 | M | LFD |
1 | 0 | sample6 | M | LFD |
0 | 1 | sample7 | M | WD |
0 | 1 | sample8 | M | WD |
0 | 1 | sample9 | M | WD |
0 | 1 | sample10 | M | WD |
0 | 1 | sample11 | M | WD |
0 | 1 | sample12 | M | WD |
Do the last two terms represent SDM.LFD-SDF.LFD and SDM.WD-SDF.LFD? Am I wrong?
Best,
Keifa
Hi Aaron,
Maybe I am wrong, but this is what you wrote (https://support.bioconductor.org/p/68916/#110986):
I am referring to design2.
Keifa
Yes, I know. (I've edited my response to avoid confusion.) What I wrote then and now is correct. You'll have to be much clearer about what you don't understand.
Hi Aaron,
thank you for your comment. The "SDF.LFD" term seems to be the reference level for factor SD, and if you do not drop coefficient 7 (which you need to do to achieve full column rank), the three terms "SDM.LFD", "SDM.WD" and "SDF.WD" (the one dropped) should represent the log-fold change with respect to the reference level (SDF.LFD), am I right? What is not clear to me is why, once it is dropped, the last two coefficients (SDM.LFD and SDM.WD) represent the male/female log-fold change within each diet.
Keifa
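The rank deficiency mentioned above can be checked directly in base R. This is only a minimal sketch: it assumes the sample table from the original question, and it defines SD as a separate variable holding paste(Sexe, Diet, sep=".") so that the column names match the ones used in this thread.

```r
# Experimental design copied from the table in the question
Sexe   <- rep(c("M", "F"), each = 12)
Diet   <- rep(c("LFD", "WD", "LFD", "WD"), each = 6)
Litter <- c("L37", "L37", "L49", "L49", "L50", "L50",   # samples 1-6
            "L48", "L48", "L48", "L48", "L40", "L40",   # samples 7-12
            "L49", "L50", "L37", "L37", "L37", "L49",   # samples 13-18
            "L39", "L39", "L40", "L40", "L48", "L48")   # samples 19-24

# SD is an assumed name for the combined sex/diet factor
SD <- paste(Sexe, Diet, sep = ".")

# Same structure as the suggested design: litter blocks plus sex/diet groups
design <- model.matrix(~ 0 + Litter + SD)
print(colnames(design))

# Compare the number of columns with the numerical rank
print(c(columns = ncol(design), rank = qr(design)$rank))

# Identify the 7th coefficient, then drop it and check the rank again
print(colnames(design)[7])
reduced <- design[, -7]
print(c(columns = ncol(reduced), rank = qr(reduced)$rank))
```

Each litter received only one diet, so the two WD columns of SD add up to the sum of the WD litter columns. That is the single linear dependency, and removing one of the columns involved in it restores full column rank.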
Because the SD levels are nested within Litter. Once you drop the 7th term, the reference level in the WD litters becomes "SDF.WD". You can convince yourself of this by looking at the design matrix. For example, let's look at sample 24. In the linear model described by design2 (after dropping coefficient 7), sample 24 has the terms:
You can see that sample 24's expression is equal to the sum of the intercept and LitterL48. This means that sample 24's group (i.e., "SDF.WD") is the baseline for all litter 48 samples. By comparison, if we look at sample 10, we get:
... which demonstrates that SDM.WD represents the difference in (log-)expression between sample 10's group (i.e., "SDM.WD") and sample 24's group, i.e., the log-fold change between male and female mice in the WD group.
Hi Aaron,
I get it, thank you very much.
Keifa
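For anyone following along, the comparison of samples 24 and 10 can be reproduced with a short sketch in base R. It rests on assumptions: the exact call that created design2 is not shown in this thread, so the version below includes an intercept (inferred from Aaron's mention of "the intercept"), and the row names are added only for readability.

```r
# Experimental design copied from the table in the question
Sexe   <- rep(c("M", "F"), each = 12)
Diet   <- rep(c("LFD", "WD", "LFD", "WD"), each = 6)
Litter <- c("L37", "L37", "L49", "L49", "L50", "L50",   # samples 1-6
            "L48", "L48", "L48", "L48", "L40", "L40",   # samples 7-12
            "L49", "L50", "L37", "L37", "L37", "L49",   # samples 13-18
            "L39", "L39", "L40", "L40", "L48", "L48")   # samples 19-24
SD <- paste(Sexe, Diet, sep = ".")

# Assumed form of design2: intercept + litter + sex/diet group,
# with the 7th coefficient (SDF.WD) dropped
full <- model.matrix(~ Litter + SD)
print(colnames(full)[7])
design2 <- full[, -7]
rownames(design2) <- paste0("sample", 1:24)

# Coefficients that contribute to each sample's fitted value
terms24 <- colnames(design2)[design2["sample24", ] != 0]
terms10 <- colnames(design2)[design2["sample10", ] != 0]
print(terms24)
print(terms10)

# The only term separating a WD male from a WD female in the same litter
print(setdiff(terms10, terms24))
```

Both samples come from litter L48 and share every term except SDM.WD, so that coefficient is the male versus female difference within the WD-fed litters. The same comparison between sample 1 and sample 15 (both from L37) isolates SDM.LFD for the LFD-fed litters.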