Question

Limma DesignMatrix, dropping terms

1

Entering edit mode

Keifa ▴ 10

@keifa-11939

Last seen 6.6 years ago

Hello guys,

I was surfing the blog and I stopped in this post C: Design and contrast question limma (additive or nested or duplicateCorrelation(). The guy had this experimental design:

Filename    Sexe    Diet    Litter
sample1    M    LFD    L37
sample2    M    LFD    L37
sample3    M    LFD    L49
sample4    M    LFD    L49
sample5    M    LFD    L50
sample6    M    LFD    L50
sample7    M    WD    L48
sample8    M    WD    L48
sample9    M    WD    L48
sample10    M    WD    L48
sample11    M    WD    L40
sample12    M    WD    L40
sample13    F    LFD    L49
sample14    F    LFD    L50
sample15    F    LFD    L37
sample16    F    LFD    L37
sample17    F    LFD    L37
sample18    F    LFD    L49
sample19    F    WD    L39
sample20    F    WD    L39
sample21    F    WD    L40
sample22    F    WD    L40
sample23    F    WD    L48
sample24    F    WD    L48

Aaron suggested to create a design matrix in this way:

design <- model.matrix(~0+Litter+paste(Sexe,Diet,sep="."))

Then he suggested to drop term number seven. I have doubt why dropping term number 7 the final two terms represent the average log-fold change for male mice over female, in the LFD or WD-fed mice. The design matrix would be:

SDM.LFD	SDM.WD	Filename	Sexe	Diet
0	0	sample13	F	LFD
0	0	sample14	F	LFD
0	0	sample15	F	LFD
0	0	sample16	F	LFD
0	0	sample17	F	LFD
0	0	sample18	F	LFD
0	0	sample19	F	WD
0	0	sample20	F	WD
0	0	sample21	F	WD
0	0	sample22	F	WD
0	0	sample23	F	WD
0	0	sample24	F	WD
1	0	sample1	M	LFD
1	0	sample2	M	LFD
1	0	sample3	M	LFD
1	0	sample4	M	LFD
1	0	sample5	M	LFD
1	0	sample6	M	LFD
0	1	sample7	M	WD
0	1	sample8	M	WD
0	1	sample9	M	WD
0	1	sample10	M	WD
0	1	sample11	M	WD
0	1	sample12	M	WD

Do the last two terms represent SDM.LFD-SDF.LFD and SDM.WD-SDF.LFD? Am I wrong?

Best,

Keifa

Limma designMatrix • 1.5k views

ADD COMMENT • link updated 6.8 years ago by Aaron Lun ★ 28k • written 6.8 years ago by Keifa ▴ 10

score 0 · Answer 1 · 2018-07-13

0

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 13 hours ago

The city by the bay

Your code isn't exactly what I wrote:

SD <- as.factor(paste(targets$Sexe, targets$Diet, sep="."))
Litter <- targets$Litter
design2 <- model.matrix(~Litter + SD)

... and your "design matrix" is not correct, either. My last two coefficients are:

   SDM.LFD SDM.WD
1        1      0
2        1      0
3        1      0
4        1      0
5        1      0
6        1      0
7        0      1
8        0      1
9        0      1
10       0      1
11       0      1
12       0      1
13       0      0
14       0      0
15       0      0
16       0      0
17       0      0
18       0      0
19       0      0
20       0      0
21       0      0
22       0      0
23       0      0
24       0      0

Dropping the 7th term in design2 is necessary to achieve full column rank, otherwise there is no unique least-squares solution to the system of linear equations. Once dropped, the last two coefficients represent the male/female log-fold change in each diet. There is no "SDF.LFD" term in the final matrix, so I don't know what you're referring to there.

ADD COMMENT • link 6.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron,

Maybe I am wrong, but this is what you wrote (https://support.bioconductor.org/p/68916/#110986):

> design2 <- model.matrix(~0 + Litter + SD)
> design2 <- design2[,-7]
> colnames(design2)
[1] "LitterL37" "LitterL39" "LitterL40" "LitterL48" "LitterL49" "LitterL50"
[7] "SDM.LFD"   "SDM.WD"

I am referring to design2.

Keifa

ADD REPLY • link 6.8 years ago Keifa ▴ 10

0

Entering edit mode

Yes, I know. (I've edited my response to avoid confusion.) What I wrote then and now is correct. You'll have to be much clearer about what you don't understand.

ADD REPLY • link 6.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron,

thank you for your comment. "SDF.LFD" term seems to be the reference level for factor SD and if you do not drop the coefficient 7 (you need to achieve full column rank), the three terms "SDM.LFD", "SDM.WD" and "SDF.WD" (the once dropped) should represent the log-fold change respect the reference level (SDF.LFD), am I right? What is not clear to me is why once dropped, the last two coefficients (SDM.LFD and SDM.WD) represent the male/female log-fold change in each diet?

Keifa

ADD REPLY • link 6.8 years ago Keifa ▴ 10

1

Entering edit mode

Because the SD levels are nested within Litter. Once you drop the 7th term, the reference level in the WD litters becomes "SDF.WD". You can convince yourself of this by looking at the design matrix. For example, let's look at sample 24. In the linear model described by design2 (after dropping coefficient 7), sample 24 has the terms:

(Intercept)   LitterL39   LitterL40   LitterL48   LitterL49   LitterL50
          1           0           0           1           0           0
    SDM.LFD      SDM.WD
          0           0

You can see that sample 24's expression is equal to the sum of the intercept and LitterL48. This means that sample 24's group (i.e., "SDF.WD") is the baseline for all litter 48 samples. By comparison, if we look at sample 10, we get:

(Intercept)   LitterL39   LitterL40   LitterL48   LitterL49   LitterL50
          1           0           0           1           0           0
    SDM.LFD      SDM.WD
          0           1

... which demonstrates that SDM.WD represents the difference in (log-)expression between sample 10's group (i.e., "SDM.WD") and sample 24's group, i.e., the log-fold change between male and female mice in the WD group.

ADD REPLY • link 6.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron,

I get it, thank you very much.

Keifa

ADD REPLY • link 6.8 years ago Keifa ▴ 10