Hi everyone,
I have a question regarding the design of the experiment to give to DESeq2.
I have the following condition and I’m trying to understand what is the difference in using a design with or without an intercept term.
I’m using an experimental design containing one factor with three levels.
I read in the manual that “It is important to supply levels (otherwise the levels are chosen in alphabetical order) and to put the ‘control’ or ‘untreated’ level as the first element (‘base level’), so
that the log2 fold changes produced by default will be the expected comparison against the base level”
I understand that in the design= model.matrix(~condition) the intercept should be the control group in order to guarantee that the default contrasts have as denominator the controls (treated vs controls)
What I don’t understand clearly is if the “Intercept design” is only needed to use, let’s say, the controls as the real term of comparison, thing that I understand I can control just setting the contrast terms as i’m writing below.
I thought that, once made sure that the control level is the first level, using the “intercept” design or the “no-intercept” design and declaring the contrast that I want to perform, the same in both cases, should give the same output in terms of Differential Expressed Genes. I was clearly wrong.
I think that the difference occurs when I set
dds <- DESeq(dds,betaPrior = )
in the case of design=~condition betaPrior= TRUE
while in the case of design= ~0+condition I MUST set betaPrior to FALSE.
I would like to understand better what is happening here.
Can someone clarify my doubts and suggest me which is the correct design, if one is better than the other?
(or if I’m just barking up the wrong trees, then I’m sorry for all my misinterpretations)
Thank you,
Fabiola
colData
condition
CTRL_1 control
CTRL_2 control
CTRL_3 control
CTRL_4 control
CTRL_5 control
CTRL_7 control
CTRL_8 control
D_1 treatedD
D_2 treatedD
D_4 treatedD
D_5 treatedD
D_6 treatedD
K_1 treatedK
K_2 treatedK
K_3 treatedK
dds<- DESeqDataSetFromMatrix(countData = countData,
colData = colData,
design= ~colData$condition)
in that case, design= model.matrix(~colData$condition), with an Intercept term
> model.matrix(~colData$condition)
(Intercept) colData$conditiontreatedD colData$conditiontreatedK
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
5 1 0 0
6 1 0 0
7 1 0 0
8 1 1 0
9 1 1 0
10 1 1 0
11 1 1 0
12 1 1 0
13 1 0 1
14 1 0 1
15 1 0 1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$`colData$condition`
[1] "contr.treatment"
dds <- DESeq(dds,betaPrior = T)
D_res<-results(dds, contrast = c("condition","treatedD","control"))
summary(D_res)
out of 13910 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 16, 0.12%
LFC < 0 (down) : 30, 0.22%
outliers [1] : 7, 0.05%
low counts [2] : 0, 0%
(mean count < 4.3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
dds<- DESeqDataSetFromMatrix(countData = countData,
colData = colData,
design= ~0+colData$condition)
so the design is
> model.matrix(~0+colData$condition)
colData$conditioncontrol colData$conditiontreatedD colData$conditiontreatedK
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
5 1 0 0
6 1 0 0
7 1 0 0
8 0 1 0
9 0 1 0
10 0 1 0
11 0 1 0
12 0 1 0
13 0 0 1
14 0 0 1
15 0 0 1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`colData$condition`
[1] "contr.treatment"
dds <- DESeq(dds,betaPrior = F)
D_res<-results(dds, contrast = c("condition","treatedD","control"))
summary(D_res)
out of 13910 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 17, 0.12%
LFC < 0 (down) : 38, 0.27%
outliers [1] : 4, 0.029%
low counts [2] : 0, 0%
(mean count < 4.3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)