Question

DESEQ2 Experimental Design

0

Entering edit mode

fabiola • 0

@fabiola-6820

Last seen 5 months ago

European Union

Hi everyone,

I have a question regarding the design of the experiment to give to DESeq2.

I have the following condition and I’m trying to understand what is the difference in using a design with or without an intercept term.

I’m using an experimental design containing one factor with three levels.

I read in the manual that “It is important to supply levels (otherwise the levels are chosen in alphabetical order) and to put the ‘control’ or ‘untreated’ level as the first element (‘base level’), so

that the log2 fold changes produced by default will be the expected comparison against the base level”

I understand that in the design= model.matrix(~condition) the intercept should be the control group in order to guarantee that the default contrasts have as denominator the controls (treated vs controls)

What I don’t understand clearly is if the “Intercept design” is only needed to use, let’s say, the controls as the real term of comparison, thing that I understand I can control just setting the contrast terms as i’m writing below.

I thought that, once made sure that the control level is the first level, using the “intercept” design or the “no-intercept” design and declaring the contrast that I want to perform, the same in both cases, should give the same output in terms of Differential Expressed Genes. I was clearly wrong.

I think that the difference occurs when I set

dds <- DESeq(dds,betaPrior = )

in the case of design=~condition betaPrior= TRUE

while in the case of design= ~0+condition I MUST set betaPrior to FALSE.

I would like to understand better what is happening here.

Can someone clarify my doubts and suggest me which is the correct design, if one is better than the other?

(or if I’m just barking up the wrong trees, then I’m sorry for all my misinterpretations)

Thank you,

Fabiola

colData

condition

CTRL_1 control

CTRL_2 control

CTRL_3 control

CTRL_4 control

CTRL_5 control

CTRL_7 control

CTRL_8 control

D_1 treatedD

D_2 treatedD

D_4 treatedD

D_5 treatedD

D_6 treatedD

K_1 treatedK

K_2 treatedK

K_3 treatedK

dds<- DESeqDataSetFromMatrix(countData = countData,

colData = colData,

design= ~colData$condition)

in that case, design= model.matrix(~colData$condition), with an Intercept term

> model.matrix(~colData$condition)

(Intercept) colData$conditiontreatedD colData$conditiontreatedK

1 1 0 0

2 1 0 0

3 1 0 0

4 1 0 0

5 1 0 0

6 1 0 0

7 1 0 0

8 1 1 0

9 1 1 0

10 1 1 0

11 1 1 0

12 1 1 0

13 1 0 1

14 1 0 1

15 1 0 1

attr(,"assign")

[1] 0 1 1

attr(,"contrasts")

attr(,"contrasts")$`colData$condition`

[1] "contr.treatment"

dds <- DESeq(dds,betaPrior = T)

D_res<-results(dds, contrast = c("condition","treatedD","control"))

summary(D_res)

out of 13910 with nonzero total read count

adjusted p-value < 0.1

LFC > 0 (up) : 16, 0.12%

LFC < 0 (down) : 30, 0.22%

outliers [1] : 7, 0.05%

low counts [2] : 0, 0%

(mean count < 4.3)

[1] see 'cooksCutoff' argument of ?results

[2] see 'independentFiltering' argument of ?results

dds<- DESeqDataSetFromMatrix(countData = countData,

colData = colData,

design= ~0+colData$condition)

so the design is

> model.matrix(~0+colData$condition)

colData$conditioncontrol colData$conditiontreatedD colData$conditiontreatedK

1 1 0 0

2 1 0 0

3 1 0 0

4 1 0 0

5 1 0 0

6 1 0 0

7 1 0 0

8 0 1 0

9 0 1 0

10 0 1 0

11 0 1 0

12 0 1 0

13 0 0 1

14 0 0 1

15 0 0 1

attr(,"assign")

[1] 1 1 1

attr(,"contrasts")

attr(,"contrasts")$`colData$condition`

[1] "contr.treatment"

dds <- DESeq(dds,betaPrior = F)

D_res<-results(dds, contrast = c("condition","treatedD","control"))

summary(D_res)

out of 13910 with nonzero total read count

adjusted p-value < 0.1

LFC > 0 (up) : 17, 0.12%

LFC < 0 (down) : 38, 0.27%

outliers [1] : 4, 0.029%

low counts [2] : 0, 0%

(mean count < 4.3)

[1] see 'cooksCutoff' argument of ?results

[2] see 'independentFiltering' argument of ?results

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

deseq2 differential gene expression • 5.8k views

ADD COMMENT • link 10.1 years ago fabiola • 0

score 3 · Answer 1 · 2015-01-26

hi Fabiola,

The text in the vignette about setting the base level / reference level of factors is just so that if you say results(dds), the software knows what to do. If you specify the 'contrast', then it makes no difference what the base level was.

The difference you are observing in the summarized results is between the use of a beta prior or not (the default or betaPrior=FALSE). While these two settings often produce nearly the same p-values, they are not exactly the same vector of p-values. So you can see the numbers in the summary change slightly.

You can't use ~0+condition and betaPrior=TRUE. Why? We don't want to shrink the group mean estimates to zero. This is what would happen with ~0+condition. Instead we want to shrink the estimated differences between groups to zero, while the intercept stays in the middle of the groups. So we accomplish this by having models with an unshrunken intercept, and shrunken terms for each group difference. You should just use ~condition if you want to do the standard DESeq2 pipeline with LFC shrinkage.

score 0 · Answer 2 · 2015-01-27

0