DESEQ2 Experimental Design
2
0
Entering edit mode
fabiola • 0
@fabiola-6820
Last seen 26 days ago
European Union

Hi everyone,

I have a question regarding the design of the experiment to give to DESeq2. 

I have the following condition and I’m trying to understand what is the difference in using a design with or without an intercept term. 

I’m using an experimental design containing one factor with three levels.

I read in the manual that “It is important to supply levels (otherwise the levels are chosen in alphabetical order) and to put the ‘control’ or ‘untreated’ level as the first element (‘base level’), so

that the log2 fold changes produced by default will be the expected comparison against the base level”

 

I understand that in the design= model.matrix(~condition) the intercept should be the control group in order to guarantee that the default contrasts have as denominator the controls (treated vs controls)

 

What I don’t understand clearly is if the “Intercept design” is only needed to use, let’s say, the controls as the real term of comparison, thing that I understand I can control just setting the contrast terms as i’m writing below.

 

I thought that, once made sure that the control level is the first level, using the “intercept” design or the “no-intercept” design and declaring the contrast that I want to perform, the same in both cases, should give the same output in terms of Differential Expressed Genes. I was clearly wrong.

I think that the difference occurs when I set 

dds <- DESeq(dds,betaPrior = )

 

in the case of design=~condition betaPrior= TRUE

while in the case of design= ~0+condition I MUST set betaPrior to FALSE.

 

I would like to understand better what is happening here.

 

Can someone clarify my doubts and suggest me which is the correct design, if one is better than the other? 

(or if I’m just barking up the wrong trees, then I’m sorry for all my misinterpretations)  

 

Thank you,

 

Fabiola

 

 

 

colData

              condition

CTRL_1   control

CTRL_2   control

CTRL_3   control

CTRL_4   control

CTRL_5   control

CTRL_7   control

CTRL_8   control

D_1   treatedD

D_2   treatedD

D_4   treatedD

D_5   treatedD

D_6   treatedD

K_1  treatedK

K_2  treatedK

K_3  treatedK

 

 

 

dds<- DESeqDataSetFromMatrix(countData = countData,

                             colData = colData,

                             design= ~colData$condition)

 

in that case, design= model.matrix(~colData$condition), with an Intercept term

 

> model.matrix(~colData$condition)

   (Intercept) colData$conditiontreatedD colData$conditiontreatedK

1            1                         0                         0

2            1                         0                         0

3            1                         0                         0

4            1                         0                         0

5            1                         0                         0

6            1                         0                         0

7            1                         0                         0

8            1                         1                         0

9            1                         1                         0

10           1                         1                         0

11           1                         1                         0

12           1                         1                         0

13           1                         0                         1

14           1                         0                         1

15           1                         0                         1

attr(,"assign")

[1] 0 1 1

attr(,"contrasts")

attr(,"contrasts")$`colData$condition`

[1] "contr.treatment"

 

dds <- DESeq(dds,betaPrior = T)

 

D_res<-results(dds, contrast = c("condition","treatedD","control"))

summary(D_res)

 

out of 13910 with nonzero total read count

adjusted p-value < 0.1

LFC > 0 (up)     : 16, 0.12% 

LFC < 0 (down)   : 30, 0.22% 

outliers [1]     : 7, 0.05% 

low counts [2]   : 0, 0% 

(mean count < 4.3)

[1] see 'cooksCutoff' argument of ?results

[2] see 'independentFiltering' argument of ?results

 

 

 

dds<- DESeqDataSetFromMatrix(countData = countData,

                             colData = colData,

                             design= ~0+colData$condition)

 

so the design is 

 

> model.matrix(~0+colData$condition)

   colData$conditioncontrol colData$conditiontreatedD colData$conditiontreatedK

1                         1                         0                         0

2                         1                         0                         0

3                         1                         0                         0

4                         1                         0                         0

5                         1                         0                         0

6                         1                         0                         0

7                         1                         0                         0

8                         0                         1                         0

9                         0                         1                         0

10                        0                         1                         0

11                        0                         1                         0

12                        0                         1                         0

13                        0                         0                         1

14                        0                         0                         1

15                        0                         0                         1

attr(,"assign")

[1] 1 1 1

attr(,"contrasts")

attr(,"contrasts")$`colData$condition`

[1] "contr.treatment"

 

 

dds <- DESeq(dds,betaPrior = F)

 

D_res<-results(dds, contrast = c("condition","treatedD","control"))

summary(D_res)

 

out of 13910 with nonzero total read count

adjusted p-value < 0.1

LFC > 0 (up)     : 17, 0.12% 

LFC < 0 (down)   : 38, 0.27% 

outliers [1]     : 4, 0.029% 

low counts [2]   : 0, 0% 

(mean count < 4.3)

[1] see 'cooksCutoff' argument of ?results

[2] see 'independentFiltering' argument of ?results

 

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

deseq2 differential gene expression • 5.6k views
ADD COMMENT
3
Entering edit mode
@mikelove
Last seen 23 hours ago
United States

hi Fabiola,

The text in the vignette about setting the base level / reference level of factors is just so that if you say results(dds), the software knows what to do. If you specify the 'contrast', then it makes no difference what the base level was.

The difference you are observing in the summarized results is between the use of a beta prior or not (the default or betaPrior=FALSE). While these two settings often produce nearly the same p-values, they are not exactly the same vector of p-values. So you can see the numbers in the summary change slightly.

You can't use ~0+condition and betaPrior=TRUE. Why? We don't want to shrink the group mean estimates to zero. This is what would happen with ~0+condition. Instead we want to shrink the estimated differences between groups to zero, while the intercept stays in the middle of the groups. So we accomplish this by having models with an unshrunken intercept, and shrunken terms for each group difference. You should just use ~condition if you want to do the standard DESeq2 pipeline with LFC shrinkage.

ADD COMMENT
0
Entering edit mode
fabiola • 0
@fabiola-6820
Last seen 26 days ago
European Union

Thank you Michael!

ADD COMMENT

Login before adding your answer.

Traffic: 503 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6