Calculated contrasts in DESeq2: difference between manual coefficients and DESeq2 authomatic contrast
1
0
Entering edit mode
idetoma • 0
@646f29c0
Last seen 19 months ago
Spain

Enter the body of text here

Code should be placed in three backticks as shown below

I have the following model on DESeq2 where I am blocking for replicate.

dds <- DESeqDataSetFromMatrix(countData = CPEB4_featureCounts_3utr_matrix,
                              colData = CPEB4_sample_list,
                              design = ~   replicate  + sample_name)
dds <- DESeq(dds)

These are the metadata:

          sample_name replicate
0195_2022       INPUT         4
0196_2022         IgG         4
0197_2022       CPEB4         4
0198_2022       INPUT         5
0199_2022         IgG         5
0200_2022       CPEB4         5
2125_2021       INPUT         1
2126_2021         IgG         1
2127_2021       CPEB4         1
2235_2021       INPUT         2
2237_2021       CPEB4         2
2238_2021       INPUT         3
2239_2021         IgG         3
2240_2021       CPEB4         3

I want to extract the contrast "CPEB4 - IgG"

I can do it by using the results function like this:

CPEB4vsIgG <- results(dds, contrast=c("sample_name","CPEB4","IgG"))

I get the following DEGs:

out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 598, 3.5%
LFC < 0 (down)     : 30, 0.17%
outliers [1]       : 0, 0%
low counts [2]     : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

However, I could also manually calculate the coefficient (I usually do this when I have more complex contrasts), like this:

mod_mat <- model.matrix(design(dds), colData(dds))
CPEB4 <- colMeans(mod_mat[dds$sample_name == "CPEB4", ])
IgG <- colMeans(mod_mat[dds$sample_name == "IgG", ])
CPEB4vsIgG_2 <- results(dds,  contrast = (CPEB4 - IgG))

However, with this code I get a slightly different list of DEGs:

summary(CPEB4vsIgG_2)

out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 672, 3.9%
LFC < 0 (down)     : 81, 0.47%
outliers [1]       : 0, 0%
low counts [2]     : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

If I check the coefficient for the two groups I am subtracting it looks like everything is fine:

print(CPEB4)
(Intercept)       replicate2       replicate3       replicate4       replicate5   sample_nameIgG 
             1.0              0.2              0.2              0.2              0.2              0.0 
sample_nameINPUT 
             0.0
print(IgG)
(Intercept)       replicate2       replicate3       replicate4       replicate5   sample_nameIgG 
            1.00             0.00             0.25             0.25             0.25             1.00 
sample_nameINPUT 
            0.00

Why is there this difference?

If I create a model without taking into account the replicate I have the same results with the two approaches.

DESeq2 DifferentialExpression • 1.1k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 5 hours ago
United States

Here is a small example that may help see what is going on in a linear model like this:

> group <- factor(rep(letters[1:3], c(2,2,1)))
> group
[1] a a b b c
Levels: a b c
> treat <- factor(c(0,1,0,1,0))
> y <- rnorm(5)
> coef(lm(y ~ group + treat))
(Intercept)      groupb      groupc      treat1
 -0.5531631   0.3296372   0.4496530   0.9134308
> y[5] <- 100
> coef(lm(y ~ group + treat))
(Intercept)      groupb      groupc      treat1
 -0.5531631   0.3296372 100.5531631   0.9134308

Note that the treat coefficient doesn't use sample 5 at all.

ADD COMMENT
0
Entering edit mode

Thank you I got it!

I actually changed the coefficient for the replicates to be the same between CPBE4 and IgG and I got the exact same result as in the DESeq2 function.

> CPEB42=CPEB4
> CPEB42[2:5]=IgG[2:5]
> CPEB42
     (Intercept)       replicate2       replicate3       replicate4       replicate5   sample_nameIgG sample_nameINPUT 
            1.00             0.00             0.25             0.25             0.25             0.00             0.00 
> summary(results(dds, contrast=CPEB42-IgG))

out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 598, 3.5%
LFC < 0 (down)     : 30, 0.17%
outliers [1]       : 0, 0%
low counts [2]     : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Therefore we must be extra careful when using the second approach in case of unbalanced experimental design!

ADD REPLY
0
Entering edit mode

One small clarification. Considering this kind of unbalanced design, which approach is the correct one? Calling the contrast with contrast=c("sample_name","CPEB4","IgG") or the second apporach in my example?

ADD REPLY
0
Entering edit mode

I use the coefficient in the model that represents the CPEB4 vs IgG effect, e.g. using the named contrast.

ADD REPLY

Login before adding your answer.

Traffic: 816 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6