Question

Calculated contrasts in DESeq2: difference between manual coefficients and DESeq2 authomatic contrast

0

Entering edit mode

idetoma • 0

@646f29c0

Last seen 24 months ago

Spain

Enter the body of text here

Code should be placed in three backticks as shown below

I have the following model on DESeq2 where I am blocking for replicate.

dds <- DESeqDataSetFromMatrix(countData = CPEB4_featureCounts_3utr_matrix,
                              colData = CPEB4_sample_list,
                              design = ~   replicate  + sample_name)
dds <- DESeq(dds)

These are the metadata:

          sample_name replicate
0195_2022       INPUT         4
0196_2022         IgG         4
0197_2022       CPEB4         4
0198_2022       INPUT         5
0199_2022         IgG         5
0200_2022       CPEB4         5
2125_2021       INPUT         1
2126_2021         IgG         1
2127_2021       CPEB4         1
2235_2021       INPUT         2
2237_2021       CPEB4         2
2238_2021       INPUT         3
2239_2021         IgG         3
2240_2021       CPEB4         3

I want to extract the contrast "CPEB4 - IgG"

I can do it by using the results function like this:

CPEB4vsIgG <- results(dds, contrast=c("sample_name","CPEB4","IgG"))

I get the following DEGs:

out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 598, 3.5%
LFC < 0 (down)     : 30, 0.17%
outliers [1]       : 0, 0%
low counts [2]     : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

However, I could also manually calculate the coefficient (I usually do this when I have more complex contrasts), like this:

mod_mat <- model.matrix(design(dds), colData(dds))
CPEB4 <- colMeans(mod_mat[dds$sample_name == "CPEB4", ])
IgG <- colMeans(mod_mat[dds$sample_name == "IgG", ])
CPEB4vsIgG_2 <- results(dds,  contrast = (CPEB4 - IgG))

However, with this code I get a slightly different list of DEGs:

summary(CPEB4vsIgG_2)

out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 672, 3.9%
LFC < 0 (down)     : 81, 0.47%
outliers [1]       : 0, 0%
low counts [2]     : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

If I check the coefficient for the two groups I am subtracting it looks like everything is fine:

print(CPEB4)
(Intercept)       replicate2       replicate3       replicate4       replicate5   sample_nameIgG 
             1.0              0.2              0.2              0.2              0.2              0.0 
sample_nameINPUT 
             0.0

print(IgG)
(Intercept)       replicate2       replicate3       replicate4       replicate5   sample_nameIgG 
            1.00             0.00             0.25             0.25             0.25             1.00 
sample_nameINPUT 
            0.00

Why is there this difference?

If I create a model without taking into account the replicate I have the same results with the two approaches.

DESeq2 DifferentialExpression • 1.2k views

ADD COMMENT • link updated 2.3 years ago by Michael Love 43k • written 2.3 years ago by idetoma • 0

score 2 · Accepted Answer · 2023-01-19

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 8 days ago

United States

Here is a small example that may help see what is going on in a linear model like this:

> group <- factor(rep(letters[1:3], c(2,2,1)))
> group
[1] a a b b c
Levels: a b c
> treat <- factor(c(0,1,0,1,0))
> y <- rnorm(5)
> coef(lm(y ~ group + treat))
(Intercept)      groupb      groupc      treat1
 -0.5531631   0.3296372   0.4496530   0.9134308
> y[5] <- 100
> coef(lm(y ~ group + treat))
(Intercept)      groupb      groupc      treat1
 -0.5531631   0.3296372 100.5531631   0.9134308

Note that the treat coefficient doesn't use sample 5 at all.

ADD COMMENT • link 2.3 years ago Michael Love 43k

0

Entering edit mode

Thank you I got it!

I actually changed the coefficient for the replicates to be the same between CPBE4 and IgG and I got the exact same result as in the DESeq2 function.

> CPEB42=CPEB4
> CPEB42[2:5]=IgG[2:5]
> CPEB42
     (Intercept)       replicate2       replicate3       replicate4       replicate5   sample_nameIgG sample_nameINPUT 
            1.00             0.00             0.25             0.25             0.25             0.00             0.00 
> summary(results(dds, contrast=CPEB42-IgG))

out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 598, 3.5%
LFC < 0 (down)     : 30, 0.17%
outliers [1]       : 0, 0%
low counts [2]     : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Therefore we must be extra careful when using the second approach in case of unbalanced experimental design!

ADD REPLY • link 2.3 years ago idetoma • 0

0

Entering edit mode

One small clarification. Considering this kind of unbalanced design, which approach is the correct one? Calling the contrast with contrast=c("sample_name","CPEB4","IgG") or the second apporach in my example?

ADD REPLY • link 2.3 years ago idetoma • 0

0

Entering edit mode

I use the coefficient in the model that represents the CPEB4 vs IgG effect, e.g. using the named contrast.

ADD REPLY • link 2.3 years ago Michael Love 43k