Question

Differences in Limma and Dreamlet random effects from confounded dataset

0

Entering edit mode

jcollier • 0

@c7fb2467

Last seen 8 months ago

United States

I am trying to get a better understanding of how limma and dreamlet handle random effects differently in a confounded pseudobulk scRNA-seq dataset where all of condition 1 are in one batch, and all of condition 2 in a separate batch, where batch is a random variable (rather than a fixed variable).

Where do these analyses diverge, leading to different results?

Load example

library(muscat)
library(SingleCellExperiment)
library(dreamlet)

data(example_sce)
subset_sce <- example_sce[, colData(example_sce)$cluster_id == "B cells"]

Add a "batch" variable. This results in a confounded dataset where all ctrl samples are in batch1 and all stim samples are in batch2.

colData(pb)$batch_id <- c("batch1","batch1","batch2","batch2")

> colData(pb)
DataFrame with 4 rows and 2 columns
        group_id    batch_id
        <factor> <character>
ctrl101     ctrl      batch1
ctrl107     ctrl      batch1
stim101     stim      batch2
stim107     stim      batch2

Dreamlet

res.proc <- processAssays(pb, ~group_id + (1|batch_id))
res.dl <- dreamlet(res.proc, ~group_id + (1|batch_id))

> topTable(res.dl$`B cells`, coef = "group_idstim", number = 5)
            logFC  AveExpr         t      P.Value    adj.P.Val          B     z.std
PLSCR1   4.076507 8.629915 12.328888 5.754163e-21 2.301665e-19 35.5107902  9.394406
SAMD9    2.860439 7.770536  9.914056 4.707924e-16 9.415848e-15 24.9772906  8.118806
BAZ1A    1.258757 7.999992  4.556386 1.682875e-05 2.243833e-04  2.0288219  4.303281
PPIB    -1.082835 7.927141 -3.789278 2.739946e-04 2.739946e-03 -0.6243383 -3.638722
HNRNPDL -0.868700 8.676615 -3.220956 1.783518e-03 1.426815e-02 -2.3691292 -3.124097

Note that if I don't set batch_id as a random variable in the formula, then I get comparable results to limma (ISG15, ISG20 are top DE genes).

res.proc <- processAssays(pb, ~group_id)
res.dl <- dreamlet(res.proc, ~group_id)

> topTable(res.dl$`B cells`, coef = "group_idstim", number = 5)
          logFC   AveExpr        t      P.Value    adj.P.Val        B
ISG15  6.176658 10.230636 19.10827 1.263485e-14 1.070171e-11 23.21015
ISG20  3.600829 11.579442 16.01292 3.922087e-13 1.661004e-10 20.26556
LY6E   4.153726  9.124182 11.23303 2.892466e-10 8.166396e-08 13.71934
PLSCR1 4.076507  8.629915 10.86829 5.210069e-10 1.103232e-07 13.13822
IRF7   3.618931  9.249620 10.39864 1.135347e-09 1.713038e-07 12.38799

Limma

Unlike Dreamlet, these results are the same with or without setting block and correlation (as it is zero anyway and throws a warning).

design <- model.matrix(~ group_id, data=colData(pb))
v <- voom(pb@assays@data$`B cells`, design, plot=TRUE)
dupcor <- limma::duplicateCorrelation(object=v, design=design, block=colData(pb)$batch_id)
fit <- lmFit(v, design, block=colData(pb)$batch_id, correlation=dupcor$consensus.correlation)
fit <- eBayes(fit)

> topTable(fit, coef = "group_idstim", number = 5)
          logFC   AveExpr        t      P.Value    adj.P.Val        B
ISG20 3.5043779 11.579442 17.05821 3.832382e-13 4.855628e-10 20.33819
ISG15 6.1094164 10.230636 16.24125 9.359172e-13 5.929035e-10 18.93230
B2M   0.6715954 15.404921 11.00242 8.728791e-10 3.686460e-07 11.55374
IFIT3 9.0428629  7.531172 10.65543 1.494281e-09 4.733134e-07 10.63147
LY6E  4.2151969  9.124182 10.12225 3.500223e-09 7.675392e-07 11.27878

limma dreamlet • 977 views

ADD COMMENT • link 9 months ago jcollier • 0

1

Entering edit mode

If the group and batch are perfectly confounded, then you're pretty much stuck.

But let's consider some other issues. processAssays() and voom() estimate the mean-variance trend in the data, but depend on the formula provided. Changing the formula will change the estimated precision weights and the results of the differential expression analysis.

The major difference between how the dream / dreamlet and voom() handle random effects is that dream / dreamlet estimate the random effects separately for each gene, while voom() estimates a single variance term shared across all genes. The voom() approach works for small samples, but dream / dreamlet can perform better statistically with sufficient sample size. See https://doi.org/10.1093/bioinformatics/btaa687]

ADD REPLY • link 9 months ago gabriel.hoffman ▴ 170

1

Entering edit mode

If the group and batch are perfectly confounded, then you're pretty much stuck.

I do wonder myself what dreamlet() is doing then, since it seems to be doing something in a situation where I would have thought that estimating random effects was not possible.

voom() estimates a single variance term shared across all genes

It's duplicateCorrelation() rather than voom() that estimates random effects, and it estimates a shared correlation rather than a shared variance. Neither of the variance components are assumed equal across genes.

ADD REPLY • link 9 months ago Gordon Smyth 52k

1

Entering edit mode

1) the dreamlet model should also fail in this case because the model is degenerate. Indeed, it fails following a convergence check with the latest version:

> packageVersion("variancePartition")
# 1.35.5
> packageVersion("dreamlet")
# 1.2.1

Can you show your package versions?

2) of course Gordon is right, it is shared _correlation_.

ADD REPLY • link 9 months ago gabriel.hoffman ▴ 170

0

Entering edit mode

Thank you both for the clarification - I really appreciate it.

Here were the package versions that I was using:

> packageVersion("variancePartition")
# 1.32.4
> packageVersion("dreamlet")
# 1.0.3

ADD REPLY • link 9 months ago jcollier • 0