Dispersion estimation step is very slow with latest DESeq2 and glmGamPoi
I-Hsuan Lin (@i-hsuan-lin-9804)

Hi, I recently upgraded from R 4.1 to the latest R 4.3. While running my DEA workflow on scRNA-seq data, I noticed it was dramatically slower with the new setup. I wonder if anyone is aware of changes that may have contributed to this, and how I can make the dispersion estimation step run as fast as before?

Using a small dataset with 49 cells as a demo (22 vs. 27 cells in two tissues), here's how I run DESeq():

dds <- DESeqDataSetFromMatrix(countData = counts(sce),
                              colData = droplevels(colData(sce)[,c("Barcode","tissue")]),
                              rowData = rowData(sce), design = ~ tissue)
sizeFactors(dds) <- sizeFactors(sce)
dds

system.time({
    dds <- DESeq(dds, test = "LRT", useT = TRUE, minmu = 1e-6, minReplicatesForReplace = Inf,
                 fitType = "glmGamPoi", sfType = "poscounts", reduced = ~1, quiet = FALSE)
})
# dds
class: DESeqDataSet 
dim: 17881 49 
metadata(1): version
assays(1): counts
rownames(17881): Xkr4 Gm37381 ... CAAA01147332.1 AC149090.1
rowData names(5): ID Symbol Type SEQNAME is_mito
colnames(49): LR03_AGGCCGTTCACCTCGT-1 LR03_AGGGATGAGTGCGATG-1 ... LR04_TTGACTTAGGTGTTAA-1 LR04_TTGACTTGTCTGCGGT-1
colData names(3): Barcode tissue sizeFactor

When using R version 4.1.3, DESeq2_1.34.0 and glmGamPoi_1.6.0, the run time is:

   user  system elapsed 
 10.996   6.414   9.632

When using R version 4.3.2, DESeq2_1.42.0 and glmGamPoi_1.14.3, the run time is:

    user   system  elapsed 
1661.721   25.540 1679.590
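
To see where the extra time goes, something like the following can be run (a minimal profiling sketch, assuming the dds object built above; Rprof() and summaryRprof() are base R utilities):

# Profile the slow DESeq() call and list the calls with the largest total time.
Rprof("deseq_profile.out")
dds <- DESeq(dds, test = "LRT", useT = TRUE, minmu = 1e-6, minReplicatesForReplace = Inf,
             fitType = "glmGamPoi", sfType = "poscounts", reduced = ~1, quiet = TRUE)
Rprof(NULL)
head(summaryRprof("deseq_profile.out")$by.total)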

Edit:

Digging further into the code itself, there seems to be a change in how initial_lp and last_lp are calculated, which greatly increases the amount of time vapply() takes to perform the calculation (see the small timing sketch after the two snippets below).

In the current version, the whole fitMu matrix is passed as mu:

      initial_lp <- vapply(which(fitidx), function(idx){
        sum(dnbinom(Counts[idx, ], mu = fitMu, size = 1 / alpha_hat[idx], log = TRUE))
      }, FUN.VALUE = 0.0)

      last_lp <- vapply(which(fitidx), function(idx){
        sum(dnbinom(Counts[idx, ], mu = fitMu, size = 1 / alpha_hat_new[idx], log = TRUE))
      }, FUN.VALUE = 0.0)

Whereas in the previous version, the per-gene vector fitMu[idx, ] is used as mu, which I think makes more sense, doesn't it? I believe this is also how fitMu is used in the fitting process in DESeq2.cpp:

      initial_lp <- vapply(which(fitidx), function(idx){
        sum(dnbinom(Counts[idx, ], mu = fitMu[idx, ], size = 1 / alpha_hat[idx], log = TRUE))
      }, FUN.VALUE = 0.0)

      last_lp <- vapply(which(fitidx), function(idx){
        sum(dnbinom(Counts[idx, ], mu = fitMu[idx, ], size = 1 / alpha_hat_new[idx], log = TRUE))
      }, FUN.VALUE = 0.0)
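
A toy timing sketch (made-up Counts, fitMu and alpha_hat, not the real DESeq2 objects) illustrating the cost difference: when the whole matrix is passed, dnbinom() recycles the counts vector against every element of fitMu, so each gene evaluates ngenes x ncells densities instead of just ncells:

# Hypothetical stand-ins for the objects inside DESeq2's dispersion code.
set.seed(1)
ngenes <- 2000; ncells <- 49
Counts    <- matrix(rnbinom(ngenes * ncells, mu = 5, size = 1), nrow = ngenes)
fitMu     <- matrix(5, nrow = ngenes, ncol = ncells)
alpha_hat <- rep(1, ngenes)

# Current behaviour: mu is the whole matrix -> ngenes * ncells densities per gene.
system.time(vapply(seq_len(ngenes), function(idx) {
  sum(dnbinom(Counts[idx, ], mu = fitMu, size = 1 / alpha_hat[idx], log = TRUE))
}, FUN.VALUE = 0.0))

# Previous behaviour: mu is the per-gene row -> only ncells densities per gene.
system.time(vapply(seq_len(ngenes), function(idx) {
  sum(dnbinom(Counts[idx, ], mu = fitMu[idx, ], size = 1 / alpha_hat[idx], log = TRUE))
}, FUN.VALUE = 0.0))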

Also, why is the whole fitMu matrix used as the mean input when running glmGamPoi::overdispersion_mle(), rather than the subset fitMu[fitidx, ] as in the previous version?

      # New version
      dispersion_fits <- glmGamPoi::overdispersion_mle(Counts[fitidx, ], mean = fitMu,
                                                       model_matrix = modelMatrix, verbose = ! quiet)

      # Old version
      dispersion_fits <- glmGamPoi::overdispersion_mle(Counts[fitidx, ], mean = fitMu[fitidx, ],
                                                       model_matrix = modelMatrix, verbose = ! quiet)
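
For reference, a toy sketch (made-up counts, a crude stand-in for fitMu and a toy model matrix, not the real DESeq2 objects) of calling overdispersion_mle() with counts and mean matrices of matching dimensions, which is why subsetting only Counts but not fitMu looks inconsistent to me:

library(glmGamPoi)
set.seed(1)
Counts      <- matrix(rnbinom(10 * 6, mu = 4, size = 1), nrow = 10)  # 10 genes x 6 cells
fitMu       <- matrix(rowMeans(Counts) + 0.1, nrow = 10, ncol = 6)   # crude stand-in for fitted means
modelMatrix <- cbind(Intercept = 1, tissue = rep(0:1, each = 3))     # toy design
fitidx      <- rowMeans(Counts) > 1                                  # genes still being refit

dispersion_fits <- glmGamPoi::overdispersion_mle(Counts[fitidx, , drop = FALSE],
                                                 mean = fitMu[fitidx, , drop = FALSE],
                                                 model_matrix = modelMatrix, verbose = FALSE)
str(dispersion_fits)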
Tags: DESeq2, glmGamPoi
Answer from I-Hsuan Lin (@i-hsuan-lin-9804):

Thanks to Mike, there's a fix on DESeq2's GitHub repo: https://github.com/thelovelab/DESeq2/tree/glmgampoi-iteration-fix

Relevant discussion about this for anyone interested: https://github.com/thelovelab/DESeq2/pull/64
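
If anyone wants to try the fix before it lands in a Bioconductor release, something like this should work (a sketch assuming the remotes package is installed; adjust to your own setup):

# Install DESeq2 from the fix branch on GitHub.
remotes::install_github("thelovelab/DESeq2", ref = "glmgampoi-iteration-fix")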

Comment from Michael Love (@mikelove):

Thanks for the post here and on GitHub.

I'll take a look this week.

Reply: Thanks, Michael Love.
