Question

scran package: quickCluster and calculateSumFactors parallelization

0

Entering edit mode

qwang178 • 0

@0c1d8e5c

Last seen 13 months ago

United States

Hi I am having difficulty achieving parallelization performance for quickCluster in scran. Since I am working on a very large dataset which cannot be processed without parallelization, I am looking for answers here.

Test code for BiocParallel works fine:

library('BiocParallel')

MulticoreParam()
# class: MulticoreParam
#  bpisup: FALSE; bpnworkers: 38; bptasks: 0; bpjobname: BPJOB
#  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
#  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
#  bpexportglobals: TRUE
#  bplogdir: NA
#  bpresultdir: NA
#  cluster type: FORK

birthday <- function(n) {
    m <- 10000
    x <- numeric(m)
    for(i in seq_len(m)) {
        b <- sample(seq_len(365), n, replace = TRUE)
        x[i] <- ifelse(length(unique(b)) == n, 0, 1)
    }
    mean(x)
}

system.time( lapply(seq_len(100), birthday) )
#   user  system elapsed 
# 50.635   0.119  50.757 
system.time( y <- bplapply(seq_len(100), birthday) )
#  user  system elapsed 
# 51.838  13.660   8.979

However, this doesn't work as desired for quickCluster in the scran library. e.g. for Hu cortex data:

library(scRNAseq)
sce <- HuCortexData()

library(scran)
system.time(clusters <- quickCluster(sce))
#   user  system elapsed 
# 279.176   4.578 283.766 
system.time(clusters <- quickCluster(sce, BPPARAM = MulticoreParam()))
#  user  system elapsed 
# 274.544   4.652 279.208

Nevertheless, this seems to be working for the computeSumFactors function:

system.time(sce <- computeSumFactors(sce, clusters=clusters))
#    user   system  elapsed 
# 1168.887   12.171 1181.147 
system.time(sce <- computeSumFactors(sce, clusters=clusters, BPPARAM = MulticoreParam()))
#    user   system  elapsed 
# 1331.120   49.043  136.132

I am wondering what might be the reason here and if there is any way to make it work for quickCluster. Thanks so much!

sessionInfo( )

``` R version 4.0.0 (2020-04-24) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS: /packages/7x/R/4.0.0-libpng1.6.37/lib64/R/lib/libRblas.so LAPACK: /packages/7x/R/4.0.0-libpng1.6.37/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] scran_1.18.7 scRNAseq_2.4.0
[3] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0 [5] Biobase_2.50.0 GenomicRanges_1.42.0
[7] GenomeInfoDb_1.26.7 IRanges_2.24.1
[9] S4Vectors_0.28.1 BiocGenerics_0.36.1
[11] MatrixGenerics_1.2.1 matrixStats_0.63.0
[13] BiocParallel_1.24.1

loaded via a namespace (and not attached): [1] ProtGenerics_1.22.0 bitops_1.0-7
[3] bit64_4.0.5 progress_1.2.2
[5] httr_1.4.6 tools_4.0.0
[7] utf8_1.2.3 R6_2.5.1
[9] irlba_2.3.5.1 DBI_1.1.3
[11] lazyeval_0.2.2 withr_2.5.0
[13] tidyselect_1.2.0 prettyunits_1.1.1
[15] bit_4.0.5 curl_5.0.0
[17] compiler_4.0.0 cli_3.6.1
[19] BiocNeighbors_1.8.2 xml2_1.3.4
[21] DelayedArray_0.16.3 rtracklayer_1.50.0
[23] askpass_1.1 rappdirs_0.3.3
[25] stringr_1.5.0 digest_0.6.31
[27] Rsamtools_2.6.0 XVector_0.30.0
[29] pkgconfig_2.0.3 htmltools_0.5.5
[31] sparseMatrixStats_1.2.1 limma_3.46.0
[33] dbplyr_2.3.2 fastmap_1.1.1
[35] ensembldb_2.14.1 rlang_1.1.1
[37] RSQLite_2.3.1 DelayedMatrixStats_1.12.3
[39] shiny_1.7.4 generics_0.1.3
[41] dplyr_1.1.2 RCurl_1.98-1.12
[43] magrittr_2.0.3 BiocSingular_1.6.0
[45] scuttle_1.0.4 GenomeInfoDbData_1.2.4
[47] Matrix_1.6-4 Rcpp_1.0.10
[49] fansi_1.0.4 lifecycle_1.0.3
[51] edgeR_3.32.1 stringi_1.7.12
[53] yaml_2.3.7 zlibbioc_1.36.0
[55] BiocFileCache_1.14.0 AnnotationHub_2.22.1
[57] grid_4.0.0 blob_1.2.4
[59] dqrng_0.3.0 promises_1.2.0.1
[61] ExperimentHub_1.16.1 crayon_1.5.2
[63] lattice_0.20-45 Biostrings_2.58.0
[65] beachmat_2.6.4 GenomicFeatures_1.42.3
[67] hms_1.1.3 locfit_1.5-9.4
[69] pillar_1.9.0 igraph_1.4.2
[71] biomaRt_2.46.3 XML_3.99-0.14
[73] glue_1.6.2 BiocVersion_3.12.0
[75] BiocManager_1.30.20 vctrs_0.6.2
[77] httpuv_1.6.11 openssl_2.0.6
[79] purrr_1.0.1 cachem_1.0.8
[81] rsvd_1.0.5 mime_0.12
[83] xtable_1.8-4 AnnotationFilter_1.14.0
[85] later_1.3.1 tibble_3.2.1
[87] GenomicAlignments_1.26.0 AnnotationDbi_1.52.0
[89] memoise_2.0.1 statmod_1.5.0
[91] bluster_1.0.0 ellipsis_0.3.2
[93] interactiveDisplayBase_1.28.0

BiocParallel scran • 1.0k views

ADD COMMENT • link updated 13 months ago by ATpoint ★ 4.7k • written 13 months ago by qwang178 • 0

score 0 · Answer 1 · 2024-01-11

0

Entering edit mode

ATpoint ★ 4.7k

@atpoint-13662

Last seen 1 day ago

Germany

See ?quickCluster which suggests that BiocParallel/BPPARAM is only used if block is set. Beyond that there is simply nothing that allows parallel processing. One thing you can do is to use a faster clustering algorithm. Default is walktrap which is slow. Use louvain. Also use set.seed to make it reproducible.

ADD COMMENT • link 13 months ago ATpoint ★ 4.7k

0

Entering edit mode

OK. Thanks for the reminder. I did a few tests and it seems to work by adding block:

system.time(clusters <- quickCluster(sce, BPPARAM = MulticoreParam()))
#   user  system elapsed 
# 158.757   2.460 161.645 

system.time(clusters <- quickCluster(sce, BPPARAM = MulticoreParam(), block=cut(seq_len(ncol(sce)), 5), block.BPPARAM = MulticoreParam()))
#   user  system elapsed 
# 147.895   9.830  90.032 

system.time(clusters <- quickCluster(sce, BPPARAM = MulticoreParam(), block=cut(seq_len(ncol(sce)), 10), block.BPPARAM = MulticoreParam()))
#  user  system elapsed 
# 93.807   6.548  53.470 

system.time(clusters <- quickCluster(sce, BPPARAM = MulticoreParam(), block=cut(seq_len(ncol(sce)), 50), block.BPPARAM = MulticoreParam()))
#  user  system elapsed 
# 57.606  12.856  38.809 

system.time(clusters <- quickCluster(sce, BPPARAM = MulticoreParam(), block=cut(seq_len(ncol(sce)), 100), block.BPPARAM = MulticoreParam()))
#   user  system elapsed 
# 71.812  10.604  46.667 

system.time(clusters <- quickCluster(sce, BPPARAM = MulticoreParam(), block=cut(seq_len(ncol(sce)), 50)))
#   user  system elapsed 
# 49.489   0.857  50.518 

system.time(clusters <- quickCluster(sce, block.BPPARAM = MulticoreParam(), block=cut(seq_len(ncol(sce)), 10)))
#   user  system elapsed 
# 74.230   5.001  41.205 

system.time(clusters <- quickCluster(sce, block.BPPARAM = MulticoreParam(), block=cut(seq_len(ncol(sce)), 50)))
#   user  system elapsed 
# 52.514   8.597  33.017 

system.time(clusters <- quickCluster(sce, block.BPPARAM = MulticoreParam(), block=cut(seq_len(ncol(sce)), 100)))
#   user  system elapsed 
# 64.279   8.374  40.408

I am wondering if there is any difference between tweaking BPPARAM and block.BPPARAM, and what is the optimal number to set for block. Also how can I set the clustering algorithm? Currently I am only seeing the two method option: "hclust" uses hierarchical clustering while "igraph" uses graph-based clustering.

ADD REPLY • link 13 months ago qwang178 • 0

score 0 · Answer 2 · 2024-01-14

TBH I've forgotten about these settings, but now that I look at them, it seems that BPPARAM just isn't passed along to many of the internal arguments (e.g., modelGeneVar, NNGraphParam, runSVD and so on. I suppose there isn't any reason to _not_ pass those along, though R-based parallelization tends to be rather disappointing due to the overhead of spinning up a separate process. ATpoint is also correct in that the final community detection is performed in serial and usually takes up the bulk of the time - for historical reasons this was set to Walktrap, though Louvain is faster.

The block parallelization only applies if you set block=, where each level of block is processed in parallel.

A hopefully upcoming version of scran (or maybe it might be called something else) will switch to the C++ libraries here to perform all the calculations, at which point we can parallelize more freely without the process overhead.

Anyway, if you want to have the BPPARAM passed along where relevant, submit an issue to the GitHub repo. Or if you're up for it, a PR would be helpful.