Hi,
I was processing a scRNA-seq data which has about 1.3M cells with around 53000 genes. When I tried the function perFeatureQCMetrics() to get some gene metrics for filtering genes, it ran over 24 hours and still not finished even with parallelization. I preferred to use all cells with bioconductor packages and not to apply Sketch-based analysis in Seurat. Is there a way to speed it up? Thanks a lot!
Best, JG
# Here is the code I used .
bp.params <- MulticoreParam(workers = 26)
per.feat <- perFeatureQCMetrics(sce, BPPARAM = bp.params)
sessionInfo( )
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Azimuth_0.5.0 shinyBS_0.61.1
[3] patchwork_1.3.0 lubridate_1.9.3
[5] forcats_1.0.0 stringr_1.5.1
[7] dplyr_1.1.4 purrr_1.0.2
[9] readr_2.1.5 tidyr_1.3.1
[11] tibble_3.2.1 tidyverse_2.0.0
[13] AnnotationHub_3.12.0 BiocFileCache_2.10.2
[15] dbplyr_2.5.0 PCAtools_2.16.0
[17] ggrepel_0.9.6 scran_1.32.0
[19] BiocParallel_1.38.0 DropletUtils_1.24.0
[21] scater_1.32.1 ggplot2_3.5.1
[23] scuttle_1.14.0 SingleCellExperiment_1.26.0
[25] SummarizedExperiment_1.34.0 Biobase_2.64.0
[27] GenomicRanges_1.56.2 GenomeInfoDb_1.40.1
[29] IRanges_2.38.1 S4Vectors_0.42.1
[31] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
[33] matrixStats_1.4.1
loaded via a namespace (and not attached):
[1] fs_1.6.4 spatstat.sparse_3.1-0
[3] ProtGenerics_1.34.0 bitops_1.0-9
[5] DirichletMultinomial_1.44.0 TFBSTools_1.42.0
[7] httr_1.4.7 RColorBrewer_1.1-3
[9] sctransform_0.4.1 tools_4.4.1
[11] utf8_1.2.4 R6_2.5.1
[13] DT_0.33 HDF5Array_1.32.1
[15] uwot_0.2.2 lazyeval_0.2.2
[17] rhdf5filters_1.16.0 withr_3.0.1
[19] sp_2.1-4 prettyunits_1.2.0
[21] gridExtra_2.3 progressr_0.14.0
[23] cli_3.6.3 spatstat.explore_3.3-2
[25] fastDummies_1.7.4 EnsDb.Hsapiens.v86_2.99.0
[27] shinyjs_2.1.0 Seurat_5.1.0
[29] spatstat.data_3.1-2 ggridges_0.5.6
[31] pbapply_1.7-2 Rsamtools_2.18.0
[33] R.utils_2.12.3 parallelly_1.38.0
[35] BSgenome_1.70.2 limma_3.60.6
[37] RSQLite_2.3.7 generics_0.1.3
[39] BiocIO_1.12.0 gtools_3.9.5
[41] spatstat.random_3.3-2 ica_1.0-3
[43] googlesheets4_1.1.1 GO.db_3.18.0
[45] Matrix_1.7-1 ggbeeswarm_0.7.2
[47] fansi_1.0.6 abind_1.4-8
[49] R.methodsS3_1.8.2 lifecycle_1.0.4
[51] yaml_2.3.10 edgeR_4.2.2
[53] rhdf5_2.48.0 SparseArray_1.4.8
[55] Rtsne_0.17 grid_4.4.1
[57] blob_1.2.4 promises_1.3.0
[59] dqrng_0.4.1 shinydashboard_0.7.2
[61] pwalign_1.0.0 crayon_1.5.3
[63] miniUI_0.1.1.1 lattice_0.22-6
[65] beachmat_2.20.0 cowplot_1.1.3
[67] annotate_1.82.0 GenomicFeatures_1.54.4
[69] KEGGREST_1.44.1 pillar_1.9.0
[71] metapod_1.12.0 rjson_0.2.23
[73] future.apply_1.11.2 codetools_0.2-20
[75] fastmatch_1.1-4 leiden_0.4.3.1
[77] glue_1.8.0 spatstat.univar_3.0-0
[79] data.table_1.16.2 remotes_2.5.0
[81] vctrs_0.6.5 png_0.1-8
[83] spam_2.11-0 cellranger_1.1.0
[85] poweRlaw_0.80.0 gtable_0.3.5
[87] cachem_1.1.0 Signac_1.14.0
[89] S4Arrays_1.4.1 mime_0.12
[91] pracma_2.4.4 survival_3.7-0
[93] gargle_1.5.2 RcppRoll_0.3.1
[95] statmod_1.5.0 bluster_1.14.0
[97] fitdistrplus_1.2-1 ROCR_1.0-11
[99] nlme_3.1-165 bit64_4.5.2
[101] progress_1.2.3 filelock_1.0.3
[103] RcppAnnoy_0.0.22 irlba_2.3.5.1
[105] vipor_0.4.7 KernSmooth_2.23-24
[107] seqLogo_1.68.0 SeuratDisk_0.0.0.9021
[109] colorspace_2.1-1 DBI_1.2.3
[111] tidyselect_1.2.1 processx_3.8.4
[113] bit_4.5.0 compiler_4.4.1
[115] curl_5.2.3 BiocNeighbors_1.22.0
[117] hdf5r_1.3.11 xml2_1.3.6
[119] desc_1.4.3 DelayedArray_0.30.1
[121] plotly_4.10.4 rtracklayer_1.62.0
[123] caTools_1.18.3 scales_1.3.0
[125] lmtest_0.9-40 callr_3.7.6
[127] rappdirs_0.3.3 goftest_1.2-3
[129] digest_0.6.37 presto_1.0.0
[131] spatstat.utils_3.1-0 XVector_0.44.0
[133] htmltools_0.5.8.1 pkgconfig_2.0.3
[135] sparseMatrixStats_1.16.0 fastmap_1.2.0
[137] ensembldb_2.26.0 rlang_1.1.4
[139] htmlwidgets_1.6.4 UCSC.utils_1.0.0
[141] shiny_1.9.1 DelayedMatrixStats_1.26.0
[143] farver_2.1.2 zoo_1.8-12
[145] jsonlite_1.8.9 R.oo_1.26.0
[147] BiocSingular_1.20.0 RCurl_1.98-1.16
[149] magrittr_2.0.3 GenomeInfoDbData_1.2.12
[151] dotCall64_1.2 Rhdf5lib_1.26.0
[153] munsell_0.5.1 Rcpp_1.0.13
[155] viridis_0.6.5 reticulate_1.39.0
[157] stringi_1.8.4 zlibbioc_1.50.0
[159] MASS_7.3-61 plyr_1.8.9
[161] pkgbuild_1.4.4 parallel_4.4.1
[163] listenv_0.9.1 CNEr_1.38.0
[165] deldir_2.0-4 Biostrings_2.72.1
[167] splines_4.4.1 tensor_1.5
[169] hms_1.1.3 locfit_1.5-9.10
[171] BSgenome.Hsapiens.UCSC.hg38_1.4.5 ps_1.7.6
[173] igraph_2.1.1 spatstat.geom_3.3-3
[175] RcppHNSW_0.6.0 reshape2_1.4.4
[177] biomaRt_2.58.2 ScaledMatrix_1.12.0
[179] TFMPvalue_0.0.9 BiocVersion_3.19.1
[181] XML_3.99-0.17 SeuratObject_5.0.2
[183] BiocManager_1.30.23 JASPAR2020_0.99.10
[185] tzdb_0.4.0 httpuv_1.6.15
[187] polyclip_1.10-7 RANN_2.6.2
[189] SeuratData_0.2.2.9001 future_1.34.0
[191] scattermore_1.2 rsvd_1.0.5
[193] xtable_1.8-4 restfulr_0.0.15
[195] AnnotationFilter_1.26.0 RSpectra_0.16-2
[197] later_1.3.2 googledrive_2.1.1
[199] viridisLite_0.4.2 memoise_2.0.1
[201] beeswarm_0.4.0 AnnotationDbi_1.66.0
[203] GenomicAlignments_1.38.2 cluster_2.1.6
[205] timechange_0.3.0 globals_0.16.3