Hello,
I'm working on a large SingleCellExperiment
object (25982 genes x 143019 cells) and would like to perform a differential expression analysis across conditions using pseudobulking.
However, when I call scuttle::aggregateAcrossCells()
, R works through it for a long time and then crashes.
I tried sampling the columns to 1/10th of the size (14302 cells) and it works (after some time), so I think the only problem is the size of the SingleCellExperiment object.
What can I do to get this to work? Are there any workarounds I can use? The function doesn't seem to support parallelisation with BiocParallel
as far as I can see.
Thank you! Enrico
library(tidyverse)
library(scater)
library(scran)
sce <- read_rds("../dat/sce.rds")
sce
class: SingleCellExperiment
dim: 25982 143019
metadata(4): cell_or_tissue_type_colors condition_colors replicate_group_colors replicategroup_colors
assays(2): counts logcounts
rownames(25982): Xkr4 Gm1992 ... CAAA01147332.1 AC149090.1
rowData names(5): ensembl_id gene_symbol n_cells is_mito is_ribo
colnames: NULL
colData names(88): sample organism ... label celltype
reducedDimNames(2): PCA UMAP
altExpNames(0):
# this crashes
pb <- aggregateAcrossCells(sce, ids = colData(sce)[, c("mouse", "label")])
# this works
sce <- sce[, sample(ncol(sce), ncol(sce)/10)]
pb <- aggregateAcrossCells(sce, ids = colData(sce)[, c("mouse", "label")])
# session info
sessionInfo( )
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/prog/OpenBLAS/0.2.20-GCC-6.4.0-2.28/lib/libopenblas_haswellp-r0.2.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] scran_1.18.3 scater_1.18.3 SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
[5] Biobase_2.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.2 IRanges_2.24.1
[9] S4Vectors_0.28.1 BiocGenerics_0.36.0 MatrixGenerics_1.2.0 matrixStats_0.57.0
[13] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4
[17] readr_1.4.0 tidyr_1.1.2 tibble_3.0.4 ggplot2_3.3.3
[21] tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] ggbeeswarm_0.6.0 colorspace_2.0-0 ggsignif_0.6.0 ellipsis_0.3.1
[5] rio_0.5.16 scuttle_1.0.4 bluster_1.0.0 XVector_0.30.0
[9] BiocNeighbors_1.8.2 fs_1.5.0 rstudioapi_0.13 ggpubr_0.4.0
[13] ggrepel_0.9.0 fansi_0.4.1 lubridate_1.7.9.2 xml2_1.3.2
[17] sparseMatrixStats_1.2.0 jsonlite_1.7.2 broom_0.7.3 dbplyr_2.0.0
[21] compiler_4.0.3 httr_1.4.2 dqrng_0.2.1 backports_1.2.1
[25] assertthat_0.2.1 Matrix_1.3-0 limma_3.46.0 cli_2.2.0
[29] BiocSingular_1.6.0 tools_4.0.3 rsvd_1.0.3 igraph_1.2.6
[33] gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.4 Rcpp_1.0.5
[37] carData_3.0-4 cellranger_1.1.0 vctrs_0.3.6 writexl_1.3.1
[41] DelayedMatrixStats_1.12.1 openxlsx_4.2.3 beachmat_2.6.4 rvest_0.3.6
[45] lifecycle_0.2.0 irlba_2.3.3 statmod_1.4.35 rstatix_0.6.0
[49] edgeR_3.32.0 zlibbioc_1.36.0 scales_1.1.1 hms_0.5.3
[53] curl_4.3 gridExtra_2.3 stringi_1.5.3 zip_2.1.1
[57] BiocParallel_1.24.1 rlang_0.4.10 pkgconfig_2.0.3 bitops_1.0-6
[61] lattice_0.20-41 tidyselect_1.1.0 magrittr_2.0.1 R6_2.5.0
[65] generics_0.1.0 DelayedArray_0.16.0 DBI_1.1.0 pillar_1.4.7
[69] haven_2.3.1 foreign_0.8-81 withr_2.3.0 abind_1.4-5
[73] RCurl_1.98-1.2 modelr_0.1.8 crayon_1.3.4 car_3.0-10
[77] viridis_0.5.1 locfit_1.5-9.4 grid_4.0.3 readxl_1.3.1
[81] data.table_1.13.6 reprex_0.3.0 munsell_0.5.0 beeswarm_0.2.3
[85] viridisLite_0.3.0 vipor_0.4.5
Thanks Aaron. For whatever reason, using BPPARAM did the trick! I suspect this is related to the problem described by Steve, with parallelisation splitting the object into smaller chunks.
There should be no loss of sparsity at any stage inside
aggregateAcrossCells
; if this is happening, it is a bug. If you can make a minimum example with public data (e.g.,ZilionisLungData
), it would be helpful.