scater's runPCA/calculatePCA fail when run in parallel with MulticoreParam().
The error also leaves any worker processes that were already started running; they are not terminated when the call fails.
Is there a workaround or fix for this?
Thanks, Dave
# BEGIN REPREX #
> suppressPackageStartupMessages({
+ library(TENxPBMCData)
+ library(scater)
+ library(scuttle)
+ library(BiocParallel)
+ })
> tenx_pbmc3k <- TENxPBMCData(dataset="pbmc3k")
snapshotDate(): 2020-10-27
see ?TENxPBMCData and browseVignettes('TENxPBMCData') for documentation
loading from cache
> logcounts(tenx_pbmc3k) <- scuttle::normalizeCounts(x = tenx_pbmc3k, log = T)
> assayNames(tenx_pbmc3k)
[1] "counts" "logcounts"
> assay(tenx_pbmc3k, "logcounts")
<32738 x 2700> matrix of class DelayedMatrix and type "double":
[,1] [,2] [,3] ... [,2699] [,2700]
ENSG00000243485 0 0 0 . 0 0
ENSG00000237613 0 0 0 . 0 0
ENSG00000186092 0 0 0 . 0 0
ENSG00000238009 0 0 0 . 0 0
ENSG00000239945 0 0 0 . 0 0
... . . . . . .
ENSG00000215635 0 0 0 . 0 0
ENSG00000268590 0 0 0 . 0 0
ENSG00000251180 0 0 0 . 0 0
ENSG00000215616 0 0 0 . 0 0
ENSG00000215611 0 0 0 . 0 0
> tenx_pbmc3k <- scater::runPCA(tenx_pbmc3k)
> reducedDims(tenx_pbmc3k)
List of length 1
names(1): PCA
> # this no worky
> tenx_pbmc3k <- scater::runPCA(tenx_pbmc3k, ncomponents = 50,
+ BPPARAM = MulticoreParam(6))
Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection
Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection
> traceback()
16: serialize(data, node$con, xdr = FALSE)
15: sendData.SOCK0node(backend[[node]], value)
14: parallel:::sendData(backend[[node]], value)
13: .send_to(cluster, i, .DONE())
12: .send_to(cluster, i, .DONE())
11: .bpstop_nodes(x)
10: .bpstop_impl(x)
9: bpstop(BPPARAM)
8: bpstop(BPPARAM)
7: .calculate_pca(mat, transposed = !is.null(dimred), ...)
6: .local(x, ...)
5: calculatePCA(y, ...)
4: calculatePCA(y, ...)
3: .local(x, ...)
2: scater::runPCA(tenx_pbmc3k, ncomponents = 50, BPPARAM = MulticoreParam(6))
1: scater::runPCA(tenx_pbmc3k, ncomponents = 50, BPPARAM = MulticoreParam(6))
> BiocManager::valid()
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details
replacement repositories:
CRAN: https://cloud.r-project.org
* sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libflexiblas.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocParallel_1.24.1 scuttle_1.0.4 scater_1.18.6 ggplot2_3.3.3
[5] TENxPBMCData_1.8.0 HDF5Array_1.18.1 rhdf5_2.34.0 DelayedArray_0.16.3
[9] Matrix_1.3-4 SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0 Biobase_2.50.0
[13] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1
[17] BiocGenerics_0.36.1 MatrixGenerics_1.2.1 matrixStats_0.59.0
loaded via a namespace (and not attached):
[1] viridis_0.6.1 httr_1.4.2 BiocSingular_1.6.0
[4] viridisLite_0.4.0 bit64_4.0.5 AnnotationHub_2.22.1
[7] DelayedMatrixStats_1.12.3 shiny_1.6.0 assertthat_0.2.1
[10] interactiveDisplayBase_1.28.0 BiocManager_1.30.15 BiocFileCache_1.14.0
[13] blob_1.2.1 vipor_0.4.5 GenomeInfoDbData_1.2.4
[16] yaml_2.2.1 BiocVersion_3.12.0 pillar_1.6.1
[19] RSQLite_2.2.7 lattice_0.20-44 beachmat_2.6.4
[22] glue_1.4.2 digest_0.6.27 promises_1.2.0.1
[25] XVector_0.30.0 colorspace_2.0-1 htmltools_0.5.1.1
[28] httpuv_1.6.1 pkgconfig_2.0.3 zlibbioc_1.36.0
[31] purrr_0.3.4 xtable_1.8-4 scales_1.1.1
[34] later_1.2.0 tibble_3.1.2 generics_0.1.0
[37] ellipsis_0.3.2 cachem_1.0.5 withr_2.4.2
[40] magrittr_2.0.1 crayon_1.4.1 mime_0.10
[43] memoise_2.0.0 fansi_0.5.0 beeswarm_0.4.0
[46] tools_4.0.3 lifecycle_1.0.0 Rhdf5lib_1.12.1
[49] munsell_0.5.0 irlba_2.3.3 AnnotationDbi_1.52.0
[52] compiler_4.0.3 rsvd_1.0.5 rlang_0.4.11
[55] grid_4.0.3 RCurl_1.98-1.3 BiocNeighbors_1.8.2
[58] rhdf5filters_1.2.1 rappdirs_0.3.3 bitops_1.0-7
[61] ExperimentHub_1.16.1 gtable_0.3.0 DBI_1.1.1
[64] curl_4.3.1 R6_2.5.0 gridExtra_2.3
[67] dplyr_1.0.6 fastmap_1.1.0 bit_4.0.4
[70] utf8_1.2.1 ggbeeswarm_0.6.0 Rcpp_1.0.6
[73] vctrs_0.3.8 sparseMatrixStats_1.2.1 dbplyr_2.1.1
[76] tidyselect_1.1.1
Bioconductor version '3.12'
* 0 packages out-of-date
* 1 packages too new
create a valid installation with
BiocManager::install("harmony", update = TRUE, ask = FALSE)
more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date
Warning message:
0 packages out-of-date; 1 packages too new
Thanks Aaron,
I used SnowParam as a workaround for the time being. Regarding file-backed matrices, I was under the impression that DelayedArray had its own optimized matrix multiplication operator via DelayedMatrixStats. I will switch over to RandomParam, thanks for the advice and assistance.
-Dave
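
For anyone landing on this thread, here is a minimal sketch of what the workaround described in this reply might look like: SnowParam for the parallel backend and RandomParam (from BiocSingular) for the SVD. It is not a confirmed fix. The simulated object from scuttle::mockSCE() and the worker count of 2 are assumptions made only so the snippet runs on its own; substitute the real dataset and core count.

```r
library(scuttle)       # mockSCE(), logNormCounts()
library(scater)        # runPCA()
library(BiocParallel)  # SnowParam()
library(BiocSingular)  # RandomParam()

set.seed(100)  # randomized SVD, so fix the seed for repeatable results

# Assumption: a small simulated dataset stands in for tenx_pbmc3k.
sce <- mockSCE(ncells = 200, ngenes = 2000)
sce <- logNormCounts(sce)

# SnowParam starts separate socket workers rather than forked ones;
# RandomParam asks for a randomized SVD.
sce <- runPCA(
    sce,
    ncomponents = 50,
    BSPARAM = RandomParam(),
    BPPARAM = SnowParam(workers = 2)
)

# Inspect the result: one row per cell, one column per component.
dim(reducedDim(sce, "PCA"))
```

SnowParam workers do not share memory with the main session, so startup and data transfer cost more than with forked workers; whether that pays off depends on the size of the matrix.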