Hi,
using DESeq2::DESeq(..., parallel = TRUE)
with a specific dataset, where genes (rows) were sorted by abundance, I have observed the following error:
Error in (function (cond) :
error in evaluating the argument 'args' in selecting a method for function 'do.call': BiocParallel errors element index: 3
first error: error in evaluating the argument 'x' in selecting a method for function 't': no right-hand side in 'b'
I can reproduce the error with a modified example dataset as used in DESeq2's vignette:
# 1) Read example dataset from pasilla
library("pasilla")
pasCts <- system.file("extdata",
"pasilla_gene_counts.tsv",
package = "pasilla", mustWork = TRUE
)
pasAnno <- system.file("extdata",
"pasilla_sample_annotation.csv",
package = "pasilla", mustWork = TRUE
)
cts <- as.matrix(read.csv(pasCts, sep = "\t", row.names = "gene_id"))
# there are already quite some genes which have 0 counts across all samples
table(rowSums(cts) == 0)
# FALSE TRUE
# 12359 2240
coldata <- read.csv(pasAnno, row.names = 1)
coldata <- coldata[, c("condition", "type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)
# 2) Modify the count matrix to trigger the error
# set the last 2/5 of genes to 0 counts for all samples
i <- cut(seq_len(nrow(cts)), 5, labels = FALSE)
cts[i %in% c(4,5), ] <- 0
# 3) Run DESeq with different numbers of cores
library("DESeq2")
dds <- DESeqDataSetFromMatrix(
countData = cts,
colData = coldata,
design = ~condition
)
# using 3 cores or more triggers the error
BiocParallel::register(BiocParallel::MulticoreParam(3))
res <- DESeq(dds, parallel = TRUE)
# using less than 3 cores works just fine
BiocParallel::register(BiocParallel::MulticoreParam(2))
res <- DESeq(dds, parallel = TRUE)
res <- DESeq(dds, parallel = FALSE)
# 4) It looks like blocks where all genes have 0 counts make the difference:
counts_per_gene <- rowSums(cts)
# if genes are divided into 3 blocks, the last block would have 0-count genes only:
i <- cut(seq_along(counts_per_gene), 3, labels = FALSE)
mean_counts_per_partition <- sapply(unique(i), function(j) {
mean(counts_per_gene[i == j])
})
mean_counts_per_partition
# [1] 11378.01 3433.68 0.00
# if divided into 2 blocks, none would have 0-count genes only:
i <- cut(seq_along(counts_per_gene), 2, labels = FALSE)
mean_counts_per_partition <- sapply(unique(i), function(j) {
mean(counts_per_gene[i == j])
})
mean_counts_per_partition
# [1] 9123.2308 751.5357
While this problem can easily be tackled by using parallel = FALSE
or by removing rows with 0 counts beforehand, it might be nicer if such situation would be caught inside DESeq2::DESeq
.
Thanks in advance,
Steffi
Here is my session Info:
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS
Matrix products: default
BLAS/LAPACK: /software/R-v4.0.4/lib/libopenblasp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.30.1 SummarizedExperiment_1.20.0 Biobase_2.50.0 MatrixGenerics_1.2.1
[5] matrixStats_0.58.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.4 IRanges_2.24.1
[9] S4Vectors_0.28.1 BiocGenerics_0.36.0 pasilla_1.18.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 locfit_1.5-9.4 lattice_0.20-41 assertthat_0.2.1 utf8_1.2.1
[6] R6_2.5.0 RSQLite_2.2.4 evoBase_0.1.0 httr_1.4.2 ggplot2_3.3.3
[11] pillar_1.5.1 zlibbioc_1.36.0 rlang_0.4.10 annotate_1.68.0 blob_1.2.1
[16] shinyEvotec_1.0 Matrix_1.3-2 splines_4.0.4 BiocParallel_1.24.1 geneplotter_1.68.0
[21] RCurl_1.98-1.3 bit_4.0.4 munsell_0.5.0 DelayedArray_0.16.2 compiler_4.0.4
[26] pkgconfig_2.0.3 tidyselect_1.1.0 tibble_3.1.0 GenomeInfoDbData_1.2.4 XML_3.99-0.6
[31] fansi_0.4.2 crayon_1.4.1 dplyr_1.0.5 bitops_1.0-6 grid_4.0.4
[36] xtable_1.8-4 gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1 magrittr_2.0.1
[41] scales_1.1.1 debugme_1.1.0 cachem_1.0.4 XVector_0.30.0 genefilter_1.72.1
[46] ellipsis_0.3.1 vctrs_0.3.6 generics_0.1.0 fst_0.9.4 RColorBrewer_1.1-2
[51] tools_4.0.4 bit64_4.0.5 glue_1.4.2 purrr_0.3.4 fastmap_1.1.0
[56] survival_3.2-10 yaml_2.2.1 AnnotationDbi_1.52.0 colorspace_2.0-0 memoise_2.0.0
Thanks for the quick answer!
I took some time to also update the section in the vignette on parallelization:
https://github.com/mikelove/DESeq2/commit/e414a2fba3db15b77db2f590ad189ae24ff9af42
Great! Thank you.