Entering edit mode
rbutler
•
0
@rbutler-20667
Last seen 4.8 years ago
I have encountered a curious bug that I haven't seen posted yet. When DESeq is rerun multiple times, it seems to multiply the metadata fields by the square of the MulticoreParam
. By the second or third time it significantly grows the file size and slows the run time, fourth time it errors out. Can be fixed by setting metadata(dds) = metadata(dds)[1]
after the repeated DESeq run.
library(DESeq2)
library(BiocParallel)
register(MulticoreParam(100))
# load a dds where DESeq has been run once
> load(paste0(rootdir, '/comparisons/dsq_broad_analysis/', load_dds))
> dds
class: DESeqDataSet
dim: 20492 57
metadata(1): version
... # normal rest of the dds data stuff
> # do some stuff and rerun DESeq
> design(dds) = ~ cell + day + trt + day:trt
> dds = DESeq(dds, parallel=T)
using pre-existing normalization factors
... # this takes longer than the first time
final dispersion estimates, fitting model and testing: 100 workers
> dds
class: DESeqDataSet
dim: 20492 50
metadata(10000): version version ... version version #######seems a lot!?
... # normal rest of the dds data stuff
# Run again changing nothing
> dds = DESeq(dds, parallel=T)
using pre-existing normalization factors
... # this takes much longer than the second time, 10-15 minutes
final dispersion estimates, fitting model and testing: 100 workers
> dds
class: DESeqDataSet
dim: 20492 50 ######## this line actually hangs for 30 seconds
metadata(100000000): version version ... version version ####### and is now 10k^2
... # normal rest of the dds data stuff
# so what is it?
> head(metadata(dds))
$version
[1] ‘1.22.2’
$version
[1] ‘1.22.2’
$version
[1] ‘1.22.2’
$version
[1] ‘1.22.2’
$version
[1] ‘1.22.2’
$version
[1] ‘1.22.2’
# maybe its the parallel
> register(MulticoreParam(3))
> load(paste0(rootdir, '/comparisons/dsq_broad_analysis/', load_dds))
> dds = DESeq(dds, parallel=T)
using pre-existing normalization factors
...
final dispersion estimates, fitting model and testing: 3 workers
> dds
class: DESeqDataSet
dim: 20492 57
metadata(9): version version ... version version ########aha!
... # normal rest of the dds data stuff
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: RHEL
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GGally_1.4.0 sva_3.30.1
[3] genefilter_1.64.0 mgcv_1.8-24
[5] nlme_3.1-137 gridExtra_2.3
[7] ggrepel_0.8.1 ggplot2_3.2.0
[9] RColorBrewer_1.1-2 DESeq2_1.22.2
[11] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[13] BiocParallel_1.16.6 matrixStats_0.54.0
[15] Biobase_2.42.0 GenomicRanges_1.34.0
[17] GenomeInfoDb_1.18.2 IRanges_2.16.0
[19] S4Vectors_0.20.1 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] bit64_0.9-7 splines_3.5.1 Formula_1.2-3
[4] assertthat_0.2.1 BiocManager_1.30.4 latticeExtra_0.6-28
[7] blob_1.2.0 GenomeInfoDbData_1.2.0 pillar_1.4.2
[10] RSQLite_2.1.1 backports_1.1.4 lattice_0.20-35
[13] limma_3.38.3 glue_1.3.1 digest_0.6.20
[16] XVector_0.22.0 checkmate_1.9.4 colorspace_1.4-1
[19] plyr_1.8.4 htmltools_0.3.6 Matrix_1.2-14
[22] XML_3.98-1.20 pkgconfig_2.0.2 zlibbioc_1.28.0
[25] purrr_0.3.2 xtable_1.8-4 scales_1.0.0
[28] htmlTable_1.13.1 tibble_2.1.3 annotate_1.60.1
[31] withr_2.1.2 nnet_7.3-12 lazyeval_0.2.2
[34] survival_2.42-3 magrittr_1.5 crayon_1.3.4
[37] memoise_1.1.0 foreign_0.8-70 tools_3.5.1
[40] data.table_1.12.2 stringr_1.4.0 locfit_1.5-9.1
[43] munsell_0.5.0 cluster_2.0.7-1 AnnotationDbi_1.44.0
[46] compiler_3.5.1 rlang_0.4.0 grid_3.5.1
[49] RCurl_1.95-4.12 rstudioapi_0.10 htmlwidgets_1.3
[52] bitops_1.0-6 tcltk_3.5.1 base64enc_0.1-3
[55] gtable_0.3.0 reshape_0.8.8 DBI_1.0.0
[58] R6_2.4.0 knitr_1.23 dplyr_0.8.3
[61] zeallot_0.1.0 bit_1.1-14 Hmisc_4.2-0
[64] stringi_1.4.3 Rcpp_1.0.1 geneplotter_1.60.0
[67] vctrs_0.2.0 rpart_4.1-13 acepack_1.4.1
[70] tidyselect_0.2.5 xfun_0.8
Thanks, I think I have a fix in devel to deal with
rbind
. It's v1.25.13.Can you confirm that it looks good on your end?
That was fun. I installed several github bioconductor packages before realizing I should probably have installed the devel version of Bioconductor in one step. Developer I am not.
Good on this end.
Thanks. Oops I should have added dependencies=FALSE
Even without dependencies It still throws the error for minimum version for S4Vectors and BiocGenerics during compilation, then has issues with Iranges having an array sum error and more. It appears there are some big changes for 3.10?
It looks like you are a release behind? I think DESeq2 v1.24 and v1.25 are similar wrt objects.