I have just upgraded my R to 3.2, bioconductor to 3.1 and then installed the latest DESeq2 package and now a DESeq analysis that used to work keeps failing with the same error. Here is what I have been doing:
> register(MulticoreParam(12)) > countData<-read.csv("counts.csv",header=T,row.names=1) > colData<-read.csv("sample_meta_data.csv", header=T, row.names=1) > dds<-DESeqDataSetFromMatrix(countData = countData, colData=colData, design = ~ line + treatment ) # remove some genes with low counts > dds <- estimateSizeFactors(dds) > nc<-counts(dds, normalized=TRUE) > filter<-rowSums(nc >= 15) >= 3 > ddsf<-dds[filter,] # run DESeq2 > ddsf<-DESeq(ddsf, parallel=T) using pre-existing size factors estimating dispersions gene-wise dispersion estimates: 12 workers Error in do.call(rbind, bplapply(levels(idx), function(l) { : error in evaluating the argument 'args' in selecting a method for function 'do.call': > Terminated
It takes quite a while for the error to appear (10 minutes or so?) after the message "gene-wise dispersion estimates" appears.
I have re-installed all previously installed CRAN packages now and also re-installed the BioConductor packages and nothing is showing any errors there.
Does anybody else experience problems with this latest release of DESeq2?
here is my sessionInfo:
> sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu precise (12.04.5 LTS) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] org.Mm.eg.db_3.1.2 RSQLite_1.0.0 [3] DBI_0.3.1 AnnotationDbi_1.30.1 [5] Biobase_2.28.0 plyr_1.8.3 [7] BiocParallel_1.2.20 gplots_2.17.0 [9] DESeq2_1.8.1 RcppArmadillo_0.5.400.2.0 [11] Rcpp_0.12.0 GenomicRanges_1.20.6 [13] GenomeInfoDb_1.4.2 IRanges_2.2.7 [15] S4Vectors_0.6.5 BiocGenerics_0.14.0 [17] RColorBrewer_1.1-2 BiocInstaller_1.18.4 loaded via a namespace (and not attached): [1] futile.logger_1.4.1 XVector_0.8.0 bitops_1.0-6 [4] futile.options_1.0.0 tools_3.2.2 rpart_4.1-10 [7] digest_0.6.8 annotate_1.46.1 gtable_0.1.2 [10] lattice_0.20-33 proto_0.3-10 gridExtra_2.0.0 [13] genefilter_1.50.0 stringr_1.0.0 cluster_2.0.3 [16] caTools_1.17.1 gtools_3.5.0 locfit_1.5-9.1 [19] nnet_7.3-11 grid_3.2.2 XML_3.98-1.3 [22] survival_2.38-3 foreign_0.8-66 gdata_2.17.0 [25] latticeExtra_0.6-26 Formula_1.2-1 geneplotter_1.46.0 [28] ggplot2_1.0.1 reshape2_1.4.1 lambda.r_1.1.7 [31] magrittr_1.5 scales_0.3.0 Hmisc_3.16-0 [34] MASS_7.3-44 splines_3.2.2 xtable_1.7-4 [37] colorspace_1.2-6 KernSmooth_2.23-15 stringi_0.5-5 [40] acepack_1.3-3.3 munsell_0.4.2
> source("http://bioconductor.org/biocLite.R") Bioconductor version 3.1 (BiocInstaller 1.18.4), ?biocLite for help
My BioConductor packages seem to be the right versions:
> BiocInstaller::biocValid() [1] TRUE
Any hints welcome! Thanks!
Thanks for your help. I have now sent you the file via email.
I was able to run this using the devel branch (I don't have R-3.2 easily accessible at the moment). The
updateObject()
line is for updating a DESeq2 v1.8 DESeqDataSet object to version >= 1.9.So I'm not sure what might be causing the error. Given so many samples (n=300) with lots of biological replication (20-30 per level of the factors), I often switch to using limma + voom, which is much faster for fitting.
ok, thanks for running that. I will install the devel branch then and give it a go. I have never used limma. Are you suggestion to do the whole DE gene analysis in limma or just the model fitting?
well, there aren't any relevant DESeq2 changes between release and devel. I'm not sure then what happened. It might just be some cores failing in the background for some (local?) reason, and then do.call() cannot rbind() the results together. I'm just guessing though, because i couldn't reproduce the bug with devel branch on my cluster (and I don't have 12 cores on my laptop to try with release branch).
I was suggesting to just switch to limma+voom for the whole DE gene analysis, if you encounter more problems, as it will be much faster. We find that DESeq2, edgeR and limma often have very high overlap in the DEG lists.
ok, maybe I should try without panellising then just to see if that makes a difference. Will take ages though. I also just tried to repeat the analysis with a data set that I have analysed previously with an older version of R/Bioconductor/DESeq2 and it also fails now, so it's not due to the data.
I can now confirm that the issue appears to be with using BiocParallel. Running the same analysis on the same data on a single core does not cause the problem. Probably also explains why the failure occurs sort of randomly at some point and not always at the same time in the analysis.
Looks like this might be a bug in BiocParallel in release. If you (or Mike) could send me the dds (valerie.obenchain@roswellpark.org) I can take a look.
Valerie
Is there anything I can do to test this further on my end? I tried using the DEBUG output of BiocParallel but it doesn't seem to point me to the problem. Did you get the email with the data frame that caused the problem for me? Thanks for looking into this.
Thanks for sending the ddsf2 object. I could not reproduce this error. I tried with both BiocParallel 1.2.20 and the more recent release version, 1.2.21. Both completed successfully.
I did see the error you report when I manually killed the job before it finished. This may point to something disrupting the job, similar to what Mike was saying about cores failing locally. Has anything changed on your system? Load, number of users?
Valerie
Thanks Valerie for looking into this! I don't think anything should have changed on the computing cluster here. I will get our IT support team to have a look too, now that I know what to look out for. Thanks a lot!