I have encountered the same problem posted a while back here: https://www.biostars.org/p/193916/
I am using DESeq2 to run a multifactor paired analysis to determine the change in gene expression between men and women due to a treatment on 70 males and 47 females. Because I have a differing number of samples for each condition, I'm using the following steps to create a custom design matrix: C: DESeq2 paired and multi factor comparison
Everything is fine until estimateDispersionsGeneEst(dds, modelMatrix=mm1) gets called. Previously, when I was using the dds constructor for HTSeq count data, even though I was using feeatureCount output, the command would run for a while then exit citing glibc.
Now I am using the dds constructor for a count matrix, and the estimateDispersionsGeneEst(dds, modelMatrix=mm1) call is taking forever to run. I'm confused as to what I have done wrong, and would appreciate any help.
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] IHW_1.2.0 DESeq2_1.14.1
[3] SummarizedExperiment_1.4.0 Biobase_2.34.0
[5] GenomicRanges_1.26.2 GenomeInfoDb_1.10.2
[7] IRanges_2.8.1 S4Vectors_0.12.1
[9] BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] genefilter_1.56.0 locfit_1.5-9.1 slam_0.1-40
[4] splines_3.3.1 lattice_0.20-33 colorspace_1.3-2
[7] htmltools_0.3.5 base64enc_0.1-3 survival_2.40-1
[10] XML_3.98-1.5 foreign_0.8-66 DBI_0.5-1
[13] BiocParallel_1.8.1 RColorBrewer_1.1-2 plyr_1.8.4
[16] stringr_1.1.0 zlibbioc_1.20.0 munsell_0.4.3
[19] gtable_0.2.0 htmlwidgets_0.8 memoise_1.0.0
[22] latticeExtra_0.6-28 knitr_1.15.1 geneplotter_1.52.0
[25] fdrtool_1.2.15 AnnotationDbi_1.36.2 htmlTable_1.9
[28] Rcpp_0.12.9 acepack_1.4.1 xtable_1.8-2
[31] backports_1.0.5 scales_0.4.1 checkmate_1.8.2
[34] Hmisc_4.0-2 annotate_1.52.1 XVector_0.14.0
[37] lpsymphony_1.2.0 gridExtra_2.2.1 ggplot2_2.2.1
[40] digest_0.6.12 stringi_1.1.2 grid_3.3.1
[43] tools_3.3.1 bitops_1.0-6 magrittr_1.5
[46] lazyeval_0.2.0 RCurl_1.95-4.8 tibble_1.2
[49] RSQLite_1.1-2 Formula_1.2-1 cluster_2.0.4
[52] Matrix_1.2-6 data.table_1.10.4 assertthat_0.1
[55] rpart_4.1-10 nnet_7.3-12
Thank you for the advice. I really like DESeq2's count normalization method, and find it very easy to use, so I'll proceed by applying a strict pre-filter and see what happens from there. Thanks again!
You could require a normalized count of 10 or more in some number x of samples.
E.g:
But I would check first to see if this is running in a reasonable amount of time for small subset of genes.
After several hours of running (without subsetting or prefiltering) it exited with the output: "glibc detected". The first several lines of the output is:
*** glibc detected *** /opt/R-3.3.1/lib64/R/bin/exec/R: double free or corruption (!prev): 0x0000000037643ca0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75f3e)[0x2b46f4815f3e]
/lib64/libc.so.6(+0x78dd0)[0x2b46f4818dd0]
/home/gls14/R/library/DESeq2/libs/DESeq2.so(_ZN4arma3MatIdED2Ev+0x29)[0x2b470a8121c9]
/home/gls14/R/library/DESeq2/libs/DESeq2.so(_ZN4arma6auxlib10det_lapackIdEET_RKNS_3MatIS2_EEb+0x14a)[0x2b470a8145da]
/home/gls14/R/library/DESeq2/libs/DESeq2.so(_ZN4arma6auxlib3detIdNS_3MatIdEEEET_RKNS_4BaseIS4_T0_EE+0x2da)[0x2b470a8149ba]
/home/gls14/R/library/DESeq2/libs/DESeq2.so(_Z13log_posteriordN4Rcpp9MatrixRowILi14EEES1_N4arma3MatIdEEddb+0x1a4)[0x2b470a80a214]
/home/gls14/R/library/DESeq2/libs/DESeq2.so(_Z7fitDispP7SEXPRECS0_S0_S0_S0_S0_S0_S0_S0_S0_S0_+0x732)[0x2b470a80c602]
/home/gls14/R/library/DESeq2/libs/DESeq2.so(DESeq2_fitDisp+0xb3)[0x2b470a821f83]
/opt/R-3.3.1/lib64/R/lib/libR.so(+0xdcc1c)[0x2b46f3dbec1c]
I'm at a loss of how to explain this. Have you ever encountered this before?
Sounds good, I'll send it to you. Thank you for being so generous with your time
I don't have any problem running this on my end on a small subset.
Just to say again (for others reading this thread) your design matrix is really large (234 rows x 119 coefficients), so it makes sense this takes a while to fit with a GLM that has to iteratively reach the solution for each gene.
I tried running estimateDispersionGeneEst for the first 200 genes, and this takes 2.5 minutes. This is the longest step. The MAP step takes 1 minute, and Wald testing takes 1 minute. Extrapolating 4.5 minutes for 200 genes to 15,000 genes, and using 10 cores, this should take about 30 minutes (plus multicore overhead).
However, like I've said, for such a large design matrix I would instead use a linear model, such as limma-voom.
Great, thank you so much for your help. I'll be sure to give limma-voom a try. Thanks again
Even copying your code above and taking a subset of genes, R terminates and outputs the following:
*** glibc detected *** /opt/R-3.3.1/lib64/R/bin/exec/R: double free or corruption (!prev): 0x00000000140ddcb0 ***
======= Backtrace: =========
Can I assume at this point that the problem is with my machine and not how I'm implementing the software?
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DESeq2_1.14.1 SummarizedExperiment_1.4.0
[3] Biobase_2.34.0 GenomicRanges_1.26.2
[5] GenomeInfoDb_1.10.2 IRanges_2.8.1
[7] S4Vectors_0.12.1 BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.9 RColorBrewer_1.1-2 plyr_1.8.4
[4] XVector_0.14.0 bitops_1.0-6 base64enc_0.1-3
[7] tools_3.3.1 zlibbioc_1.20.0 digest_0.6.12
[10] rpart_4.1-10 memoise_1.0.0 RSQLite_1.1-2
[13] annotate_1.52.1 tibble_1.2 gtable_0.2.0
[16] htmlTable_1.9 checkmate_1.8.2 lattice_0.20-33
[19] Matrix_1.2-6 DBI_0.5-1 gridExtra_2.2.1
[22] genefilter_1.56.0 stringr_1.1.0 knitr_1.15.1
[25] cluster_2.0.4 htmlwidgets_0.8 locfit_1.5-9.1
[28] grid_3.3.1 nnet_7.3-12 data.table_1.10.4
[31] AnnotationDbi_1.36.2 XML_3.98-1.5 survival_2.40-1
[34] BiocParallel_1.8.1 foreign_0.8-66 latticeExtra_0.6-28
[37] Formula_1.2-1 geneplotter_1.52.0 magrittr_1.5
[40] ggplot2_2.2.1 htmltools_0.3.5 Hmisc_4.0-2
[43] scales_0.4.1 backports_1.0.5 splines_3.3.1
[46] assertthat_0.1 xtable_1.8-2 colorspace_1.3-2
[49] stringi_1.1.2 acepack_1.4.1 RCurl_1.95-4.8
[52] lazyeval_0.2.0 munsell_0.4.3