nbinomLRT function all cores usage
mcalgaro93 • 0
@mcalgaro93-17855
Last seen 3.2 years ago
Italy

Hi, I have a question about core usage during the DESeq2 differential expression pipeline. The issue occurs when I call the nbinomLRT function.
The object I give to the function is:

> ddsDisp
class: DESeqDataSet 
dim: 843 100 
metadata(1): version
assays(2): counts mu
rownames(843): OTU_2 OTU_3 ... OTU_970 OTU_971
rowData names(9): baseMean baseVar ... dispOutlier dispMAP
colnames(100): Sample_1_grp1 Sample_2_grp1 ... Sample_99_grp2 Sample_100_grp2
colData names(3): grp NF.poscounts sizeFactor

So it is a matrix with 843 rows and 100 samples: 50 from experimental condition grp1 and 50 from grp2.
I call the function like this:

nbinomLRT(ddsDisp, reduced = ~ 1, full = ~ grp)

As I have to run a lot of simulations on a server, I need all calculations to stay on a single core. So at the beginning of the script I've used:

library(BiocParallel)
register(SerialParam())

But that is not what happens: when the script reaches this function, all 20 cores of the server are saturated and it takes more than 7 minutes to return (for an 843x100 matrix, isn't that strange?)

I've also tried calling the wrapper DESeq instead of the separate functions:

ddsRes <- DESeq(object = dds, test = "LRT", reduced = ~1, full = ~ grp, parallel = FALSE)
# or even this
ddsRes <- DESeq(object = dds, test = "LRT", reduced = ~1, full = ~ grp, parallel = TRUE, BPPARAM = MulticoreParam(1))

My guess is that, during the QR decomposition inside nbinomLRT, the sample size (100) of the dataset is somehow too big and all cores get involved, because with smaller sample sizes (10, 20, 50) the problem doesn't occur. That's why I tried setting the option useQR to FALSE, which lowered the waiting time but did not stop all cores from being used. Is there something I can do to avoid using all cores?

Here is my sessionInfo() (I know there is a newer version of R, but on the server I have to use this one :( ):

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRblas.so
LAPACK: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8   
 [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C        
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] crayon_1.3.4               bindrcpp_0.2.2             Seurat_2.3.0               cowplot_0.9.3             
 [5] ggplot2_3.0.0              scde_1.99.1                flexmix_2.3-13             lattice_0.20-35           
 [9] MAST_1.4.1                 genefilter_1.60.0          AUC_0.3.0                  BiocParallel_1.12.0       
[13] zinbwave_1.0.0             SingleCellExperiment_1.0.0 samr_2.0                   impute_1.52.0             
[17] ROCR_1.0-7                 gplots_3.0.1               reshape2_1.4.3             plyr_1.8.4                
[21] phyloseq_1.22.3            metagenomeSeq_1.20.1       RColorBrewer_1.1-2         glmnet_2.0-16             
[25] foreach_1.4.4              Matrix_1.2-14              DESeq2_1.20.0              SummarizedExperiment_1.8.1
[29] DelayedArray_0.4.1         matrixStats_0.54.0         Biobase_2.38.0             GenomicRanges_1.30.3      
[33] GenomeInfoDb_1.14.0        IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.24.0       
[37] edgeR_3.20.9               limma_3.34.9               RevoUtils_10.0.9           RevoUtilsMath_10.0.1      

loaded via a namespace (and not attached):
  [1] SparseM_1.77              prabclus_2.2-6            ModelMetrics_1.2.0        R.methodsS3_1.7.1        
  [5] tidyr_0.8.1               acepack_1.4.1             bit64_0.9-7               knitr_1.20               
  [9] irlba_2.3.2               R.utils_2.7.0             Rook_1.1-1                data.table_1.11.8        
 [13] rpart_4.1-13              RCurl_1.95-4.11           metap_1.0                 snow_0.4-3               
 [17] RSQLite_2.1.1             RANN_2.6                  VGAM_1.0-6                proxy_0.4-22             
 [21] bit_1.1-14                lubridate_1.7.4           assertthat_0.2.0          gower_0.1.2              
 [25] RMTstat_0.3               hms_0.4.2                 DEoptimR_1.0-8            caTools_1.17.1.1         
 [29] readxl_1.1.0              igraph_1.2.2              DBI_1.0.0                 geneplotter_1.56.0       
 [33] htmlwidgets_1.3           ddalpha_1.3.4             RcppArmadillo_0.9.100.5.0 purrr_0.2.5              
 [37] dplyr_0.7.6               backports_1.1.2           permute_0.9-4             trimcluster_0.1-2.1      
 [41] annotate_1.56.2           gbRd_0.4-11               quantreg_5.36             Cairo_1.5-9              
 [45] abind_1.4-5               caret_6.0-80              withr_2.1.2               sfsmisc_1.1-2            
 [49] robustbase_0.93-3         checkmate_1.8.5           vegan_2.5-2               mclust_5.4.1             
 [53] softImpute_1.4            cluster_2.0.7-1           gsl_1.9-10.3              segmented_0.5-3.0        
 [57] ape_5.2                   ADGofTest_0.3             diffusionMap_1.1-0.1      lazyeval_0.2.1           
 [61] recipes_0.1.3             pkgconfig_2.0.2           nlme_3.1-131.1            nnet_7.3-12              
 [65] bindr_0.1.1               rlang_0.2.2               diptest_0.75-7            pls_2.7-0                
 [69] MatrixModels_0.4-1        extRemes_2.0-9            doSNOW_1.0.16             cellranger_1.1.0         
 [73] lmtest_0.9-36             distillery_1.0-4          carData_3.0-2             zoo_1.8-4                
 [77] base64enc_0.1-3           ggridges_0.5.1            png_0.1-7                 rjson_0.2.20             
 [81] stabledist_0.7-1          bitops_1.0-6              R.oo_1.22.0               Lmoments_1.2-3           
 [85] KernSmooth_2.23-15        Biostrings_2.46.0         blob_1.1.1                DRR_0.0.3                
 [89] lars_1.2                  stringr_1.3.1             brew_1.0-6                scales_1.0.0             
 [93] ica_1.0-2                 memoise_1.1.0             magrittr_1.5              bibtex_0.4.2             
 [97] gdata_2.18.0              zlibbioc_1.24.0           compiler_3.4.4            lsei_1.2-0               
[101] pcaMethods_1.70.0         dimRed_0.1.0              fitdistrplus_1.0-11       ade4_1.7-13              
[105] dtw_1.20-1                XVector_0.18.0            pbapply_1.3-4             htmlTable_1.12           
[109] magic_1.5-9               Formula_1.2-3             MASS_7.3-49               mgcv_1.8-23              
[113] tidyselect_0.2.5          stringi_1.2.4             forcats_0.3.0             copula_0.999-18          
[117] yaml_2.2.0                locfit_1.5-9.1            latticeExtra_0.6-28       grid_3.4.4               
[121] tools_3.4.4               rio_0.5.10                rstudioapi_0.8            foreign_0.8-69           
[125] gridExtra_2.3             prodlim_2018.04.18        scatterplot3d_0.3-41      Rtsne_0.13               
[129] digest_0.6.18             FNN_1.1.2.1               lava_1.6.3                fpc_2.1-11.1             
[133] Rcpp_0.12.19              car_3.0-2                 broom_0.5.0               SDMTools_1.1-221         
[137] AnnotationDbi_1.40.0      npsurv_0.4-0              kernlab_0.9-27            Rdpack_0.10-1            
[141] colorspace_1.3-2          ranger_0.10.1             XML_3.98-1.16             CVST_0.2-2               
[145] splines_3.4.4             RcppRoll_0.3.0            multtest_2.34.0           xtable_1.8-3             
[149] jsonlite_1.5              geometry_0.3-6            timeDate_3043.102         modeltools_0.2-22        
[153] ipred_0.9-7               tclust_1.4-1              R6_2.2.2                  Hmisc_4.1-1              
[157] pillar_1.3.0              htmltools_0.3.6           glue_1.3.0                pspline_1.0-18           
[161] class_7.3-14              codetools_0.2-15          tsne_0.1-3                pcaPP_1.9-73             
[165] mvtnorm_1.0-8             tibble_1.4.2              mixtools_1.1.0            numDeriv_2016.8-1        
[169] curl_3.2                  gtools_3.8.1              zip_1.0.0                 openxlsx_4.1.0           
[173] survival_2.41-3           biomformat_1.6.0          munsell_0.5.0             rhdf5_2.22.0             
[177] GenomeInfoDbData_1.0.0    iterators_1.0.10          haven_1.1.2               gtable_0.2.0         

I thank you in advance for your help,
Matteo

davide risso ▴ 980
@davide-risso-5075
Last seen 8 months ago
University of Padova

Ciao Matteo,

as you can see from your sessionInfo(), you are using a non-default implementation of the BLAS algebra library (specifically the Microsoft/Revolution implementation). This implementation uses parallel computation and, by default, uses all available cores.

Obviously, DESeq2 is doing some matrix algebra and your system is hence using all the available cores. BiocParallel has nothing to do with it.
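
A rough way to check whether the linear algebra library is multithreaded, independent of DESeq2, is to time a plain matrix product and compare CPU time with wall-clock time. This is only a sketch: the matrix size is arbitrary, and the ratio is a heuristic rather than an exact thread count.

```r
# Report which BLAS/LAPACK libraries this R session is linked against
# (these fields are part of sessionInfo() in R >= 3.4).
si <- sessionInfo()
cat("BLAS:  ", si$BLAS, "\n")
cat("LAPACK:", si$LAPACK, "\n")

# Time a BLAS-backed operation on an arbitrary test matrix.
set.seed(1)
x <- matrix(rnorm(2000 * 500), nrow = 2000)
timing <- system.time(xtx <- crossprod(x))

# CPU time is summed over all threads of the process, so a CPU/elapsed
# ratio well above 1 suggests that the BLAS is running multithreaded.
cpu_time <- timing[["user.self"]] + timing[["sys.self"]]
cpu_to_wall <- cpu_time / max(timing[["elapsed"]], 1e-6)
cat("CPU / elapsed ratio:", round(cpu_to_wall, 2), "\n")
```

If the ratio stays close to 1 after limiting the thread count, the setting has most likely taken effect.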

At this link, you can find the instructions on how to change the default number of cores: https://mran.microsoft.com/documents/rro/multithread

Luckily, it can all be done within R so just adding the following lines to your script should solve your problem.

library(RevoUtilsMath)
setMKLthreads(1)
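
For scripts that also have to run on machines without Microsoft R Open, the call can be guarded so that it is only made when RevoUtilsMath is installed. The sketch below falls back to the CRAN package RhpcBLASctl, which is not mentioned in this thread and is only an assumed alternative; `limit_blas_threads` is a hypothetical helper name.

```r
# Hypothetical helper: cap the number of BLAS threads using whichever
# supported package is installed, and report which one was used.
limit_blas_threads <- function(n = 1L) {
  if (requireNamespace("RevoUtilsMath", quietly = TRUE)) {
    # Microsoft R Open with Intel MKL
    RevoUtilsMath::setMKLthreads(n)
    return("RevoUtilsMath")
  }
  if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    # Assumed alternative for other multithreaded BLAS builds
    RhpcBLASctl::blas_set_num_threads(n)
    RhpcBLASctl::omp_set_num_threads(n)
    return("RhpcBLASctl")
  }
  # Neither package is available: nothing was changed
  "none"
}

method <- limit_blas_threads(1L)
message("BLAS thread limit applied via: ", method)
```

Calling the helper at the top of the script, before any DESeq2 function, keeps the limit in place for the whole session.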

Thank you very much Davide, you solved my problem. :)

@mikelove
Last seen 15 hours ago
United States

Hi Matteo,

I’m a bit confused because there are no parallel calls aside from DESeq(..., parallel=TRUE).

The sub functions are not making use of multiple workers or making calls to BiocParallel.
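
To rule out BiocParallel as the source, one option is to register a serial backend and confirm how many workers the default backend reports. This is a minimal sketch that assumes BiocParallel is installed and skips the check otherwise; `n_workers` is just an illustrative variable name.

```r
# Number of workers of the default BiocParallel backend (NA if not installed).
n_workers <- NA_integer_
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # Make the serial backend the default for this session
  BiocParallel::register(BiocParallel::SerialParam())
  # bpparam() returns the default registered backend
  n_workers <- BiocParallel::bpnworkers(BiocParallel::bpparam())
}
cat("BiocParallel workers:", n_workers, "\n")
```

If this reports a single worker while all cores are still busy, the extra threads are coming from somewhere other than BiocParallel.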


I'm confused too, which is why I asked the question. I've added a link to a GIF, just to show you :)

https://drive.google.com/file/d/1VCTiow9GUg-Fiu2xIQ-5UDCJYNMfH5Ek/view?usp=sharing

Now I'll run a check on another PC to see whether the issue is specific to the server.


Ok. I guess I can’t offer much more advice than to say that unless you use parallel=TRUE when running DESeq(), we are making no hidden use of BiocParallel.

 


You're using an old version of the software and have a ton of other packages attached, so basic debugging steps, though painful, will be to update to the current version of R / DESeq2 and to perform the analysis in a new session with only the essential packages.
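
A stripped-down reproduction along these lines might look like the sketch below, run in a fresh R session. It assumes DESeq2 is installed and uses simulated counts from `makeExampleDESeqDataSet` (whose grouping column is called `condition`, not `grp`) in place of the real data; the dimensions mirror the 843 x 100 object from the question.

```r
# Elapsed time of the LRT pipeline on simulated data (NA if DESeq2 is missing).
elapsed <- NA_real_
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  set.seed(1)
  # Simulated stand-in for the real data: 843 features, 100 samples,
  # split evenly between two levels of `condition`
  dds <- makeExampleDESeqDataSet(n = 843, m = 100)
  timing <- system.time(
    dds <- DESeq(dds, test = "LRT", reduced = ~ 1,
                 parallel = FALSE, quiet = TRUE)
  )
  elapsed <- timing[["elapsed"]]
  cat("Elapsed seconds:", round(elapsed, 1), "\n")
}
```

Watching the core usage while this runs, with no other packages attached, helps separate a DESeq2 problem from a system-level one.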
