TPP2D - multicore computing for bootstrapNullAlternativeModel(...)
Tobias

Hello bioc users,

Does anyone know a way to use multicore computing during the null model fitting step of the TPP2D package? I did a test run with only 2 bootstrap iterations (B = 2); it takes forever and runs on a single CPU.

> ### null model fitting
> fstat_df <- computeFStatFromParams(model_params_df)
> set.seed(12, kind = "L'Ecuyer-CMRG")
> ## next step is very sloooooooow
> ## short test B = 2
> null_model_B2 <- bootstrapNullAlternativeModel(df = preproc_df, params_df = model_params_df, B = 2)
[1] "Warning: You have specificed B < 20, it is recommended to use at least B = 20 in order to obtain reliable results."
  |===================================================================================================| 100%

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TPP2D_1.10.0 dplyr_1.0.8 

loaded via a namespace (and not attached):
 [1] zip_2.2.0           Rcpp_1.0.8.3        pillar_1.7.0        compiler_4.1.2      bitops_1.0-7       
 [6] iterators_1.0.14    tools_4.1.2         lifecycle_1.0.1     tibble_3.1.6        gtable_0.3.0       
[11] lattice_0.20-45     pkgconfig_2.0.3     rlang_1.0.2         openxlsx_4.2.5      foreach_1.5.2      
[16] rstudioapi_0.13     DBI_1.1.2           cli_3.2.0           parallel_4.1.2      stringr_1.4.0      
[21] generics_0.1.2      vctrs_0.4.0         grid_4.1.2          tidyselect_1.1.2    glue_1.6.2         
[26] R6_2.5.1            fansi_1.0.3         BiocParallel_1.28.3 limma_3.50.1        tidyr_1.2.0        
[31] ggplot2_3.3.5       purrr_0.3.4         magrittr_2.0.3      scales_1.1.1        codetools_0.2-18   
[36] ellipsis_0.3.2      MASS_7.3-56         assertthat_0.2.1    colorspace_2.0-3    utf8_1.2.2         
[41] stringi_1.7.6       RCurl_1.98-1.6      munsell_0.5.0       doParallel_1.0.17   crayon_1.5.1

According to the function documentation, the BPPARAM parameter is:

BPPARAM BiocParallel parameter for optional parallelization of null distribution generation through bootstrapping, default: BiocParallel::SerialParam()

Executing that on my system gives:

> BiocParallel::SerialParam()
class: SerialParam
  bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE; bpforceGC: FALSE
  bplogdir: NA
  bpresultdir: NA

I guess this means somebody "prepared" the function to use a parallel computing backend, but by default only a single CPU is used?

I found a way that worked on my MacBook Pro and also on a Linux machine (Debian 10):

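##### multicore version #####
## BiocParallel is only loaded via a namespace in the session above, not attached,
## so MulticoreParam() is called with its package prefix
null_model_B20_mc <- bootstrapNullAlternativeModel(df = preproc_df, params_df = model_params_df, B = 20, BPPARAM = BiocParallel::MulticoreParam())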
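
For anyone who wants more control over the call above, here is a minimal sketch that sets the number of workers explicitly and passes a seed to the backend. The choice of "all but one core" and the values of RNGseed and progressbar are my own assumptions, not something TPP2D requires, and I have not checked whether RNGseed makes the bootstrap fully reproducible inside this particular function. preproc_df and model_params_df are the objects from the session above; the last call is skipped if they do not exist.

```r
library(BiocParallel)

# Use one core fewer than BiocParallel would pick by default, but never fewer than 1.
# (Leaving one core free is a judgment call, not a requirement.)
n_workers <- max(1L, multicoreWorkers() - 1L)

# RNGseed asks the backend for reproducible random number streams;
# progressbar = TRUE is optional and only affects console output.
param <- MulticoreParam(workers = n_workers, RNGseed = 12, progressbar = TRUE)

# Number of workers the backend will actually use.
bpnworkers(param)

# Run the bootstrap only if the objects from the session above are available.
if (exists("preproc_df") && exists("model_params_df")) {
  null_model_B20_mc <- TPP2D::bootstrapNullAlternativeModel(
    df = preproc_df, params_df = model_params_df,
    B = 20, BPPARAM = param
  )
}
```

Note that MulticoreParam relies on forked processes, so it should behave as above on macOS and Linux but falls back to a single worker on Windows.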
Last seen 15 hours ago
United States

You are already halfway to figuring this out for yourself. Here's how I would proceed.

The documentation for the argument 'BPPARAM' mentions 'optional parallelization' and points to BiocParallel::SerialParam. Note that this is a qualified function name: the first part, BiocParallel, is the package that the function SerialParam comes from. Your next step should be to explore BiocParallel to see what other backends are available, say by reading the vignette, which describes the various ways of parallelizing and the cases in which each is applicable. You know what sort of computer you are using, and should then be able to decide which parallelization scheme applies.
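
As a rough sketch of that exploration (the worker count of 2 and the toy task are just assumptions for illustration), you could list the backends registered in your session, pick one that suits your platform, and try it on a trivial job before starting a long run. browseVignettes("BiocParallel") opens the vignette list in a browser.

```r
library(BiocParallel)

# Backends registered for this session; the first one is the default
# returned by bpparam().
names(registered())

# Choose a backend by platform: forked workers on macOS/Linux,
# socket-based workers on Windows (where forking is not available).
param <- if (.Platform$OS.type == "windows") {
  SnowParam(workers = 2)
} else {
  MulticoreParam(workers = 2)
}

# Toy task to confirm the backend works; results come back in input order.
res <- bplapply(1:4, function(i) i^2, BPPARAM = param)
unlist(res)
```

An object created this way can then be passed as the BPPARAM argument of any function that accepts one.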


