I am about to receive a large microarray data set that I would like to analyze using oligo. To run the analyses as efficiently as possible, I am exploring the parallelization options described in the oligo user guide (section 7.3, "Parallel Computing on Multicore Machines", pages 46-47). See also the code below.
However, I don't see any improvement when setting the environment variable R_THREADS to a higher number (in the user guide, doing so cut the running time for this example by 50%), so I wonder whether that approach is (still) applicable to R on a Windows machine. Any advice would be appreciated.
NB: for me this feature is in the category "nice to have" rather than "need to have". :)
Sample code (from the user guide); I obtained the same results (i.e. the same elapsed times) when using my own (larger) data set.
    > library(oligo)
    > library(pd.huex.1.0.st.v2)
    > library(oligoData)
    > data(affyExonFS)
    > t0 <- system.time(res0 <- rma(affyExonFS))
    Background correcting
    Normalizing
    Calculating Expression
    >
    > Sys.setenv(R_THREADS=4)
    > t1 <- system.time(res1 <- rma(affyExonFS))
    Background correcting
    Normalizing
    Calculating Expression
    >
    > all.equal(res0, res1)
    [1] TRUE
    > t0
       user  system elapsed
      19.65    0.56   20.21
    >
    > t1
       user  system elapsed
      19.78    0.53   20.62
    >
    > sessionInfo()
    R version 3.4.1 (2017-06-30)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils     datasets
    [8] methods   base

    other attached packages:
     [1] oligoData_1.8.0          pd.huex.1.0.st.v2_3.14.1 DBI_0.7
     [4] RSQLite_2.0              oligo_1.40.2             Biostrings_2.44.2
     [7] XVector_0.16.0           IRanges_2.10.2           S4Vectors_0.14.3
    [10] Biobase_2.36.2           oligoClasses_1.38.0      BiocGenerics_0.22.0

    loaded via a namespace (and not attached):
     [1] Rcpp_0.12.12               compiler_3.4.1
     [3] BiocInstaller_1.26.0       GenomeInfoDb_1.12.2
     [5] bitops_1.0-6               iterators_1.0.8
     [7] tools_3.4.1                zlibbioc_1.22.0
     [9] digest_0.6.12              bit_1.1-12
    [11] memoise_1.1.0              tibble_1.3.4
    [13] preprocessCore_1.38.1      lattice_0.20-35
    [15] ff_2.2-13                  pkgconfig_2.0.1
    [17] rlang_0.1.2                Matrix_1.2-11
    [19] foreach_1.4.3              DelayedArray_0.2.7
    [21] GenomeInfoDbData_0.99.0    affxparser_1.48.0
    [23] bit64_0.9-7                grid_3.4.1
    [25] blob_1.1.0                 splines_3.4.1
    [27] codetools_0.2-15           matrixStats_0.52.2
    [29] GenomicRanges_1.28.4       SummarizedExperiment_1.6.3
    [31] RCurl_1.95-4.8             affyio_1.46.0
    >
Thanks for your feedback. My take-home message: the above is (still) the proper way of using multiple cores with oligo, but the impact is indeed not impressive.
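
For anyone who wants to repeat the comparison on their own machine, here is a minimal sketch of how the timing could be run for several thread counts in one go. It uses only base R; time_with_threads is a hypothetical helper name of mine, not something provided by oligo, and it assumes (as in the session above) that the function being timed picks up R_THREADS at call time.

```r
# Hypothetical helper (not part of oligo): time a function under several
# R_THREADS settings and restore the previous value afterwards.
time_with_threads <- function(fun, threads = c(1, 2, 4)) {
  old <- Sys.getenv("R_THREADS", unset = NA)
  on.exit({
    # Put the environment back the way we found it.
    if (is.na(old)) Sys.unsetenv("R_THREADS") else Sys.setenv(R_THREADS = old)
  })
  elapsed <- vapply(threads, function(n) {
    Sys.setenv(R_THREADS = n)
    unname(system.time(fun())["elapsed"])
  }, numeric(1))
  data.frame(threads = threads, elapsed = elapsed)
}

# Number of cores R can see (may be NA on some platforms).
parallel::detectCores()

# Stand-in workload so the sketch runs without oligo installed.
time_with_threads(function() Sys.sleep(0.1), threads = c(1, 2))

# With the data from the session above, the call would look like:
# library(oligo); library(oligoData); data(affyExonFS)
# time_with_threads(function() rma(affyExonFS), threads = c(1, 2, 4))
```

If the elapsed times barely differ across rows, as in my session above, the setting has little effect on that machine.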