Question

WGCNA parallelization multi thread for blockwiseModules and TOMsimilarity

bioming ▴ 10

@bioming-21835

Last seen 5.0 years ago

Queen's University

Hello,

I'm currently using WGCNA v1.68 to do network analysis on 50k probes. I have a few questions regarding parallelization in WGCNA, particularly when running the blockwiseModules and TOMsimilarity functions. I came across another Bioconductor question on this topic (https://support.bioconductor.org/p/86147/), but it was from 3 years ago, so I was wondering if there are any updates that I should be aware of?

In the previous question, Peter said that blockwiseModules was not parallelized. Has this changed? He kindly suggested using a faster BLAS to speed up matrix multiplication in TOM calculations. Currently my output when running TOMsimilarity() shows "..matrix multiplication (system BLAS)..", so I'm guessing the system BLAS is not the fast BLAS Peter was referring to? Does anyone know which fast BLAS I should try installing? (I'm currently using R on a CentOS server with up to 50 cores.)

I already tried setting "enableWGCNAThreads(nThreads = 50)", but I don't think it did anything.

Many thanks for any help anyone can provide,

Ming

wgcna parallelization • 3.4k views

ADD COMMENT • link updated 5 months ago by Peter Langfelder ★ 3.0k • written 5.4 years ago by bioming ▴ 10

score 2 · Answer 1 · 2019-12-05

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 5 months ago

United States

I'll try to explain this as best I can. When calculating TOM from expression data, the WGCNA package does some parallelization, but only in the correlation calculations, and even those show a noticeable speedup only when there are many missing values in the expression data, which is rare these days. When there are no missing data, the most time-consuming step is the matrix multiplication of the adjacency matrix with itself. WGCNA performs this with a call to a BLAS routine unless the argument useInternalMatrixAlgebra is TRUE (the default is FALSE), in which case the multiplication is performed by WGCNA's own, slow routine. I do not recommend that route unless you have good reason to suspect that your BLAS libraries are buggy.
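To see why that multiplication dominates, here is a minimal base-R sketch of the standard unsigned TOM formula. It is an illustration only, not WGCNA's actual code: the simulated data, the soft-thresholding power of 6, and all variable names are assumptions. The `adj %*% adj` line is the step that R hands to whatever BLAS it is linked against.

```r
# Illustrative only: simulated data, 50 samples x 500 genes (assumed sizes).
set.seed(1)
n_genes <- 500
expr <- matrix(rnorm(50 * n_genes), nrow = 50)

# Unsigned adjacency with an assumed soft-thresholding power of 6.
adj <- abs(cor(expr))^6
diag(adj) <- 0

# The expensive step: adjacency times itself, dispatched to the linked BLAS.
timing <- system.time(shared <- adj %*% adj)

# Remaining steps are cheap element-wise operations.
k <- rowSums(adj)                       # connectivity of each gene
denom <- outer(k, k, pmin) + 1 - adj    # min(k_i, k_j) + 1 - a_ij
tom <- (shared + adj) / denom
diag(tom) <- 1

print(timing[["elapsed"]])
```

Raising `n_genes` makes the cost of the multiplication grow roughly with the cube of the number of genes, which is why a fast BLAS matters for large probe sets.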

When WGCNA reports "using system BLAS", it means it uses whatever BLAS R was compiled against. You may be able to see which one that is by running sessionInfo(); mine reports the line

BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.7.so

which shows that my R is compiled against OpenBLAS, which is quite fast.
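Besides reading the sessionInfo() printout, the BLAS and LAPACK libraries in use can be queried directly. A small sketch using base R functions (available in R 3.4.0 and later); the variable names are just for illustration, and the paths printed depend entirely on the local installation:

```r
# Query the linear-algebra libraries the running R session is using.
blas_path <- extSoftVersion()[["BLAS"]]   # path to the BLAS shared library (may be empty)
lapack_path <- La_library()               # path to the LAPACK shared library
lapack_version <- La_version()            # LAPACK version string

cat("BLAS:          ", blas_path, "\n")
cat("LAPACK:        ", lapack_path, "\n")
cat("LAPACK version:", lapack_version, "\n")
```

A path ending in something like libRblas.so suggests the reference BLAS shipped with R, whereas a path mentioning openblas, mkl, or similar suggests an optimized library.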

Depending on what system you are on and whether you have administrative privileges, getting R to work with a fast BLAS may be trivial or very complicated. I recommend starting with the R Installation and Administration manual at https://cran.r-project.org/doc/manuals/r-release/R-admin.html, specifically the BLAS section at https://cran.r-project.org/doc/manuals/r-release/R-admin.html#BLAS.

ADD COMMENT • link 5.4 years ago Peter Langfelder ★ 3.0k

Hi Peter, thank you so much for your fast response; I understand more now. When I run sessionInfo() it does indeed show the generic Rblas. I'm working on a computing cluster, and getting R to switch to OpenBLAS seems to be complicated, as you foresaw.

On another note regarding blockwiseModules, I just wanted to confirm the parallelization inside this function. I tried testing on some BRCA data (590 subjects x 8640 genes), and ran with:

nThreads = 1 and maxBlockSize = 5000 (2 blocks) (took 6min)
nThreads = 18 and maxBlockSize = 5000 (2 blocks) (also 6min)
nThreads = 1 and maxBlockSize = 500 (18 blocks) (also 6min)
nThreads = 18 and maxBlockSize = 500 (18 blocks) (3min47sec)

Am I correct in assuming that the 18 blocks, when given enough threads, will execute in parallel, but that the TOM calculation inside blockwiseModules is the part that has not been parallelized yet and would benefit from OpenBLAS? So if I wanted to run a large dataset, would running blockwiseModules together with OpenBLAS be the best way to go?

Again, thank you for your time in answering my questions.
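A comparison like this could be scripted roughly as follows. This is only a sketch on small simulated data (three artificial modules of 100 genes each), not the BRCA data; the soft-thresholding power of 6, the block sizes, and the thread counts are assumptions chosen so it runs quickly. The timing loop is skipped if WGCNA is not installed.

```r
# Simulated expression data: 60 samples x 300 genes in three artificial modules.
set.seed(42)
n_samples <- 60
n_genes <- 300
latent <- matrix(rnorm(n_samples * 3), n_samples, 3)
datExpr <- latent[, rep(1:3, each = 100)] +
  matrix(rnorm(n_samples * n_genes, sd = 0.5), n_samples, n_genes)
colnames(datExpr) <- paste0("gene", seq_len(n_genes))

# Grid of settings to time (values are illustrative assumptions).
settings <- expand.grid(nThreads = c(1, 2), maxBlockSize = c(300, 150))
settings$elapsed <- NA_real_

if (requireNamespace("WGCNA", quietly = TRUE)) {
  suppressPackageStartupMessages(library(WGCNA))
  for (i in seq_len(nrow(settings))) {
    timing <- system.time(
      blockwiseModules(datExpr,
                       power = 6,   # assumed power, not tuned
                       maxBlockSize = settings$maxBlockSize[i],
                       nThreads = settings$nThreads[i],
                       verbose = 0)
    )
    settings$elapsed[i] <- timing[["elapsed"]]
  }
}

print(settings)
```

On data this small the differences will be dominated by overhead, so the grid is mainly useful as a template for a realistically sized dataset.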

Ming

ADD REPLY • link 5.4 years ago bioming ▴ 10

Hi Peter! A question about the use of BLAS: can different versions affect execution speed? I have been trying to run my script on two different servers. The part involving pickSoftThreshold takes about 3 minutes on one server, but on the other it has not finished even after 12 hours. I was wondering why this might be. The input matrix contains about 12,500 genes and 100 samples, and I have 60 threads enabled. Below is some information from sessionInfo().

Server 1 (fast execution):

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.10 (Green Obsidian)
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.15.so; LAPACK version 3.9.0
Matrix_1.6-1.1
stats4_4.3.2
WGCNA_1.73

Server 2 (slow execution):

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 9.4 (Blue Onyx)
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP; LAPACK version 3.9.0
Matrix_1.6-5
stats4_4.3.3
WGCNA_1.73

ADD REPLY • link 5 months ago Arindam ▴ 80

I suspect that the slow execution is actually stuck. I am not familiar with what FlexiBLAS OPENBLAS-OPENMP actually uses as BLAS, but it is possible that the BLAS implementation in use is not re-entrant, i.e., if you run the same BLAS routine in two different threads, they will clash and potentially never finish the calculation. This used to happen with GotoBLAS. If that's not the case, I would double-check that the other server is actually running the code rather than waiting in the queue, and, if it is running, that it has enough physical RAM. You can add the argument verbose = 3 to the call to pickSoftThreshold and watch the output file to see whether the function is making any progress. Lastly, 60 threads is far too many. Let it run single-threaded or with at most 8 threads; in my experience, more than that will only slow down the system.
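As a rough sketch of that check, the call might look like the following. The data here are simulated noise and the candidate powers 1 to 10 are an assumption; only the thread settings and verbose = 3 reflect the advice above. The WGCNA part is skipped if the package is not installed.

```r
# Simulated stand-in for the real matrix: 100 samples x 200 genes of pure noise.
set.seed(7)
datExpr <- matrix(rnorm(100 * 200), nrow = 100, ncol = 200)
colnames(datExpr) <- paste0("gene", seq_len(ncol(datExpr)))

if (requireNamespace("WGCNA", quietly = TRUE)) {
  suppressPackageStartupMessages(library(WGCNA))

  # Run single-threaded first; if that works, try a modest number of threads
  # with enableWGCNAThreads(nThreads = 8) instead.
  disableWGCNAThreads()

  # verbose = 3 prints progress messages, so a stalled run becomes visible.
  sft <- pickSoftThreshold(datExpr, powerVector = 1:10, verbose = 3)
  print(sft$fitIndices)
}
```

If the single-threaded run finishes but the multi-threaded one hangs, that would point toward a threading conflict with the BLAS library rather than a problem with the data.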

ADD REPLY • link 5 months ago Peter Langfelder ★ 3.0k