Question

WGCNA blockwiseModules parallelisation question

caitlin ▴ 10

@caitlin-11298

Hello everyone,

I'm using WGCNA on a fairly large RNA-seq dataset from soil: 600,000 genes after filtering out poor or spurious hits. I did a trial run with a subset of 4000 genes on my laptop, which worked fantastically, and I am now applying it to the full dataset. I have a 68-CPU server with 1 TB of RAM to run the analysis on, and am currently using the following R call:

bwnet = blockwiseModules(datExpr, maxBlockSize = 46000, power = 18, networkType = "signed", TOMType = "signed", minModuleSize = 30, reassignThreshold = 0, mergeCutHeight = 0.2, numericLabels = TRUE, saveTOMs = TRUE, saveTOMFileBase = "permafrostmetaT-blockwise", verbose = 3)

So I have 16 blocks to process, and after 56 hours I am at this point:

" Calculating module eigengenes block-wise from all genes
   Flagging genes and samples with too many missing values...
    ..step 1
....pre-clustering genes to determine blocks..
   Projective K-means:
   ..k-means clustering..
   ..merging smaller clusters...
Block sizes:
gBlocks
    1     2     3     4     5     6     7     8     9    10    11    12    13
45968 45966 45643 45616 45425 44946 42957 41659 40476 38567 37969 37211 35505
   14    15    16
34049 33345 18019
..Working on block 1 .
    TOM calculation: adjacency..
adjacency: replaceMissing: 0
    ..will use 47 parallel threads.
     Fraction of slow calculations: 0.000000
    ..connectivity..
    ..matrix multiplication.."

The server stats show that WGCNA is using 3.5% of the memory but only 1 CPU, so the parallelisation does not seem to be working. It would be great to use all 47 threads to get things moving.

Does anyone have experience of running such a large dataset, and any tips on how to get it to use more threads? At this point it looks as though things will take a very long time, even though a lot more CPU resources are available.

Thank you,

Caitlin

wgcna rnaseq • 3.3k views

updated 8.5 years ago by Peter Langfelder ★ 3.0k • written 8.5 years ago by caitlin ▴ 10

score 3 · Accepted Answer · 2016-08-14

Please see the WGCNA FAQ, section "General questions", items 1 and 2. Both the preclustering and the actual TOM calculations benefit greatly from compiling R against a fast BLAS library.

The preclustering can take a long time with large data sets. In recent versions of WGCNA the preclustering defaults have been tweaked to allow faster execution. Please check that your WGCNA version is 1.51 or newer (the new version is available from CRAN, not yet from our own website, shame on me :S).

For blockwiseModules, the calculations are not (yet) parallelizable over blocks. In any case, a far better use of resources is to install and use a fast BLAS.
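
To check whether a given R installation is already linked against a fast BLAS, something along these lines may help. It is only a rough sketch in base R: the matrix size `n` and the helper `show_lib` are arbitrary choices for illustration, and the throughput figure is a crude indicator rather than a proper benchmark.

```r
# Report which BLAS/LAPACK libraries this R session is linked against.
# sessionInfo() exposes these paths in recent R versions; older ones may not.
show_lib <- function(path) {
  if (length(path) == 1 && nzchar(path)) path else "unknown"
}
si <- sessionInfo()
cat("BLAS:  ", show_lib(si$BLAS), "\n")
cat("LAPACK:", show_lib(si$LAPACK), "\n")

# Time one dense matrix product, the same kind of operation that
# dominates the "..matrix multiplication.." step of the TOM calculation.
n <- 1000                                  # illustrative size, not from the thread
set.seed(1)
x <- matrix(rnorm(n * n), nrow = n)
elapsed <- system.time(y <- x %*% x)[["elapsed"]]
elapsed <- max(elapsed, 1e-3)              # guard against a zero reading on fast machines

# A dense n x n product costs roughly 2 * n^3 floating point operations.
gflops <- 2 * n^3 / elapsed / 1e9
cat(sprintf("approx. %.1f GFLOP/s on a %d x %d product\n", gflops, n, n))
```

If the reported BLAS path points at R's bundled reference library and the throughput is low while only one core is busy, that would be consistent with the single-CPU behaviour described in the question.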

Using a fast BLAS, a TOM calculation of 40+k variables should not take much more than 1 hour, even less on some hardware/BLAS combinations. I more or less routinely run WGCNA on Illumina 450 data (nearly 500k probes) with block sizes of up to 35k probes and the calculations of TOM certainly don't take that long.
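
As a back-of-the-envelope illustration of why the BLAS matters so much here, the cost of one dense matrix product for a block of 46,000 genes can be estimated as below. The two throughput values are assumptions chosen purely for illustration (they are not measurements from this thread), and the estimate covers only a single n x n product, not the whole TOM calculation.

```r
# Rough cost estimate for one dense n x n matrix product on a single block.
n_genes <- 46000                     # maxBlockSize used in the question
flops   <- 2 * n_genes^3             # approximate floating point operations

# Hypothetical sustained throughputs in GFLOP/s (assumed, not measured):
assumed_gflops <- c(reference_blas = 1, optimized_blas = 100)

# Estimated wall-clock time in hours for each assumed throughput.
hours <- flops / (assumed_gflops * 1e9) / 3600
print(round(hours, 1))

# Memory for one n x n matrix of doubles (8 bytes each), in GB.
matrix_gb <- n_genes^2 * 8 / 1e9
cat(sprintf("one %d x %d double matrix: %.1f GB\n", n_genes, n_genes, matrix_gb))
```

Under these assumed numbers, a slow single-threaded BLAS lands in the range of tens of hours for one block, while a BLAS that is a hundred times faster brings the same product well under an hour.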