I am working on methylation data and I want to use WGCNA on it. The data are the beta values of 400K CpG probes across ~30 samples, transformed by log2(beta+1). (The transformed data have a nice scale-free topology profile when the soft threshold is calculated; I think that is a good sign?) If anyone has already used WGCNA for this kind of data, I have two questions. First, are my data properly normalized? Second, how should I choose the maximum block size for the blockwiseModules function? I have access to a cluster; can I deduce maxBlockSize from the available RAM?
I have done a few WGCNA analyses on Illumina 450k methylation data. I used beta values and filtered the data down to about the 300k most variable probes. I haven't thought deeply about which transformation is best for methylation data, but using beta values worked fairly well. I would not use a log2(beta+1) transformation, since it will exaggerate differences in CpGs with low methylation relative to those with high methylation; this kind of transformation is useful for count data.
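To make the point about log2(beta+1) concrete, here is a small Python sketch (plain arithmetic, not WGCNA code; the function names are made up for illustration). It compares how the same 0.1 difference in beta is mapped by the transform at the low end and at the high end of the methylation range.

```python
import math

def log2_plus_one(beta):
    """The transform from the question: log2(beta + 1)."""
    return math.log2(beta + 1.0)

def transformed_gap(low, high):
    """Difference between two beta values after the transform."""
    return log2_plus_one(high) - log2_plus_one(low)

# The same raw difference of 0.1 in beta, at both ends of the range.
low_gap = transformed_gap(0.0, 0.1)   # weakly methylated CpGs
high_gap = transformed_gap(0.9, 1.0)  # strongly methylated CpGs
ratio = low_gap / high_gap

print(f"gap near beta = 0: {low_gap:.3f}")
print(f"gap near beta = 1: {high_gap:.3f}")
print(f"ratio low/high:    {ratio:.2f}")
```

The gap near beta = 0 comes out a little under twice the gap near beta = 1, so after the transform, differences among weakly methylated CpGs carry more weight than equally large differences among strongly methylated ones.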
maxBlockSize indeed depends mainly on available RAM. I would say you should be able to use a maxBlockSize of around 30k if you have 64GB of RAM, perhaps even with 32GB. Increase maxBlockSize by a factor of about 1.4 for every doubling of RAM; with 256GB you should be able to use a block size of ~60k, 1TB would let you use ~120k, and so on. You can play with the block sizes to some degree; in my experience the actual maximum block size usable on a system also depends on how the system is configured, how much swap space there is, etc.
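The rule of thumb above can be written as a short calculation. This is a minimal Python sketch, not part of WGCNA: the function names and the 64GB / 30k baseline constants are taken from the figures above purely for illustration. It assumes memory use is dominated by dense block-by-block matrices, so memory grows with the square of the block size and the usable block size grows with the square root of RAM (the square root of 2 is about 1.4, which gives the factor per doubling).

```python
import math

# Baseline taken from the rule of thumb above (an assumption, not a WGCNA constant).
BASE_RAM_GB = 64
BASE_BLOCK_SIZE = 30_000

def estimate_max_block_size(ram_gb):
    """Rough maxBlockSize estimate: scales with the square root of RAM."""
    return int(round(BASE_BLOCK_SIZE * math.sqrt(ram_gb / BASE_RAM_GB)))

def dense_matrix_gb(n_probes, bytes_per_value=8):
    """Size in GB (1e9 bytes) of one n x n matrix of double-precision values."""
    return n_probes ** 2 * bytes_per_value / 1e9

for ram in (64, 128, 256, 1024):
    size = estimate_max_block_size(ram)
    print(f"{ram:>5} GB RAM -> block size ~{size}, "
          f"one matrix ~{dense_matrix_gb(size):.1f} GB")
```

A single 30k x 30k matrix of doubles takes about 7.2 GB, and several matrices of that size are held in memory during the calculation, which is why the usable block size sits well below what one matrix alone would allow. Treat the result as a starting point for testing on your own system rather than a hard limit.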
Thanks a lot for your quick reply and your great work on WGCNA!
After several tests, I used a maxBlockSize of 60K with 128GB of RAM on the beta values. It worked out fine :)
Enora