WGCNA sample size minimum: why?
@charlesfoster-17652
Last seen 14 days ago
Australia

Hi all,

While looking into WGCNA analysis, I saw that the minimum recommended sample size is 15 samples because:

"correlations on fewer than 15 samples will simply be too noisy for the network to be biologically meaningful"

I'm wondering if anyone here would be able to further clarify why this is the case. In my attempts to understand this question, I've thought of two possibilities that partially overlap...

(1) Having <15 samples invalidates conclusions because results might be spuriously driven by one or a couple of replicates

(2) Having ≥15 samples is suggested because a smaller number might not have enough power to detect any biological trends, i.e. module eigengenes won't have any underlying biological significance

Are either, or both, of these thoughts correct?

Out of interest, I ran a WGCNA analysis on a data set of 12 samples. I first recovered module eigengenes, and then correlated these with three binary traits. The results are entirely reasonable, with the most highly correlated module for each trait telling an interesting biological story that reflects standard differential expression + GO enrichment analysis. I should note that the data are heterogeneous, with differences in expression between samples wholly reflecting the traits of interest (as per point 5 here: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html). I'm guessing that this might be why results appear to be sensible: the data are highly informative without noise swamping the biological signals that we are interested in. In this case, I'd be inclined to say that the rule of thumb of 15 samples might not matter?

Any comments would be appreciated :)

– C

wgcna • 6.9k views

Hi,

I am also planning to perform WGCNA analysis and was not sure how I could correlate my modules with binary traits (0,1) i.e. how to prepare an input file of binary traits. Would it be possible to share your R-code and binary input file as an example? Thank you!
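For readers with the same question, here is a minimal sketch of the idea in plain Python, not the original poster's R code and not the WGCNA API. A binary trait is just a vector of 0/1 values, one per sample, in the same sample order as the module eigengene, and the module-trait correlation is an ordinary Pearson correlation. The eigengene values and trait coding below are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical module eigengene for 12 samples (values invented for this sketch).
eigengene = [-0.31, -0.22, -0.35, -0.18, -0.27, -0.05,
             0.12, 0.30, 0.21, 0.33, 0.25, 0.17]

# Hypothetical binary trait: 0 = control, 1 = treated, same sample order.
trait = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

r = pearson(eigengene, trait)

# Student t statistic for the correlation, with n - 2 degrees of freedom.
n = len(trait)
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"module-trait correlation r = {r:.3f}, t = {t_stat:.2f} (df = {n - 2})")
```

With several binary traits, the same calculation is repeated for each trait column against each module eigengene.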


Hi, could I ask what the source is for your first statement, which suggests having at least 15 samples?

@peter-langfelder-4469
Last seen 7 weeks ago
United States

If you have strong signal and clean data, yes, WGCNA could be informative even with 12 samples. The worst that can happen (with few samples, strong signal and fairly clean data) is that WGCNA won't give you insights that you could not gain from a plain DE analysis. Many of the finer-grain results of WGCNA (e.g., picking hub genes) become less reliable with fewer samples, and 15 seems like a good number to draw a generic line. Sometimes you can get decent results with 10 samples, and sometimes a 20-sample (or bigger) WGCNA won't provide any good insights.
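To put a rough number on "less reliable with fewer samples", here is a small sketch in plain Python (not WGCNA code). It uses the standard Fisher z-transform approximation, in which the transformed correlation has standard error 1/sqrt(n - 3), to show how wide the approximate 95% confidence interval around a single Pearson correlation is at different sample sizes. The example value r = 0.7 and the sample sizes are arbitrary choices for illustration.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation r
    estimated from n samples, via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("the approximation needs n > 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)               # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)     # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Illustrative only: the same observed correlation at several sample sizes.
for n in (10, 12, 15, 20, 50):
    lo, hi = fisher_ci(0.7, n)
    print(f"n = {n:2d}: r = 0.70, approx. 95% CI = [{lo:.2f}, {hi:.2f}]")
```

The interval narrows only gradually as n grows, which fits the point that 15 is a generic line rather than a sharp threshold. A network is built from many such pairwise correlations, so this per-correlation uncertainty is one plausible reason why finer-grained results such as hub gene rankings degrade first.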


Great, thanks for the fast clarification!
