Question

WGCNA: Understanding module-trait correlations

3

bekah ▴ 40

@bekah-12633

Last seen 6.2 years ago

Hiya,

I just wanted to clarify my understanding of the WGCNA output, as I have been reading various articles and have gotten confused. In the module-trait heatmap, does a positive correlation mean that all the genes in the module have higher expression when associated with the trait? So, say the trait is treated (1) vs untreated (0): do the genes have higher expression when the group has been treated than when untreated, i.e. as the trait becomes greater (0 to 1), the expression becomes greater?

Best wishes,

Rebekah

wgcna package • 22k views

ADD COMMENT • link updated 6.7 years ago by Peter Langfelder ★ 3.0k • written 6.7 years ago by bekah ▴ 40

score 4 · Answer 1 · 2018-07-27

4

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 5 months ago

United States

Your question is not very clear but I'll try anyway. The module-trait heatmap usually represents the correlations of the module eigengenes with traits. When that correlation is high, it means the eigengene increases with increasing trait. In a signed network (where all genes in a module are positively correlated with the eigengene) it will mean that (again, if the eigengene-trait correlation is high) pretty much all genes should also follow the same pattern of increasing expression with increasing trait values. In an unsigned network you may also have genes that show the opposite behaviour, since in an unsigned network a module can also contain genes that are strongly negatively correlated with the eigengene.
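
To make this concrete, here is a minimal base-R sketch on simulated data (the gene count, sample size and effect size are all made up). It approximates the module eigengene as the first principal component of the scaled expression, aligned with average expression; that is an assumption for illustration, not WGCNA's own code. It then compares the eigengene-trait correlation with the gene-trait correlations for a binary treated/untreated trait.

```r
set.seed(1)
nSamples <- 20
trait <- rep(c(0, 1), each = nSamples / 2)   # 0 = untreated, 1 = treated

# Simulated module: 30 genes sharing one signal that is higher in treated samples
signal <- 2 * trait + rnorm(nSamples)
datModule <- sapply(1:30, function(i) signal + rnorm(nSamples, sd = 0.5))
colnames(datModule) <- paste0("gene", 1:30)

# Eigengene approximated as PC1 of the scaled expression (samples in rows)
pc1 <- prcomp(datModule, center = TRUE, scale. = TRUE)$x[, 1]
# The sign of a principal component is arbitrary; align it with average expression
if (cor(pc1, rowMeans(datModule)) < 0) pc1 <- -pc1

eigengeneTraitCor <- cor(pc1, trait)
geneTraitCor <- cor(datModule, trait)[, 1]

eigengeneTraitCor           # positive: the eigengene is higher in treated samples
table(sign(geneTraitCor))   # sign of each gene's own correlation with the trait
```

With a 0/1 trait, a positive correlation simply says the values tend to be higher in the group coded 1; it does not by itself tell you the size of the fold change.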

Hope this helps.

ADD COMMENT • link 6.7 years ago Peter Langfelder ★ 3.0k

0

Hi,

Cheers for the very rapid response! Yes, it is a signed network. I was trying to work out whether the correlation relates to the fold changes between treated and untreated, i.e. does a positive correlation in the module-trait heatmap mean that all genes in that module are upregulated (higher expression) when treatment is applied vs no treatment? And is it better then to use an unsigned network to see whether there are links between genes that have higher and lower expression when treatment is applied?

Best wishes,

Rebekah

ADD REPLY • link 6.7 years ago bekah ▴ 40

0

Sorry, wrong reply on the wrong message - I meant to say that I think I misunderstood. So the correlation is to do with the strength of the correlations between the nodes within a network? So a negative correlation means that, with increasing trait value, the correlation between expression levels within the network weakens?

ADD REPLY • link 6.7 years ago bekah ▴ 40

0

There are two different correlations (or sets of correlations) that you need to distinguish. The eigengene-trait correlation measures the strength and direction of association between the module (more precisely, the representative profile) and the trait. If this is positive (negative), it means the trait increases (decreases) with increasing eigengene "expression".

If this correlation is strong and the network is signed (or signed hybrid) it means that most of the genes in the module will also exhibit a correlation with the trait of the same sign as the eigengene. In an unsigned network, the gene-trait correlations can have the same or opposite sign.

Correlations among genes in a module are usually independent of whether the eigengene is correlated with a trait or not (and whether the correlation is positive or negative).

HTH,

Peter

ADD REPLY • link 6.7 years ago Peter Langfelder ★ 3.0k

0

Apologies for the late reply. We recommend using a signed or signed hybrid network analysis. You can read some more here:

http://www.peterlangfelder.com/signed-or-unsigned-which-network-type-is-preferable/ and

http://www.peterlangfelder.com/two-types-of-signed-networks-in-wgcna/

Peter

ADD REPLY • link 6.7 years ago Peter Langfelder ★ 3.0k

0

Sorry for the delay, I've been trying to get my head around the unsigned, signed and hybrid analysis. So:
- unsigned - negative and positive correlations between expression values of the genes are all treated equally, whether negative or positive?
- signed - negative correlations between expression values of the genes are still assigned an adjacency, but it is so small it's negligible, and therefore only positive correlations are really accounted for in the network output?
- signed hybrid - only positive correlations between expression values of the genes are accounted for and all negative correlations are set to zero?

So in expression data where you are only interested in cases where expression of one gene increases with the expression level of another, you would use a signed network.
Then if a trait, e.g. length, is negatively correlated with a module, an increase in the length would be associated with lower expression of all the genes within that module?

In data where negative correlations in expression between genes are of interest (e.g. homeostatic regulation?), you should use unsigned.
Then if a trait, e.g. length, is negatively correlated with a module, an increase in the length could be associated with higher or lower expression of genes, as the sign of the correlation of a gene with the expression of other genes within the module could be positive or negative?

ADD REPLY • link 6.7 years ago bekah ▴ 40

3

You got it exactly right. I would only add that instead of running an unsigned analysis, I personally prefer to run a signed or signed hybrid, and then look at whether anti-correlated modules could be thought of as part of a single biological pathway/process. In my work, negatively correlated modules are usually biologically quite different.
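
For reference, the three network types discussed above differ only in how a correlation r is turned into an adjacency. Below is a minimal base-R sketch; the function names are hypothetical helpers written for this illustration (they are not WGCNA functions), and the power of 6 is arbitrary.

```r
# Three ways of turning a correlation r into an adjacency with soft power beta
adjUnsigned     <- function(r, beta) abs(r)^beta          # sign is ignored
adjSigned       <- function(r, beta) ((1 + r) / 2)^beta   # negative r gives a tiny value
adjSignedHybrid <- function(r, beta) pmax(r, 0)^beta      # negative r gives exactly 0

r <- c(-0.9, -0.5, 0, 0.5, 0.9)
beta <- 6   # illustrative power only, not a recommendation

adjTable <- data.frame(r,
                       unsigned     = adjUnsigned(r, beta),
                       signed       = adjSigned(r, beta),
                       signedHybrid = adjSignedHybrid(r, beta))
signif(adjTable, 3)
```

In the unsigned column r = -0.9 and r = 0.9 get the same adjacency, which is why an unsigned module can mix genes that move in opposite directions.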

ADD REPLY • link 6.7 years ago Peter Langfelder ★ 3.0k

0

cheers! :) took me a while. but glad I got there!

ADD REPLY • link 6.7 years ago bekah ▴ 40

0

Sorry another quick question - in the tutorial document, https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-02-networkConstr-man.pdf

the only place I can see signed specified is on the axis label of the soft threshold graph. Does this mean that the default is signed, or should it be specified within the script, i.e.:

sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5, networkType = "signed")

and then when calculating adjacency and TOM

adjacency = adjacency(datExpr, power = softPower, type="signed");

TOM = TOMsimilarity(adjacency, TOMType = "signed");

And where you have to specify the correlation, should this be done under TOM?

I have currently only specified Spearman here, as my variables are binary categorical:
moduleTraitCor = cor(MEs, datTraits, method = "spearman", use = "p");

Should I also have specified Spearman under TOM too?

Or should I have used corType = "bicor" here when running blockwiseModules, because Spearman is not an option?
net = blockwiseModules(datExpr, power = 16, TOMType ="signed", type="signed",minModuleSize = 30,maxBlockSize=30000, corType="bicor", maxPOutliers = 0.1, reassignThreshold = 0, mergeCutHeight = 0.25,numericLabels = TRUE, pamRespectsDendro = FALSE,verbose = 3)

ADD REPLY • link 6.7 years ago bekah ▴ 40

2

The default in WGCNA is, for historical reasons, an unsigned network (at least for most functions). You need to check the help file for each function you want to use; most functions where network type matters will have an argument networkType or just type.

TOM type is not related to network type and only makes any difference for unsigned networks. For more details, please read this discussion: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/TechnicalReports/signedTOM.pdf

I would avoid using Spearman correlation unless there is a good reason to use it; in my experience, it leads to somewhat inferior results. TOM calculation directly from expression data (TOMsimilarityFromExpr) can only use Pearson or biweight midcorrelation; you have to calculate adjacency first (using Spearman correlation) and then turn it into TOM using TOMsimilarity().

Having binary traits is by itself not a reason to use Spearman correlation; you can use Pearson correlation or (with some care) the robust biweight midcorrelation.
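
If Spearman correlation really is needed, the two-step route mentioned above (adjacency first, then TOM) could look roughly like the sketch below. The data are simulated stand-ins for datExpr, the soft power is a placeholder, and the signed adjacency is computed by hand in base R; only the final TOMsimilarity() call needs WGCNA, so it is skipped when the package is not installed.

```r
set.seed(2)
# Hypothetical stand-in for datExpr: 15 samples (rows) x 8 genes (columns)
datExpr <- matrix(rnorm(15 * 8), nrow = 15,
                  dimnames = list(NULL, paste0("gene", 1:8)))
softPower <- 12   # placeholder; take your own value from the soft-threshold analysis

# Spearman correlation between genes, then the signed adjacency transform
spearmanCor <- cor(datExpr, method = "spearman", use = "pairwise.complete.obs")
adjSpearman <- ((1 + spearmanCor) / 2)^softPower

# Hand the adjacency to WGCNA for the TOM step, if the package is available
if (requireNamespace("WGCNA", quietly = TRUE)) {
  TOM <- WGCNA::TOMsimilarity(adjSpearman)
  dissTOM <- 1 - TOM
}
```

WGCNA's adjacency() also has corFnc and corOptions arguments that may let you do the first step in one call; check the help file for the form your installed version expects.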

ADD REPLY • link 6.7 years ago Peter Langfelder ★ 3.0k

0

Thank you for your response. I have just tried bicor within pickSoftThreshold and my scale-free fit does not reach 0.85, so I will stick to Pearson. The reason I swapped to bicor was the FAQ section of WGCNA, which recommends its use over Pearson because of outlier sensitivity - but Pearson is okay to use then?

If using bicor, should maxPOutliers = 0.05 and robustY = FALSE only be used when correlating the module eigengene to the binary categorical trait, or should these be used when creating the modules too?

If a signed network has been used and some genes have a negative module membership - does this mean that they are negatively correlated with the eigengene (PC1) but are positively correlated with the genes to which their expression is correlated?

Apologies for all the questions, I'm just trying really hard to understand the theory behind what I am running in R.

Best wishes

R

ADD REPLY • link 6.7 years ago bekah ▴ 40

0

Is choosing a significantly correlated module based on |cor|>0.8 and p<0.05 a valid step? Is there any other accepted threshold?

ADD REPLY • link 5.3 years ago Arindam ▴ 80

0

p<0.05 is always good; even better is to use a reasonable multiple testing correction for the p-value. In terms of correlation, it depends - a high correlation is always preferable to a low one, but there's no generally accepted threshold like the 0.05 for p-values.
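
As one possible way to apply such a correction, here is a base-R sketch that adjusts module-trait p-values with the Benjamini-Hochberg method. The eigengenes are simulated (only the hypothetical ME1 is built to follow the trait), and the choice of BH is an assumption for illustration; other corrections may suit your design better.

```r
set.seed(3)
nSamples <- 24
trait <- rep(c(0, 1), each = nSamples / 2)

# Hypothetical eigengenes for 10 modules; only ME1 is built to track the trait
MEs <- as.data.frame(matrix(rnorm(nSamples * 10), nrow = nSamples))
names(MEs) <- paste0("ME", 1:10)
MEs$ME1 <- MEs$ME1 + 3 * trait

# Pearson correlation and its nominal p-value for each module
tests <- lapply(MEs, function(me) cor.test(me, trait))
moduleTraitCor <- sapply(tests, function(ct) unname(ct$estimate))
moduleTraitP   <- sapply(tests, function(ct) ct$p.value)

# Benjamini-Hochberg adjustment across all modules tested
moduleTraitPadj <- p.adjust(moduleTraitP, method = "BH")

data.frame(cor   = round(moduleTraitCor, 2),
           p     = signif(moduleTraitP, 2),
           p.adj = signif(moduleTraitPadj, 2))
```

With several traits, the adjustment would normally be applied across all module-trait pairs tested, not within each trait separately.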

ADD REPLY • link 5.3 years ago Peter Langfelder ★ 3.0k

0

Hello Peter, I got 104 modules when I did a Blockwise module detection. However, there were outliers so I had to remove them from my data. Also, the Soft-threshold came to be around 22. Do you think I should go ahead with the analysis? Is getting such a high number in both cases meaningful?

Below is the code:

net = blockwiseModules(datExpr.brain, power = 22, maxBlockSize = 10000, TOMType = "signed", minModuleSize = 30, reassignThreshold = 0, mergeCutHeight = 0.25, numericLabels = TRUE, pamRespectsDendro = FALSE, verbose = 3)

ADD REPLY • link 5.1 years ago Ramz • 0
