Hiya,
I just wanted to check my understanding of the WGCNA output, as I have been reading various articles and have got confused. In the module-trait heatmap, does a positive correlation mean that all the genes in the module have higher expression when associated with the trait? For example, with treated (1) and untreated (0), do the genes have higher expression in the treated group than in the untreated group? In other words, as the trait value increases (from 0 to 1), does expression increase?
Best wishes,
Rebekah
Hi,
Cheers for the very rapid response! Yes, it is a signed network. I was trying to work out whether the correlation relates to the fold changes between treated and untreated, i.e. does a positive correlation in the module-trait heatmap mean that all genes in that module are upregulated (have higher expression) when treatment is applied compared with no treatment? And is it better, then, to use an unsigned network to see whether there are links between genes that have higher and lower expression when treatment is applied?
Best wishes,
Rebekah
Sorry, I posted that reply on the wrong message. What I meant to say is that I think I misunderstood: the correlation has to do with the strength of the correlations between the nodes within the network. So does a negative correlation mean that, as the trait value increases, the correlation between expression levels within the network weakens?
There are two different correlations (or sets of correlations) that you need to distinguish. The eigengene-trait correlation measures the strength and direction of association between the module (more precisely, the representative profile) and the trait. If this is positive (negative), it means the trait increases (decreases) with increasing eigengene "expression".
If this correlation is strong and the network is signed (or signed hybrid) it means that most of the genes in the module will also exhibit a correlation with the trait of the same sign as the eigengene. In an unsigned network, the gene-trait correlations can have the same or opposite sign.
Correlations among genes in a module are usually independent of whether the eigengene is correlated with a trait or not (and whether the correlation is positive or negative).
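To make the eigengene-trait correlation concrete, here is a minimal base-R sketch on simulated data (all object names are hypothetical, and WGCNA is not needed). It takes the eigengene as the first principal component of the standardized module genes, flips its sign to follow the average module expression (similar in spirit to the default alignment in WGCNA's `moduleEigengenes()`, but not that function), and correlates it with a 0/1 treatment indicator.

```r
set.seed(1)
n_samples <- 20
trait <- rep(c(0, 1), each = n_samples / 2)   # 0 = untreated, 1 = treated

# Simulated module: 30 genes sharing one latent profile that is higher in treated samples
latent <- 1.5 * trait + rnorm(n_samples)
module_expr <- sapply(1:30, function(g) latent + rnorm(n_samples, sd = 0.5))

# Eigengene = first principal component of the standardized module expression
eigengene <- prcomp(scale(module_expr))$x[, 1]
# The sign of a principal component is arbitrary; align it with the average expression
if (cor(eigengene, rowMeans(scale(module_expr))) < 0) eigengene <- -eigengene

me_trait_cor   <- cor(eigengene, trait)          # eigengene-trait correlation
group_means    <- tapply(eigengene, trait, mean) # mean eigengene: untreated (0), treated (1)
gene_trait_cor <- cor(module_expr, trait)[, 1]   # one gene-trait correlation per gene

me_trait_cor
group_means
mean(gene_trait_cor > 0)   # fraction of module genes whose own trait correlation is positive
```

With a trait coded 0/1, the Pearson correlation is positive exactly when the group coded 1 has the higher mean, so a positive eigengene-trait correlation means a higher average eigengene in the treated group. In a module like this one, most genes are expected to correlate with the trait in the same direction as the eigengene.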
HTH,
Peter
Apologies for the late reply. We recommend using a signed or signed hybrid network analysis. You can read some more here:
http://www.peterlangfelder.com/signed-or-unsigned-which-network-type-is-preferable/ and
http://www.peterlangfelder.com/two-types-of-signed-networks-in-wgcna/
Peter
Sorry for the delay, I've been trying to get my head around the unsigned, signed and signed hybrid analyses. So:
- unsigned: positive and negative correlations between the expression values of genes are treated equally?
- signed: negative correlations between the expression values of genes are still assigned an adjacency, but it is so small that it is negligible, so effectively only positive correlations are accounted for in the network output?
- signed hybrid: only positive correlations between the expression values of genes are accounted for, and all negative correlations are set to zero?
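The three network types differ in how a gene-gene correlation r is turned into an adjacency before the soft-thresholding power β is applied. A minimal base-R sketch of the standard WGCNA definitions (unsigned: |r|^β; signed: ((1 + r)/2)^β; signed hybrid: r^β for r > 0, otherwise 0); the helper name is hypothetical, and in WGCNA itself the choice is made with the `type` argument of `adjacency()`:

```r
# Hypothetical helper: correlation -> adjacency for the three WGCNA network types
cor_to_adjacency <- function(r, beta, type = c("unsigned", "signed", "signed hybrid")) {
  type <- match.arg(type)
  switch(type,
    "unsigned"      = abs(r)^beta,                # sign discarded
    "signed"        = ((1 + r) / 2)^beta,         # r = -1 -> 0, r = 0 -> 0.5^beta, r = 1 -> 1
    "signed hybrid" = ifelse(r > 0, r, 0)^beta    # negative correlations set to exactly 0
  )
}

r <- c(-0.9, -0.5, 0, 0.5, 0.9)
beta <- 12
round(sapply(c("unsigned", "signed", "signed hybrid"),
             function(t) cor_to_adjacency(r, beta, t)), 4)
```

Under the signed transform an uncorrelated pair (r = 0) still gets 0.5^β and a strongly negative pair gets a value very close to 0, which matches the description of signed networks in the list above; under signed hybrid every negative correlation maps to exactly 0.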
So in expression data where you are only interested in whether the expression of one gene increases with the expression of another, you would use a signed network.
Then if a trait, e.g. length, is negatively correlated with a module, an increase in length would be associated with lower expression of all the genes within that module?
In data where negative correlations in expression between genes matter (e.g. homeostatic regulation?), you should use an unsigned network.
Then if a trait, e.g. length, is negatively correlated with a module, an increase in length could be associated with either higher or lower expression of individual genes, because the correlation of a gene's expression with the other genes in the module could be positive or negative?
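The unsigned scenario can be illustrated with simulated data (base R, hypothetical names): ten genes follow a latent profile that rises with treatment and ten follow its mirror image. The unsigned adjacency keeps the two groups strongly connected, so they could be grouped into one module, whereas the signed adjacency nearly disconnects them; within that unsigned group, the gene-trait correlations come in both signs. The power β = 6 is an assumption for illustration.

```r
set.seed(2)
n <- 24
trait <- rep(c(0, 1), each = n / 2)
latent <- 1.5 * trait + rnorm(n)

# Genes 1-10 follow the latent profile, genes 11-20 follow its mirror image
up   <- sapply(1:10, function(g)  latent + rnorm(n, sd = 0.4))
down <- sapply(1:10, function(g) -latent + rnorm(n, sd = 0.4))
expr <- cbind(up, down)

r <- cor(expr)
beta <- 6
unsigned_adj <- abs(r)^beta           # sign of the correlation is discarded
signed_adj   <- ((1 + r) / 2)^beta    # negative correlations map to near-zero adjacency

# Mean adjacency between the "up" and "down" gene groups under each network type
cross_unsigned <- mean(unsigned_adj[1:10, 11:20])
cross_signed   <- mean(signed_adj[1:10, 11:20])
c(unsigned = cross_unsigned, signed = cross_signed)

# Gene-trait correlations within what an unsigned analysis could call one module
gene_trait_cor <- cor(expr, trait)[, 1]
table(sign(gene_trait_cor))
```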
You got it exactly right. I would only add that instead of running an unsigned analysis, I personally prefer to run a signed or signed hybrid, and then look at whether anti-correlated modules could be thought of as part of a single biological pathway/process. In my work, negatively correlated modules are usually biologically quite different.
Cheers! :) Took me a while, but glad I got there!
Sorry, another quick question. In the tutorial document https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-02-networkConstr-man.pdf
the only place I can see "signed" specified is in the axis label of the soft-threshold graph. Does this mean that the default is signed, or should it be specified within the script, i.e.:
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5, networkType = "signed")
and then, when calculating the adjacency and TOM:
adjacency = adjacency(datExpr, power = softPower, type="signed");
TOM = TOMsimilarity(adjacency, TOMType = "signed");
And where the correlation type has to be specified, should this be done at the TOM step?
I have currently specified Spearman only here, as my variables are binary categorical:
moduleTraitCor = cor(MEs, datTraits, method = "spearman", use = "p");
Should I also have specified Spearman for the TOM?
Or should I have used corType = "bicor" when running blockwiseModules, because Spearman is not an option there?
net = blockwiseModules(datExpr, power = 16, networkType = "signed", TOMType = "signed", minModuleSize = 30, maxBlockSize = 30000, corType = "bicor", maxPOutliers = 0.1, reassignThreshold = 0, mergeCutHeight = 0.25, numericLabels = TRUE, pamRespectsDendro = FALSE, verbose = 3)
The default in WGCNA is, for historical reasons, unsigned network (at least for most functions). You need to check the help file for each function you want to use; most functions where network type matters will have an argument networkType or just type.
TOM type is not related to network type and only makes any difference for unsigned networks. For more details, please read this discussion: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/TechnicalReports/signedTOM.pdf
I would avoid using Spearman correlation unless there is a good reason to use it; in my experience, it leads to somewhat inferior results. TOM calculation directly from expression data (TOMsimilarityFromExpr) can only use Pearson or biweight midcorrelation; if you want Spearman, you have to calculate the adjacency first (using Spearman correlation) and then turn it into TOM using TOMsimilarity().
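The two-step route described above (adjacency first, then TOM) can be sketched in base R. The power β = 12, the simulated data and the helper name are assumptions for illustration; the TOM uses the standard topological overlap formula, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l_ij = sum over u of a_iu * a_uj and k_i the connectivity of gene i. It is a simplified stand-in for `TOMsimilarity()`, not its implementation, and Spearman appears only because the question asked about it.

```r
set.seed(3)
expr <- matrix(rnorm(20 * 8), nrow = 20, ncol = 8)   # 20 samples x 8 genes (simulated)

# Step 1: signed adjacency from Spearman correlation with soft-thresholding power beta
beta <- 12
r_spearman <- cor(expr, method = "spearman")
adj <- ((1 + r_spearman) / 2)^beta

# Step 2: topological overlap from the adjacency (standard formula, "min" denominator)
tom_from_adjacency <- function(A) {
  diag(A) <- 0                          # sums below must exclude self-connections
  k <- rowSums(A)                       # connectivity of each gene
  l <- A %*% A                          # shared-neighbour term: sum_u a_iu * a_uj
  denom <- outer(k, k, pmin) + 1 - A    # min(k_i, k_j) + 1 - a_ij
  tom <- (l + A) / denom
  diag(tom) <- 1
  tom
}

tom <- tom_from_adjacency(adj)
diss_tom <- 1 - tom   # dissimilarity that would be used for hierarchical clustering of genes
```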
Having binary traits is not by itself a reason to use Spearman correlation; you can use Pearson correlation or (with some care) the robust biweight midcorrelation.
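A quick way to see why Pearson correlation is acceptable for a binary trait: with a 0/1 variable, Pearson correlation is the point-biserial correlation, and its t-test p-value is identical to that of a two-sample t-test with equal variances. A base-R check on simulated data (hypothetical names):

```r
set.seed(4)
trait <- rep(c(0, 1), times = c(9, 11))          # binary trait, unequal group sizes
eigengene <- 0.8 * trait + rnorm(length(trait))  # simulated module eigengene

p_cor <- cor.test(eigengene, trait, method = "pearson")$p.value
p_t   <- t.test(eigengene[trait == 1], eigengene[trait == 0], var.equal = TRUE)$p.value

c(cor = cor(eigengene, trait), p_cor = p_cor, p_t = p_t)   # the two p-values agree
```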
Thank you for your response. I have just tried bicor in pickSoftThreshold and my scale-free fit does not reach 0.85, so I will stick to Pearson. The reason I swapped to bicor was that the WGCNA FAQ recommends it over Pearson because Pearson correlation is sensitive to outliers. But Pearson is okay to use, then?
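The 0.85 target is the default `RsquaredCut` of `pickSoftThreshold()`, applied to the scale-free topology fit: log10 of the frequency of (binned) connectivity is regressed on log10 of connectivity, and the tutorial's soft-threshold plot shows that R² multiplied by minus the sign of the slope. The helper below is a simplified sketch of that index under assumed binning (ten equal-width bins, empty bins dropped); WGCNA's own binning differs in details, the helper name is hypothetical, and the random data here are not expected to look scale-free.

```r
# Simplified scale-free topology fit index (hypothetical helper, not WGCNA's exact binning)
scale_free_fit <- function(k, n_breaks = 10) {
  bins <- cut(k, n_breaks)
  mean_k <- tapply(k, bins, mean)                  # mean connectivity in each bin
  freq   <- tapply(k, bins, length) / length(k)    # fraction of genes in each bin
  keep <- !is.na(mean_k)                           # drop empty bins
  fit <- lm(log10(freq[keep]) ~ log10(mean_k[keep]))
  slope <- unname(coef(fit)[2])
  r2 <- summary(fit)$r.squared
  c(signed_r2 = -sign(slope) * r2, slope = slope)
}

set.seed(5)
expr <- matrix(rnorm(30 * 200), nrow = 30)   # 30 samples x 200 genes (simulated)
r <- cor(expr)
for (beta in c(1, 6, 12)) {
  adj <- abs(r)^beta                         # unsigned adjacency, as in the tutorial
  diag(adj) <- 0
  k <- rowSums(adj)                          # whole-network connectivity
  print(round(c(power = beta, scale_free_fit(k)), 3))
}
```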
If using bicor, should maxPOutliers = 0.05 and robustY = FALSE be used only when correlating the module eigengenes with the binary categorical trait, or should they also be used when constructing the modules?
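To see why `robustY = FALSE` is suggested for binary traits, here is a base-R sketch of the textbook biweight midcorrelation, omitting WGCNA's `maxPOutliers` adjustment and its Pearson fallback (function names are hypothetical). The robust standardization divides by the median absolute deviation (MAD), and the MAD of a 0/1 variable is zero whenever more than half of the samples share the same value; using ordinary Pearson standardization for the trait sidesteps this. By default, WGCNA's `bicor()` falls back to Pearson for such a column (`pearsonFallback = "individual"`).

```r
# Hypothetical helpers: textbook biweight midcorrelation (no maxPOutliers, no fallback)
bicor_weights <- function(x) {
  med <- median(x)
  mad_raw <- median(abs(x - med))                 # unscaled median absolute deviation
  if (mad_raw == 0) stop("MAD is zero: robust standardization undefined")
  u <- (x - med) / (9 * mad_raw)
  w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)         # Tukey biweight; extreme values get weight 0
  v <- (x - med) * w
  v / sqrt(sum(v^2))
}
pearson_weights <- function(x) {
  v <- x - mean(x)
  v / sqrt(sum(v^2))
}
bicor_sketch <- function(x, y, robustX = TRUE, robustY = TRUE) {
  sx <- if (robustX) bicor_weights(x) else pearson_weights(x)
  sy <- if (robustY) bicor_weights(y) else pearson_weights(y)
  sum(sx * sy)
}

set.seed(6)
trait <- rep(c(0, 1), times = c(8, 12))   # binary trait, 12 of 20 samples are 1
expr <- 0.7 * trait + rnorm(20)
expr[3] <- 8                              # one extreme outlier in the untreated group

median(abs(trait - median(trait)))        # 0, so robustY = TRUE would fail here
bicor_sketch(expr, trait, robustY = FALSE) # robust in expr, Pearson-style in trait
cor(expr, trait)                           # ordinary Pearson, influenced by the outlier
```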
If a signed network has been used and some genes have a negative module membership, does this mean that they are negatively correlated with the eigengene (PC1) but positively correlated with the genes they are co-expressed with?
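Module membership (kME) is the correlation between a gene's expression and the module eigengene (WGCNA computes it with `signedKME()`), so a negative kME means the gene is anti-correlated with the eigengene. A base-R sketch on simulated data, where one hypothetical gene is constructed to be anti-correlated with the module profile, shows how to check how such a gene correlates with the other genes:

```r
set.seed(7)
n <- 20
latent <- rnorm(n)
module_expr <- cbind(
  sapply(1:9, function(g) latent + rnorm(n, sd = 0.5)),  # genes following the module profile
  -0.6 * latent + rnorm(n, sd = 0.8)                      # one gene constructed to be anti-correlated
)
colnames(module_expr) <- c(paste0("gene", 1:9), "anti")

# Eigengene as the sign-aligned first principal component of the standardized genes
eigengene <- prcomp(scale(module_expr))$x[, 1]
if (cor(eigengene, rowMeans(scale(module_expr))) < 0) eigengene <- -eigengene

# Module membership: correlation of each gene with the eigengene
kME <- cor(module_expr, eigengene)[, 1]
round(kME, 2)

# For genes with negative kME, inspect their correlations with the other genes
neg <- names(kME)[kME < 0]
round(cor(module_expr)[neg, setdiff(colnames(module_expr), neg), drop = FALSE], 2)
```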
Apologies for all the questions; I'm just trying really hard to understand the theory behind what I am running in R.
Best wishes
R
Is choosing significantly correlated modules based on |cor| > 0.8 and p < 0.05 a valid approach? Is there any other accepted threshold?
p < 0.05 is always good; even better is to apply a reasonable multiple testing correction to the p-values. In terms of correlation, it depends: a high correlation is always preferable to a low one, but there is no generally accepted threshold like the 0.05 for p-values.
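A sketch of that step in base R: the p-value of each module-trait correlation follows from r and the number of samples n through the Student t statistic t = sqrt(n - 2) * r / sqrt(1 - r^2) (the formula behind WGCNA's `corPvalueStudent()`), and `p.adjust()` then applies a multiple testing correction, Benjamini-Hochberg here. The module names, correlations and sample size below are made up for illustration.

```r
# Hypothetical module-trait correlations (rows = modules, columns = traits)
module_trait_cor <- cbind(treatment = c(0.62, -0.15, 0.05),
                          length    = c(0.48, 0.71, -0.33))
rownames(module_trait_cor) <- c("MEblue", "MEbrown", "MEturquoise")
n_samples <- 24

# Two-sided p-value of a Pearson correlation via the Student t statistic
cor_p_student <- function(r, n) {
  t_stat <- sqrt(n - 2) * r / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

p_raw <- cor_p_student(module_trait_cor, n_samples)
# Benjamini-Hochberg adjustment across all module-trait tests, keeping the table layout
p_bh <- matrix(p.adjust(p_raw, method = "BH"),
               nrow = nrow(module_trait_cor), dimnames = dimnames(module_trait_cor))
round(p_bh, 4)
```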
Hello Peter, I got 104 modules from blockwise module detection. There were outliers, so I had to remove them from my data. Also, the soft-thresholding power came out at around 22. Do you think I should go ahead with the analysis? Is getting such high numbers in both cases meaningful?
Below is the code:
net = blockwiseModules(datExpr.brain, power = 22, maxBlockSize = 10000, TOMType = "signed", minModuleSize = 30, reassignThreshold = 0, mergeCutHeight = 0.25, numericLabels = TRUE, pamRespectsDendro = FALSE, verbose = 3)
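On the number of modules: after detection, `blockwiseModules()` merges modules whose eigengenes are very similar, using 1 minus the eigengene correlation as the dissimilarity and `mergeCutHeight` as the cut-off (0.25 in the call above, i.e. eigengenes correlated above about 0.75 are merged). A rough base-R sketch of that idea on simulated eigengenes, using average-linkage clustering; WGCNA's `mergeCloseModules()` does this properly, including recomputing eigengenes after merging. Names are hypothetical.

```r
set.seed(8)
n <- 30
base1 <- rnorm(n)
base2 <- rnorm(n)
MEs <- data.frame(
  ME1 = base1 + rnorm(n, sd = 0.2),
  ME2 = base1 + rnorm(n, sd = 0.2),   # near-duplicate of ME1
  ME3 = base2 + rnorm(n, sd = 0.2),
  ME4 = rnorm(n)
)

merge_cut_height <- 0.25
diss <- as.dist(1 - cor(MEs))          # eigengene dissimilarity
tree <- hclust(diss, method = "average")
merged_label <- cutree(tree, h = merge_cut_height)
merged_label   # ME1 and ME2 share a label; ME3 and ME4 stay separate
```

Raising `mergeCutHeight` merges more aggressively and so tends to reduce the number of modules.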