I have a research problem that I want to solve. Basically, I want to find the important genes in each module generated by the WGCNA algorithm.
I define important genes as hub genes, that is, the genes with the highest connectivity. I read in a paper that the calculation is to sum the weights of all edges of each node, sort the nodes from the highest sum to the lowest, and select the top 1%, 5%, or 10%.
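To make the calculation concrete, here is a minimal sketch of what I understand it to be, in base R on a made-up 5-gene adjacency matrix. The gene names, the weights, and the 20% cutoff are all hypothetical (20% only because the toy network is tiny), and I am assuming the diagonal should be excluded from the sum.

```r
# Toy symmetric weighted adjacency matrix for 5 hypothetical genes (made-up values).
genes <- paste0("gene", 1:5)
adj <- matrix(c(
  1.0, 0.8, 0.6, 0.1, 0.1,
  0.8, 1.0, 0.4, 0.2, 0.1,
  0.6, 0.4, 1.0, 0.1, 0.2,
  0.1, 0.2, 0.1, 1.0, 0.7,
  0.1, 0.1, 0.2, 0.7, 1.0
), nrow = 5, byrow = TRUE, dimnames = list(genes, genes))

# Connectivity of a gene = sum of its edge weights to all other genes.
# The diagonal (self-connection) is subtracted so it does not count.
connectivity <- rowSums(adj) - diag(adj)

# Sort from highest to lowest connectivity.
ranked <- sort(connectivity, decreasing = TRUE)

# Keep the top fraction of genes as hub genes (assumed value for this toy case).
top_fraction <- 0.20
n_top <- max(1, ceiling(top_fraction * length(ranked)))
hub_genes <- names(ranked)[seq_len(n_top)]

print(ranked)
print(hub_genes)
```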
Once I have the list of important genes, I need to find the modules in the network generated by WGCNA and map the hub genes to those modules. That way, I will have the modules and the important genes per module.
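An alternative I am considering is to rank genes within each module directly instead of ranking globally and mapping afterwards. The sketch below is base R on a made-up 6-gene matrix; the module labels ("blue", "brown") and all weights are hypothetical, standing in for the module assignment WGCNA would produce. I believe the WGCNA function intramodularConnectivity() computes a similar within-module sum, but I have not checked that it matches this.

```r
# Toy symmetric weighted adjacency matrix for 6 hypothetical genes (made-up values).
genes <- paste0("gene", 1:6)
adj <- matrix(c(
  1.0, 0.9, 0.7, 0.1, 0.1, 0.1,
  0.9, 1.0, 0.5, 0.1, 0.1, 0.1,
  0.7, 0.5, 1.0, 0.1, 0.1, 0.1,
  0.1, 0.1, 0.1, 1.0, 0.8, 0.4,
  0.1, 0.1, 0.1, 0.8, 1.0, 0.6,
  0.1, 0.1, 0.1, 0.4, 0.6, 1.0
), nrow = 6, byrow = TRUE, dimnames = list(genes, genes))

# Hypothetical module assignment, one label per gene.
module_labels <- c(gene1 = "blue", gene2 = "blue", gene3 = "blue",
                   gene4 = "brown", gene5 = "brown", gene6 = "brown")

# For each module, sum the weights to the other genes of the same module
# and report the gene with the largest within-module sum.
modules <- unique(module_labels)
hub_per_module <- sapply(modules, function(mod) {
  members <- names(module_labels)[module_labels == mod]
  sub <- adj[members, members, drop = FALSE]
  k_within <- rowSums(sub) - diag(sub)  # exclude self-connection
  names(k_within)[which.max(k_within)]
})

print(hub_per_module)
```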
To do that, I started with the basic WGCNA tutorials because I am new to this package. I followed the tutorials from: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/.
In tutorial 2b, I followed the steps up to the calculation of the Topological Overlap Matrix (TOM). Below is the code:
softPower = 6
adjacency = adjacency(datExpr, power = softPower)
TOM = TOMsimilarity(adjacency)
dissTOM = 1 - TOM
It seems this is the part where the network is generated, in the form of an adjacency matrix.
My questions are:
1. Which matrix should be used to calculate hub genes: adjacency or dissTOM? I noticed that the dissTOM matrix contains values above 0.99. Is that expected?
2. Is it better to apply a cutoff first, to decide whether two genes are connected, before calculating the weight sum used to determine hub genes? If a gene pair has a weight below the cutoff, I set it to 0; otherwise I set it to 1. That way, I only need to count the 1s for each gene to determine the hub genes.
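To illustrate question 2, this is roughly what I mean by the cutoff approach, again as a base R sketch on a made-up matrix. The cutoff of 0.5 and all weights are hypothetical; I have no principled way of choosing the cutoff yet.

```r
# Toy symmetric weighted adjacency matrix for 5 hypothetical genes (made-up values).
genes <- paste0("gene", 1:5)
adj <- matrix(c(
  1.0, 0.8, 0.6, 0.1, 0.1,
  0.8, 1.0, 0.4, 0.2, 0.1,
  0.6, 0.4, 1.0, 0.1, 0.2,
  0.1, 0.2, 0.1, 1.0, 0.7,
  0.1, 0.1, 0.2, 0.7, 1.0
), nrow = 5, byrow = TRUE, dimnames = list(genes, genes))

# Hard threshold: weights at or above the cutoff become 1, the rest become 0.
cutoff <- 0.5  # hypothetical value
binary <- (adj >= cutoff) * 1
diag(binary) <- 0  # a gene is not counted as its own neighbour

# Unweighted degree = number of 1s per gene.
degree <- rowSums(binary)

# Weighted connectivity on the original matrix, for comparison.
weighted <- rowSums(adj) - diag(adj)

print(data.frame(degree = degree, weighted = weighted))
```

In this toy case gene2 and gene3 get the same degree even though their weighted sums differ, so the two approaches do not have to give the same ranking.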
Thank you very much.