I have 20 case and 20 control samples on which I need to run WGCNA. Initial PCA and hierarchical clustering of the samples showed a batch effect (expected), which I removed from the DESeq2-normalized data using limma::removeBatchEffect(). WGCNA gave me 12 modules, one of which contains almost half of the genes in the analysis. A few forum posts suggested that this might be due to the presence of a strong driver of variation (ref1). As the known source of batch was removed correctly, I wanted to try RUVSeq or SVA to remove unwanted variation or hidden batches. The DESeq2 and RUVSeq tutorials explain how to account for hidden variation in differential testing, but how can we obtain normalised counts for WGCNA?
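One way to think about the question above is as a regression: fit the condition and the hidden factor together, then subtract only the hidden-factor part from the transformed expression values. The sketch below shows that idea in base R on simulated data. Everything in it is an assumption for illustration: the matrix `expr` stands in for variance-stabilised or log-scale expression, and `hidden` stands in for a single surrogate variable or RUV factor estimated elsewhere. On real data, limma::removeBatchEffect() follows the same principle through its `covariates` and `design` arguments.

```r
# Simulated stand-in for transformed expression (genes in rows, samples in columns).
set.seed(1)
n_genes   <- 200
n_samples <- 40
condition <- factor(rep(c("case", "control"), each = 20))

# Hypothetical hidden factor, centred so that gene means are preserved.
hidden <- rnorm(n_samples)
hidden <- hidden - mean(hidden)

expr <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples)
expr <- expr + outer(rnorm(n_genes), hidden)    # hidden factor affects every gene
is_case <- condition == "case"
expr[1:20, is_case] <- expr[1:20, is_case] + 2  # simulated condition effect

# Fit condition and hidden factor jointly, one linear model per gene.
design <- model.matrix(~ condition)
X      <- cbind(design, hidden)
k      <- ncol(X)                               # column holding the hidden factor
beta   <- lm.fit(X, t(expr))$coefficients       # coefficients: 3 x genes

# Subtract only the hidden-factor contribution; the condition effect is kept.
cleaned <- expr - outer(beta[k, ], X[, k])

# WGCNA expects samples in rows and genes in columns.
datExpr <- t(cleaned)
```

Keeping the condition in the design matters here: if the hidden factor were regressed out on its own, any part of the case/control signal correlated with it would be removed as well.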
Before you run RUVSeq or SVA, did you check whether the largest module correlates significantly with your main condition, i.e., case vs control?
Yes. It correlates negatively with condition. For the module-trait relationship analysis, I replaced case and control with the numeric values 1 and 2, respectively. Is this the correct way? I do not have any trait data other than this. Before WGCNA I had identified DEGs using DESeq2, and I found these DEGs spread across different modules. Is this usual? I had expected all the up-regulated genes to fall within a single module, and likewise for the down-regulated genes.
For a binary matrix, you should use only 0 and 1 instead of 1 and 2.
I will try your suggestion, but does it have any impact? I use the following code to transform the character levels into a trait.
Updated with:
# Convert a factor to 0-based codes: first level -> 0, second level -> 1
as.numeric.factor <- function(x) { as.numeric(x) - 1 }
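On whether the 0/1 versus 1/2 coding has any impact: Pearson correlation does not change when a constant is added to one of the variables, so a module-trait correlation computed with cor() should come out the same under either coding. The small check below uses simulated data, and the `eigengene` vector is a hypothetical stand-in for a module eigengene, not output from WGCNA.

```r
set.seed(42)
condition <- factor(rep(c("case", "control"), each = 20))

# Hypothetical module eigengene that is higher in cases.
eigengene <- rnorm(40) + ifelse(condition == "case", 1, 0)

trait_12 <- as.numeric(condition)       # case = 1, control = 2
trait_01 <- as.numeric(condition) - 1   # case = 0, control = 1

cor_12 <- cor(eigengene, trait_12)
cor_01 <- cor(eigengene, trait_01)

# The two codings differ only by a constant shift, so the correlations agree.
same <- isTRUE(all.equal(cor_12, cor_01))
```

What does depend on the coding is the direction: with case as the lower value and control as the higher one, a negative correlation means the module eigengene tends to be higher in cases.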