Question

Breaking down the output from plotEigengeneNetworks()

0

Entering edit mode

l.s.wijaya • 0

@lswijaya-21856

Last seen 5.0 years ago

Hi All,

I have a question. How can one breakdown the output for plotEigengeneNetworks() function? In the tutorial, it shows how one can examine the adjacency of a clinical trait with modules. The map shows pretty clear and informative results.

https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-05-Visualization.pdf

However, in my case, I have 2 different traits and 269 modules in which it produces a big heatmap. Is it possible to extract the dataframe containing only the adjacency score then based on that I can make a heatmap only containing my clinical trait as y-axis and the modules on the other axis so that the heatmap will be more understandable?? Thanks.

WGCNA Heatmap EigenGeneNetworks • 2.3k views

ADD COMMENT • link 5.5 years ago l.s.wijaya • 0

0

Entering edit mode

With only two traits and an absurdly high number of modules,269, I suspect that most of the modules are just noise. Therefore, before you proceed with the analysis I would increase the minimum module size to 100 and use a cutHeight of 0.95 when you call the function cutreeDynamic().

Also, did you filter out low count genes/transcripts from the dataset?

ADD REPLY • link 5.5 years ago andres.firrincieli ▴ 50

0

Entering edit mode

Thanks for the reply. Yes I did filter out the transcript and the genes from the dataset with low count and NA in more than 20% of the samples. FYI, I have around 200 conditions in my sample and I actually expect the gene to be clustered in more than 100 modules. I think for the cutHeight from cutreeDynamic(), it is default 0.99 and I don't change the number. Btw, how do I check computationally if the modules are noise or not? I mean without checking manually from the response from condition to condition?

ADD REPLY • link 5.5 years ago l.s.wijaya • 0

0

Entering edit mode

200 conditions per sample? Could you elaborates because this is getting 'weird'?

For me a 'noisy' module is: 1) a module whose expression profile does not agree with the experimental design; 2) a module where every gene/transcript has negative kDiff, i.e., KOut>KIn. For a fast screening I look at the expression profile of each module:

library(gplots)
col = colorpanel(300, 'purple','black','yellow')
colorsA1 = names(table(moduleColors))
pdf("Heatmap.pdf",height=9,width=9)
for (c in 1:length(colorsA1)){
      moduleColors == colorsA1[c]
      heatmap.2(t(datExpr[moduleColors==colorsA1[c]]), scale = "row", col=col, density.info ="none", trace="none", cexCol=0.5, cexRow=0.8, margin=c(19,11), main = colorsA1[c], Colv = FALSE)
      }; dev.off()

Alternatively, you could use clusterRepro, but I am not 100% sure about this because you probably need a test and a reference dataset.

I never dealt with more than 18-20 modules, so in you case i would check only those modules whose ME significantly correlate with your traits/experimental variables; if you followed the tutorial, such information should be stored in two separate R object: moduleTraitCor and moduleTraitPvalue. You can merge the two files and filter out those modules with pval > 0.01.

Finally, did you try to merge those modules with similar expression profile: section 2.b.5

ps. print the Heatmap.pdf only when you will able to reduce the number of modules, because with 260 modules the file will be quite heavy

ADD REPLY • link 5.5 years ago andres.firrincieli ▴ 50

0

Entering edit mode

Thanks for your suggestion. What do you mean by KDiff? For some modules for instance, I got quite logical relation with my experimental design. For instance, the modules containing genes that are suspected to be upregulated in the certain condition, show high eigengene scores. A question from the script you suggest for the heatmap, so the heatmap will contain the log2FC of each gene inside the module in every condition, right? Then if I apply this heatmap to my condition, I will have 260 file to check. Yes, the 260 modules are after merging, but I think it's a good idea to check the moduleTraitPvalue once I include the clinical traits. To make it clear, I try to apply the WGCNA similar to this paper : https://www.nature.com/articles/tpj201717 Here, they have around 400 modules. Also, another limitation from my data, I choose my sft quite high (10) because at that threshold, I get > 0.8 R^2. Do you think this is normal? Thanks again for your suggestion.

ADD REPLY • link 5.5 years ago l.s.wijaya • 0

0

Entering edit mode

What do you mean by KDiff?

The function intramodularConnectivity() calculate the connectivity of nodes to other nodes within the same module (kWithin), outside the module (kOut) and the overall connectivity within the network (kTot). The function also return the kDiff (kWithin - kOut); in general, the hub genes in each module should have positive kDiff. If that is not the case, there is a chance that the module with negative kDiff is part of a bigger cluster. From my experience, very small modules tends to have negative kDiff, even when they show a significant correlation with my trait of interest, or the expression profile is in agreement with the experimental design.

A question from the script you suggest for the heatmap, so the heatmap will contain the log2FC of each gene inside the module in every condition, right?

not the log2FC but expression profile (scaled) of your genes within each module across all your conditions.

I will have 260 file to check.

No, just a .pdf file of 260 page

I choose my sft quite high (10) because at that threshold, I get > 0.8 R^2. Do you think this is normal?

It should not be a problem

ADD REPLY • link 5.5 years ago andres.firrincieli ▴ 50

0

Entering edit mode

Understand. Thanks for all the answers.

ADD REPLY • link 5.5 years ago l.s.wijaya • 0