Dear all,
I'm analyzing microarray data using WGCNA. I have 36 paired samples from 18 persons. Each person has two samples, one taken before a 6-week endurance training program and one taken after it. We're investigating the effects of exercise on gene expression. We want to find modules that are differentially expressed before and after the training program (e.g., module eigengenes (MEs) that have high values before exercise and low values after exercise).
Here is my question. I can think of several ways to do it:
1. Split the whole dataset according to exercise status into two 18-sample datasets, then run WGCNA to find consensus modules across the two datasets, compute consensus MEs in each dataset, and test which MEs differ significantly between the two datasets using a paired t-test.
2. Handle the 36 samples as a whole, and find MEs that differ significantly before and after exercise using a paired t-test.
3. For each person, compute the gene expression ratio (ratio = value before exercise / value after exercise), and use the ratios as WGCNA input.
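To make the statistics in options 1 to 3 concrete, here is a minimal sketch in plain Python (not WGCNA itself, which is an R package). It shows the paired t statistic for one module eigengene and the per-person log2 ratio from option 3. The eigengene values, the helper names `paired_t` and `log2_ratio`, and the six-person sample size are all made up for illustration.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for after-minus-before differences."""
    if len(before) != len(after):
        raise ValueError("before and after must have one value per person")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


def log2_ratio(before, after):
    """Option 3 on the log scale: log2(before / after) for raw positive intensities."""
    return math.log2(before / after)


# Hypothetical eigengene values for one module, one entry per person.
me_before = [0.21, 0.15, 0.30, 0.05, 0.18, 0.25]
me_after = [0.02, -0.05, 0.12, -0.10, 0.01, 0.08]

t_stat, df = paired_t(me_before, me_after)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A negative t here means the eigengene is lower after training. If your expression values are already log2-transformed, which is common for microarray data, the ratio in option 3 becomes a simple per-person difference (before minus after).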
I would like to know how best to utilize the power of the paired design for co-expression analysis, and what the strengths and flaws of each of the designs above are.
For the second design, which handles the 36 samples as a whole, I'm wondering whether the pairing would interfere with the calculation of the correlation coefficients and thus lead to spurious results.
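The concern about pooling all 36 samples can be illustrated with a toy example. In the sketch below, two hypothetical genes share a person-specific baseline, but their before/after changes are constructed to be unrelated. Pooling all samples gives a correlation close to 1, while centering each person's two samples on that person's mean removes it. All numbers, the eight-person sample size, and the function names are assumptions for illustration only, not real data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def center_within_person(before, after):
    """Subtract each person's own mean from that person's two samples."""
    centered = []
    for b, a in zip(before, after):
        person_mean = (b + a) / 2
        centered.extend([b - person_mean, a - person_mean])
    return centered


n_persons = 8
baseline = [float(i) for i in range(n_persons)]  # shared person-level effect
# Training-induced changes, built to be uncorrelated between the two genes.
change_x = [0.1 if i % 2 == 0 else -0.1 for i in range(n_persons)]
change_y = [0.1 if (i // 2) % 2 == 0 else -0.1 for i in range(n_persons)]

x_before = [b - c / 2 for b, c in zip(baseline, change_x)]
x_after = [b + c / 2 for b, c in zip(baseline, change_x)]
y_before = [b - c / 2 for b, c in zip(baseline, change_y)]
y_after = [b + c / 2 for b, c in zip(baseline, change_y)]

pooled_r = pearson(x_before + x_after, y_before + y_after)
centered_r = pearson(center_within_person(x_before, x_after),
                     center_within_person(y_before, y_after))
print(f"pooled r = {pooled_r:.4f}, within-person r = {centered_r:.4f}")
```

Whether the between-person correlation counts as noise or as biology of interest depends on the question. The sketch only shows that the two sources of correlation get mixed when the pairs are pooled.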
Is this a crazy idea? I would think that certain genes with low basal expression values and low module membership overall can be recruited by an experimental condition to form a new module... Reading this post made me doubt that assumption.
It could happen but it is in my experience not very common, and it is more difficult to detect.
I’ve read in other posts that a strong driver of gene expression can reduce the number of modules. Putting that together with what you just said, I am thinking that when the number of modules is low because of a strong, biologically relevant driver, the modules will still be there but the algorithm will not be able to separate them, so I will only be able to detect the metamodule. If this is true, it would make sense to use a reference network to study module-phenotype correlations when a strong gene expression driver is present, instead of building a network from those data. Does this make sense? Thanks!
You could do that, just be aware that module eigengenes may not be meaningful since the genes in each module are not necessarily correlated. My experience is that when you have strong driver(s), the eigengene of pretty much any large-enough random group of genes will be strongly associated with the strong driver.
I usually try a consensus analysis of the data with the strong driver and a data set where the driver is absent, which I usually obtain by simply regressing the strong driver out of the data, but you could also try using some unrelated reference data. This will typically yield multiple smaller modules that are coexpressed in both the driver-driven data and the other data, so eigengenes are good representatives and can be used to study association with traits.
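For readers unsure what "regressing the driver out" means in practice, here is a minimal sketch in plain Python for a single gene: fit an ordinary least-squares line of expression on the driver and keep the residuals. The function name `regress_out`, the 0/1 coding of the driver, and the numbers are hypothetical. A real analysis would do this for every gene, probably with a linear-model routine in R, and then feed the residuals into the consensus analysis as the second data set.

```python
def regress_out(values, driver):
    """Residuals of a simple least-squares fit of values on driver (with intercept)."""
    n = len(values)
    mean_d = sum(driver) / n
    mean_v = sum(values) / n
    sdd = sum((d - mean_d) ** 2 for d in driver)
    if sdd == 0:
        raise ValueError("driver is constant, nothing to regress out")
    slope = sum((d - mean_d) * (v - mean_v) for d, v in zip(driver, values)) / sdd
    intercept = mean_v - slope * mean_d
    return [v - (intercept + slope * d) for v, d in zip(values, driver)]


# Hypothetical driver (e.g. 0 = condition absent, 1 = present) and one gene's expression.
driver = [0, 0, 0, 1, 1, 1]
gene = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]

residuals = regress_out(gene, driver)
print([round(r, 3) for r in residuals])
```

After this step the residuals have zero mean and zero covariance with the driver, so any co-expression left between genes is not explained by a linear effect of the driver.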