Question

WGCNA with paired samples

4

2323982403 ▴ 60

@2323982403-11356

Last seen 3.8 years ago

Ann Arbor

Dear all,

I'm analyzing microarray data using WGCNA. I have 36 paired samples from 18 people. Each person has two samples: one taken before a 6-week endurance training program and one taken after it. We are investigating the effects of exercise on gene expression, and we want to find modules that are differentially expressed before and after the training program (e.g., module eigengenes (MEs) that have high values before exercise and low values after exercise).

Here is my question. I can think of several ways to do this:

1. Split the whole dataset according to exercise status into two 18-sample datasets, run WGCNA to find consensus modules across the two datasets, compute the consensus MEs in each dataset, and use a paired t-test to find which MEs differ significantly between the two datasets.

2. Handle the 36 samples as a whole, and use a paired t-test to find MEs that differ significantly before and after exercise.

3. For each person, compute the gene expression ratio (ratio = value before exercise / value after exercise), and use the ratios as the WGCNA input.
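The paired t-test mentioned in approaches 1 and 2 can be sketched as follows. This is only an illustration in plain Python, not WGCNA code: the function name `paired_t_statistic` and the eigengene values are made up for the example, and it assumes the before and after lists are in the same subject order.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test.

    `before` and `after` hold one value per subject, in the same subject order.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have one value per subject")
    # The test works on within-subject differences, not on the two groups.
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical eigengene values of one module for 4 subjects (not real data).
me_before = [0.30, 0.10, 0.25, 0.05]
me_after = [0.10, -0.05, 0.00, -0.10]
t, df = paired_t_statistic(me_before, me_after)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

With many modules, the resulting p-values would still need a multiple-testing correction.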

I would like to know how to make the best use of the paired design in a co-expression analysis, and what the strengths and flaws of each of the approaches above are.

For the second approach, which handles the 36 samples as a whole, I wonder whether the pairing would interfere with the calculation of the correlation coefficients and thus lead to spurious results.

wgcna paired samples exercise • 3.8k views

ADD COMMENT • link updated 8.5 years ago by Peter Langfelder ★ 3.0k • written 8.5 years ago by 2323982403 ▴ 60

score 5 · Answer 1 · 2016-08-25

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 3 months ago

United States

From the point of view of WGCNA, it is best to use either approach 2 or approach 3. Each has advantages and drawbacks. If the between-subject variability is stronger than the measurement noise in the data, you may be better off with approach 3, but if the between-subject variability is relatively low, it may be better to go with approach 2. I don't see how approach 1 would be useful in answering your question unless you had a hypothesis that the network organization differs between pre- and post-training.

Paired designs do not interfere with WGCNA's calculations - but WGCNA, in its simplest form, is not designed to take advantage of the paired design.
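To illustrate why approach 3 can help when between-subject variability is large, here is a minimal sketch in plain Python (not WGCNA code). The function name `log2_ratios` and the intensities are hypothetical, and the sketch assumes the inputs are raw, unlogged intensities. Taking the ratio on the log2 scale is a choice made for this example, not something the thread specifies.

```python
import math

def log2_ratios(before, after):
    """Per-subject log2(before / after) for one gene.

    Assumes raw (unlogged), strictly positive intensities, with both lists
    in the same subject order.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have one value per subject")
    return [math.log2(b / a) for b, a in zip(before, after)]

# Hypothetical intensities for one gene in three subjects whose baseline
# levels differ 16-fold but who all respond with the same two-fold change.
gene_before = [100.0, 400.0, 1600.0]
gene_after = [50.0, 200.0, 800.0]
ratios = log2_ratios(gene_before, gene_after)
print(ratios)  # the subject-specific baselines cancel out
```

If the data are already log2-transformed, as normalized microarray data often are, the same quantity is simply the difference before minus after. The within-subject values would then form an 18-sample input instead of the 36-sample one used in approach 2.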

Peter

ADD COMMENT • link 8.5 years ago Peter Langfelder ★ 3.0k

0

"I don't see how approach 1 would be useful in answering your question except if you had a hypothesis that the network organization is different pre- and post-training."

Is this a crazy idea? I would think that certain genes with low basal expression and low module membership overall can be recruited by an experimental condition to form a new module... Reading this post made me doubt that assumption.

ADD REPLY • link 5.7 years ago agustin.gonvi ▴ 20

1

It could happen, but in my experience it is not very common, and it is more difficult to detect.

ADD REPLY • link 5.7 years ago Peter Langfelder ★ 3.0k

0

I’ve read in other posts that a strong driver of gene expression can reduce the number of modules. Putting that together with what you just said, I am thinking that when there are few modules because of a strong, biologically relevant driver, the modules are still there but the algorithm cannot separate them, so I would only be able to detect the metamodule. If this is true, it would make sense to use a reference network to study module-phenotype correlations when a strong gene expression driver is present, instead of building a network from those data. Does this make sense? Thanks!

ADD REPLY • link 5.7 years ago agustin.gonvi ▴ 20

1

You could do that, just be aware that module eigengenes may not be meaningful since the genes in each module are not necessarily correlated. My experience is that when you have strong driver(s), the eigengene of pretty much any large-enough random group of genes will be strongly associated with the strong driver.

I usually try a consensus analysis of the data with the strong driver and a data set where the driver is absent, which I usually obtain by simply regressing the strong driver out of the data, but you could also try using some unrelated reference data. This will typically yield multiple smaller modules that are coexpressed in both the driver-driven data and the other data, so eigengenes are good representatives and can be used to study association with traits.
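As a rough illustration of what "regressing the strong driver out of the data" can mean for a single gene, the sketch below fits an ordinary least-squares line of expression on the driver and keeps the residuals. It is plain Python rather than WGCNA code, the function name `regress_out` and the numbers are hypothetical, and it assumes a single numeric driver with a linear effect. The reply above does not say which method is actually used.

```python
import statistics

def regress_out(expression, driver):
    """Return residuals of a simple least-squares fit expression ~ driver.

    `expression` holds one gene's values across samples; `driver` holds the
    driver covariate for the same samples, in the same order.
    """
    if len(expression) != len(driver):
        raise ValueError("expression and driver must have the same length")
    mean_x = statistics.mean(driver)
    mean_y = statistics.mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in driver)
    if sxx == 0:
        raise ValueError("driver is constant; nothing to regress out")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(driver, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # What remains after removing the fitted driver effect.
    return [y - (intercept + slope * x) for x, y in zip(driver, expression)]

# Hypothetical example: a binary driver that shifts expression by about 2 units.
driver = [0.0, 0.0, 1.0, 1.0]
expression = [1.0, 1.2, 3.0, 3.2]
residuals = regress_out(expression, driver)
print([round(r, 3) for r in residuals])
```

Applying this gene by gene would give a driver-free data set that could serve as the second input to a consensus analysis alongside the original data.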

ADD REPLY • link 5.7 years ago Peter Langfelder ★ 3.0k