Question

WGCNA: help with comparing multiple GEO studies

2


Abhishek Pratap ▴ 190

@abhishek-pratap-4927

Last seen 8.5 years ago

United States

Hi Steve and Peter,

My basic goal here is to study genetic similarities (if any) between a group of GEO studies. I have downloaded about 6-8 studies and, as one would expect, there is heterogeneity among them (different platforms, versions, study sizes (15-120 samples), etc.). After an initial normalization step on each study, I am trying to run a blockwise consensus analysis to find modules shared among these studies. I am only using genes shared across all of the studies.

1. I am wondering whether consensus analysis across the studies is the right approach here. Intuitively, I don't think I want to build modules on one study and compare them with another, since there are multiple studies to compare.

2. Given the varying sample sizes (15-120), I am not sure whether I should use a very high soft-thresholding power, given that 2 studies have < 20 samples, or whether I should exclude these studies.

3. I have gone through tutorial II (Consensus analysis of female and male liver expression data), but it is not clear to me, once the network is built, what the different ways are to look at the consensus modules across the studies and run functional enrichment analysis on them.

Thanks!
-Abhi

• 2.1k views

updated 10.3 years ago by Peter Langfelder ★ 3.0k • written 10.3 years ago by Abhishek Pratap ▴ 190

score 0 · Answer 1 · 2014-08-26

Hi Abhishek,

if you are interested in modules that appear in all (or most) of your data sets, you should run the consensus module analysis (e.g., blockwiseConsensusModules). At present the function has a bug that forces all soft-thresholding powers to be the same, but I will post the fix for it soon. The new version of blockwiseModules will also feature an option to use full quantile normalization of input networks to make them comparable, which should be more appropriate than the simple single-quantile scaling used at present.

I would think carefully about excluding small studies. This may be appropriate if you have a big study in the same or very similar conditions and you trust the big study. But if the small studies are credible and there are no big studies in the same conditions, you can keep them.

I would make sure that all of your input data sets are carefully pre-processed, extreme outliers are removed, and probe sets are summarized to gene-level data. You will need to restrict all data sets to the same genes.

Best,
Peter

(In reply to Abhishek Pratap's message of Fri, Aug 22, 2014, 7:39 PM.)
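To make the two mechanical ideas in this thread concrete (restricting every study to the same genes, and keeping only co-expression that holds in all studies), here is a minimal Python sketch. It is not WGCNA code and does not reproduce what blockwiseConsensusModules does internally: it works on plain soft-thresholded adjacency rather than topological overlap, skips any scaling of the networks, and takes the per-pair minimum across studies as the consensus. The function names (`shared_genes`, `soft_adjacency`, `consensus_adjacency`), the toy data, and the choice of power 2 are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def shared_genes(studies):
    """Genes present in every study, in sorted order."""
    return sorted(set.intersection(*(set(expr) for expr in studies.values())))


def soft_adjacency(expr, genes, power):
    """Unsigned adjacency |cor|^power for every pair of the given genes."""
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** power
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


def consensus_adjacency(studies, powers):
    """Per-pair minimum adjacency across studies (simplified consensus).

    `powers` maps each study name to its own soft-thresholding power,
    so studies of different sizes can in principle use different powers.
    """
    genes = shared_genes(studies)
    per_study = [
        soft_adjacency(expr, genes, powers[name])
        for name, expr in studies.items()
    ]
    return {pair: min(adj[pair] for adj in per_study) for pair in per_study[0]}


# Toy data: each study maps gene -> expression values across its samples.
# "gx" and "gy" each appear in only one study, so they are dropped.
studies = {
    "study_a": {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8],
                "g3": [4, 3, 2, 1], "gx": [1, 0, 1, 0]},
    "study_b": {"g1": [1, 2, 3, 4], "g2": [1, 3, 2, 4],
                "g3": [4, 3, 2, 1], "gy": [0, 1, 0, 1]},
}
powers = {"study_a": 2, "study_b": 2}  # hypothetical values

consensus = consensus_adjacency(studies, powers)
for pair, value in sorted(consensus.items()):
    print(pair, round(value, 3))
```

Taking the minimum means a gene pair scores high only if it is strongly co-expressed in every study, so a single noisy small study can pull a pair down. That is one reason the decision to keep or drop the small studies deserves the care described above.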