Fwd: WGCNA: labeledHeatmap and verboseScatterplot correlations and p-values are not the same

Peter Langfelder

Hi,

the correlations are different because the two functions, which are just visualization functions, visualize different relationships. In the heatmap you see correlations of module eigengenes with the trait, while the verboseScatterplot probably shows the scatterplot of individual gene significances vs. their module membership. Hence the number of observations in the heatmap is your number of samples, whereas in the scatterplot the number of observations is the number of genes in your module, which is typically larger.

HTH,

Peter

On Fri, Aug 8, 2014 at 4:40 PM, Sindre Lee <sindre.lee at medisin.uio.no> wrote:
> Hello!
>
> I have used the WGCNA package and found 2 modules which are of interest for me. I picked them out from the labeledHeatmap (which relates the module eigengenes to clinical traits). I then selected the modules (one at a time) and ran the verboseScatterplot function.
>
> For one module, the correlation was 0.51 (p = 0.01) in the labeledHeatmap, but 0.27 (p < 0.0001) in the verboseScatterplot.
> How can this be?
>
> Thank you.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
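A short illustration of why the weaker scatterplot correlation can carry the smaller p-value: the significance of a correlation grows with the number of observations it is computed from. WGCNA's own helper for this is corPvalueStudent(), which uses a Student t statistic; the Python sketch below instead uses the Fisher z approximation so that it needs only the standard library. Both sizes are hypothetical, since the thread does not state them: 24 samples is chosen so that r = 0.51 gives p close to 0.01, and 300 genes stands in for a typical module.

```python
import math

def cor_pvalue_fisher(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson correlation r over n observations.

    Uses the Fisher z-transform with a normal approximation. This is NOT the
    Student t test behind WGCNA's corPvalueStudent(), but it shows the same
    dependence on n.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical sizes: the thread does not state either number.
n_samples = 24        # observations behind one eigengene-trait heatmap cell
n_module_genes = 300  # observations (genes) behind the GS-vs-MM scatterplot

p_heatmap = cor_pvalue_fisher(0.51, n_samples)
p_scatter = cor_pvalue_fisher(0.27, n_module_genes)
print(f"heatmap:     r = 0.51, n = {n_samples}, p ~ {p_heatmap:.3g}")
print(f"scatterplot: r = 0.27, n = {n_module_genes}, p ~ {p_scatter:.3g}")
```

The same r becomes far more "significant" as n grows, so comparing p-values across the two plots says little about which relationship is stronger.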

Sindre

Thank you! I understand now; it's actually well described on the webpage and in the documentation, and I should have spent more time reading before posting.

If you have the time, I actually have a few more questions:

1. Why is the minimum module size set to 30? What would the implications be if it were larger or smaller?

2. Is it always better to use the step-by-step network construction and module detection?

Let me give you some details of my design so you understand why I ask. I have two time points from the same persons, before and after an intervention. My immediate idea was to run WGCNA on time point 1, relating it to body weight. Then I would run WGCNA on time point 2 to see if the same modules pop up. Finally, I would run WGCNA on the log-ratio (time point 2 over time point 1) against the log-ratio of body weight. If one or more modules show up in all three runs, they will be prioritised.

Now, if I only use the one-step approach, no modules show up in all three comparisons, while using step-by-step looks more promising. However, if I mix the results, using one-step on time point 1 but step-by-step on time point 2, it's even more interesting.

Thank you very much!

Best regards
Sindre Lee
Medical Research Student
Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo
POB 1046, Blindern, 0317 Oslo, Norway
Visiting address: Sognsvannsveien 9, Domus Medica
Mobile phone: +47 46796851
E-mail: sindre.lee at medisin.uio.no
Web page: http://www.med.uio.no/imb/personer/vit/sindrle/index.html

Peter Langfelder

Hi Sindre,

please see inline.

On Sat, Aug 9, 2014 at 9:47 AM, Sindre Lee <sindre.lee at medisin.uio.no> wrote:
> Thank you! I understand now, it's actually well described on the webpage and in the documentation, I should have used more time reading before posting.
>
> If you have the time, I actually have a few more questions:
>
> 1. Why is the module size set at min. 30? What will the implications be if larger/smaller?

I feel that 30 is a good compromise between large, robust modules and small modules that may be informative but are less robust. You can certainly adjust the minimum size, although I rarely find modules below 30 genes interesting. The implication is very simple: a smaller minimum module size will lead to more modules, but whether this provides more biological information or just more noise is highly situation-dependent.

> 2. Is it always better to use the step-by-step network construction and module detection?

The step-by-step approach gives you more options to tweak the procedure or insert your custom code between the steps. Otherwise the two are (nearly) identical. I say nearly because the blockwise... functions include an extra step of removing peripheral genes from modules.

> Let me give you some details of my design so you understand why I ask. I have two time points, from the same persons, before and after an intervention. My immediate idea was to run WGCNA on time point 1, relating it to body weight. Then I run WGCNA on time point 2, seeing if the same modules pop up. Finally, I run WGCNA on the log-ratio (time point 2 - time point 1) against the log-ratio of body weight. If one or more modules show up in all three runs, they will be prioritised. Now, if I only use the one-step approach, no modules show up in all three comparisons, while using step-by-step looks more promising. However, if I mix the results, using one-step on time point 1, but step-by-step on time point 2, it's even more interesting.

If you want to see modules that pop up in all 3 data sets, use the consensus module approach. But it's not clear to me that looking for modules in data set 1, data set 2, and the log-ratio of the two makes sense. You could get consensus modules across sets 1 and 2, and then see if the log-ratio of interesting modules is still associated with the log-ratio of body weight.

Peter
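The idea behind consensus modules can be sketched in a few lines: a gene pair counts as connected only as strongly as in its weakest data set. WGCNA's consensus functions (e.g. blockwiseConsensusModules()) take a component-wise minimum of scaled topological overlap matrices; the simplified Python sketch below instead takes the minimum of plain unsigned adjacencies |cor|^6. The toy expression values and the power of 6 are assumptions chosen for illustration, not values from this thread.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def unsigned_adjacency(x, y, power=6):
    """Unsigned soft-threshold adjacency |cor|^power."""
    return abs(pearson(x, y)) ** power

def consensus_adjacency(sets, g1, g2, power=6):
    """Minimum adjacency of a gene pair across all data sets."""
    return min(unsigned_adjacency(s[g1], s[g2], power) for s in sets)

# Hypothetical toy expression data: 6 samples per time point.
time1 = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8],
    "C": [1, 2, 3, 4, 5, 6.5],
}
time2 = {
    "A": [2, 1, 4, 3, 6, 5],
    "B": [2.1, 1.0, 3.9, 3.2, 6.1, 4.8],
    "C": [3, 6, 1, 5, 2, 4],
}

ab = consensus_adjacency([time1, time2], "A", "B")  # co-expressed at both time points
ac = consensus_adjacency([time1, time2], "A", "C")  # co-expressed only at time 1
print(f"A-B consensus adjacency: {ab:.3f}")
print(f"A-C consensus adjacency: {ac:.3f}")
```

A and C look tightly co-expressed at time 1 alone, but the minimum removes that link, so only relationships present in both sets can form a consensus module.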

Sindre

Thank you for your answers! Please see inline for a final comment.

On 2014-08-11 19:47, Peter Langfelder wrote:
> If you want to see modules that pop up in all 3 data sets, use the consensus module approach. But it's not clear to me that looking for modules in data set 1, data set 2, and the log-ratio of the two makes sense. You could get consensus modules across sets 1 and 2, and then see if the log-ratio of interesting modules is still associated with the log-ratio of body weight.

Ok, I will try that, thank you! My thought was that if the module was related to weight (in two data sets) and to changes (log-ratio) in weight, wouldn't that point to a stronger relationship? If not, why?

Peter Langfelder

On Mon, Aug 11, 2014 at 1:26 PM, Sindre Lee <sindre.lee at medisin.uio.no> wrote:
> My thought was that if the module was related to weight (in two data sets) and to changes (log-ratio) in weight, wouldn't that point to a stronger relationship? If not, why?

You can think about the log-ratio as conditioning out the denominator (e.g., time 1 if you compute log(time 2/time 1)). If you had a strong signal at time 1, you will take the signal out; if the signal at time 2 was similar to that at time 1 and you take the time-1 signal out, you're left with no signal (relating to weight). For gene expression you're not only taking out its relationship to weight, you also remove the genes' correlation at time 1. If the correlations at time 2 were similar, you will again be left with data whose correlation structure is very different from the original, so you most likely won't observe the same modules again.

Peter
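The conditioning argument above can be checked with a small simulation. It models the extreme case described: a stable per-person component drives weight and two genes identically at both time points, and everything else is independent noise. The model, the noise levels, and the sample size are all hypothetical assumptions; values are taken to be on a log scale already, so the log-ratio is a simple difference. If the intervention instead shifted genes and weight together, the log-ratios would keep that shared change.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def log_ratio(t1, t2):
    """Per-person log(t2 / t1); inputs are assumed to be log-scale already."""
    return [b - a for a, b in zip(t1, t2)]

rng = random.Random(2014)
n = 2000  # hypothetical number of persons, large so sampling noise is small

# Hypothetical model: a stable per-person component drives weight and both
# genes identically at both time points; everything else is independent noise.
stable = [rng.gauss(0, 1) for _ in range(n)]

def measure(noise_sd):
    """One noisy log-scale measurement of the stable component per person."""
    return [s + rng.gauss(0, noise_sd) for s in stable]

weight_t1, weight_t2 = measure(0.3), measure(0.3)
geneA_t1, geneA_t2 = measure(0.5), measure(0.5)
geneB_t1, geneB_t2 = measure(0.5), measure(0.5)

d_weight = log_ratio(weight_t1, weight_t2)
d_geneA = log_ratio(geneA_t1, geneA_t2)
d_geneB = log_ratio(geneB_t1, geneB_t2)

print("time 1:    cor(geneA, weight) =", round(pearson(geneA_t1, weight_t1), 2))
print("time 1:    cor(geneA, geneB)  =", round(pearson(geneA_t1, geneB_t1), 2))
print("log-ratio: cor(geneA, weight) =", round(pearson(d_geneA, d_weight), 2))
print("log-ratio: cor(geneA, geneB)  =", round(pearson(d_geneA, d_geneB), 2))
```

Under these assumptions the time-1 correlations are strong (theoretically about 0.86 for gene vs. weight and 0.8 for gene vs. gene), while both log-ratio correlations are close to zero: the shared signal and the co-expression both cancel.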

Sindre

Thank you for a nice explanation! I have to re-think what my approach should be then. I want to find weight-related genes and then find out how my intervention affects weight through these genes.

So if I want to answer the question "Which genes are changed during the intervention and associated with the change in weight?", what approach would you suggest?

Thank you again, this has been very enlightening for me.

Peter Langfelder

On Mon, Aug 11, 2014 at 3:11 PM, Sindre Lee <sindre.lee at medisin.uio.no> wrote:
> I have to re-think what my approach should be then. I want to find weight-related genes and then find out how my intervention affects weight through these genes.

You may want to speak to a local statistician who can offer better advice than I can over email.

> So if I want to answer the question "Which genes are changed during the intervention and associated with the change in weight?", what would you suggest as an approach?

I don't know what your experimental design is. What are the controls: is it time 1, or is time 1 a baseline and time 2 after treatment, with separate cases and controls? Are you interested in genes that relate to weight in general, or in genes whose change with the intervention relates to the change in weight with the intervention? You seem to indicate both in what you wrote, but they are two different questions.

Peter
