Hi,
the correlations are different because the two functions, which are
just visualization functions, visualize different relationships. In
the heatmap you see correlations of module eigengenes with the trait,
while the verboseScatterplot probably shows the scatterplot of
individual gene significances vs. their module membership. Hence the
number of observations in the heatmap is your number of samples,
whereas in the scatterplot the number of observations is the number of
genes in your module, which is typically larger. With more
observations, even a weaker correlation can get a smaller p-value.
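To make the sample-size point concrete, here is a minimal sketch in
Python (not WGCNA code). It approximates the p-value of a Pearson
correlation with the Fisher z-transformation; the p-values WGCNA
reports may be computed differently. The two sizes (24 samples, 400
genes in the module) are assumptions chosen for illustration only,
they are not taken from the data discussed here.

```python
import math

def approx_cor_pvalue(r, n):
    """Approximate two-sided p-value for a Pearson correlation r based on
    n observations, using the Fisher z-transformation (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical sizes, for illustration only.
n_samples = 24   # observations behind a module eigengene vs. trait correlation
n_genes = 400    # observations behind a gene significance vs. membership plot

p_heatmap = approx_cor_pvalue(0.51, n_samples)
p_scatter = approx_cor_pvalue(0.27, n_genes)

print(f"r = 0.51 with n = {n_samples}: p is about {p_heatmap:.3g}")
print(f"r = 0.27 with n = {n_genes}: p is about {p_scatter:.3g}")
```

Under these assumed sizes the weaker correlation still gets the much
smaller p-value, simply because it is based on many more observations.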
HTH,
Peter
On Fri, Aug 8, 2014 at 4:40 PM, Sindre Lee <sindre.lee at medisin.uio.no> wrote:
> Hello!
>
> I have used the WGCNA package and found 2 modules which are of interest for
> me. I picked out from the labeledHeatmap (which related the module eigengene
> with clinical traits). I then selected the modules (one at a time) and ran
> the verboseScatterplot function.
>
> For one module its correlation was 0.51, p = 0.01 in labeledHeatmap, but
> 0.27, p<0.0001 in the verboseScatterplot.
> How can this be?
>
>
>
> Thank you.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
Thank you! I understand now; it's actually well described on the webpage
and in the documentation. I should have spent more time reading before
posting.
If you have the time, I actually have a few more questions:
1. Why is the minimum module size set at 30? What will the implications
be if it is larger/smaller?
2. Is it always better to use the step-by-step network construction and
module detection? Let me give you some details of my design so you
understand why I ask. I have two time points, from the same persons,
before and after an intervention. My immediate idea was to run WGCNA on
time point 1, relating it to body weight. Then I run WGCNA on time
point 2, seeing if the same modules pop up. Finally, I run WGCNA on the
log-ratio (time point 2 - time point 1) against the log-ratio of body
weight. If one or more modules show up in all three runs, they will be
prioritised. Now, if I only use the one-step approach, no modules show
up in all three comparisons, while using step-by-step looks more
promising. However, if I mix the results, using one-step on time point
1 but step-by-step on time point 2, it's even more interesting.
Thank you very much!
--
Best regards
Sindre Lee
Medical Research Student
Department of Nutrition, Institute of Basic Medical Sciences, Faculty
of Medicine, University of Oslo
POB 1046, Blindern, 0317 Oslo, Norway
Visiting address: Sognsvannsveien 9, Domus Medica
Mobile phone: +47 46796851
E-mail: sindre.lee at medisin.uio.no;
Web page: http://www.med.uio.no/imb/personer/vit/sindrle/index.html
Hi Sindre,
please see inline.
On Sat, Aug 9, 2014 at 9:47 AM, Sindre Lee <sindre.lee at medisin.uio.no> wrote:
> Thank you! I understand now; it's actually well described on the webpage
> and in the documentation. I should have spent more time reading before
> posting.
>
> If you have the time, I actually have a few more questions:
>
> 1. Why is the minimum module size set at 30? What will the implications
> be if it is larger/smaller?
I feel that 30 is a good compromise between robust large modules and
possibly informative but not as robust small modules. You can
certainly adjust the minimum size, although I rarely find modules
below 30 genes interesting. The implication is very simple - smaller
minimum module size will lead to more modules, but whether this
provides more biological information or just more noise is highly
situation-dependent.
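As a rough illustration of that trade-off, here is a small Python
sketch (not WGCNA code). It treats the minimum module size as a simple
filter on cluster sizes, which is a simplification: in the actual tree
cutting the minimum size also influences how branches are detected.
The cluster sizes are made up for the example.

```python
def count_modules(cluster_sizes, min_module_size):
    """Count the clusters that are large enough to be kept as modules."""
    return sum(1 for size in cluster_sizes if size >= min_module_size)

# Hypothetical numbers of genes per candidate cluster (illustration only).
cluster_sizes = [412, 230, 95, 61, 44, 33, 27, 22, 18, 12, 9]

for min_size in (50, 30, 20, 10):
    kept = count_modules(cluster_sizes, min_size)
    print(f"minimum module size {min_size:>2}: {kept} modules kept")
```

Lowering the threshold only ever adds small clusters, and those are
the ones most likely to be driven by noise.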
>
> 2. Is it always better to use the step-by-step network construction and
> module detection?
The step-by-step approach gives you more options to tweak the procedure
or insert your custom code between the steps. Otherwise the two
approaches are (nearly) identical. I say nearly because the
blockwise... functions include an extra step of removing peripheral
genes from modules.
> Let me give you some details of my design so you
> understand why I ask. I have two time points, from the same persons, before
> and after an intervention. My immediate idea was to run WGCNA on time
> point 1, relating it to body weight. Then I run WGCNA on time point 2,
> seeing if the same modules pop up. Finally, I run WGCNA on the log-ratio
> (time point 2 - time point 1) against the log-ratio of body weight. If one
> or more modules show up in all three runs, they will be prioritised. Now,
> if I only use the one-step approach, no modules show up in all three
> comparisons, while using step-by-step looks more promising. However, if I
> mix the results, using one-step on time point 1 but step-by-step on time
> point 2, it's even more interesting.
If you want to see modules that pop up in all 3 data sets, use the
consensus module approach. But it's not clear to me that looking for
modules in data set 1, data set 2, and the log-ratio of the two makes
sense. You could get consensus modules across sets 1 and 2, and then
see if the log-ratio of interesting modules is still associated with
the log-ratio of body weight.
Peter
Thank you for your answers!
Please see inline for a final comment.
On 2014-08-11 19:47, Peter Langfelder wrote:
> If you want to see modules that pop up in all 3 data sets, use the
> consensus module approach. But it's not clear to me that looking for
> modules in data set 1, data set 2, and the log-ratio of the two makes
> sense. You could get consensus modules across sets 1 and 2, and then
> see if the log-ratio of interesting modules is still associated with
> the log-ratio of body weight.
Ok, I will try that, thank you!
My thought was that if the module was related to weight (in two data
sets) and to changes (log-ratio) in weight, wouldn't that point to a
stronger relationship? If not, why?
On Mon, Aug 11, 2014 at 1:26 PM, Sindre Lee <sindre.lee at medisin.uio.no> wrote:
> My thought was that if the module was related to weight (in two data sets)
> and to changes (log-ratio) in weight, wouldn't that point to a stronger
> relationship? If not, why?
You can think about the log-ratio as conditioning out the denominator
(e.g., time 1 if you do log(time 2/time 1)). If you had a strong signal
at time 1, you will take the signal out; if the signal at time 2 was
similar to that at time 1 and you take the time-1 signal out, you're
left with no signal (relating to weight). For gene expressions you're
not only taking out their relationship to weight, you also remove
their correlation at time 1 - if the correlations at time 2 were
similar, you will again be left with data whose correlation structure
is very different from the original, so you most likely won't observe
the same modules again.
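A small simulation can illustrate this. The following is a minimal
Python sketch, not WGCNA code, and every number in it is an assumption:
two hypothetical genes in the same module share a weight-related
signal on the log scale, that signal is assumed to be identical at
both time points, each measurement adds independent noise, and the
number of persons is set unrealistically high so that sampling noise
stays small.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def measure(signal, noise_sd=0.5):
    """Log-expression = shared signal + independent measurement noise."""
    return [s + random.gauss(0, noise_sd) for s in signal]

random.seed(1)
n_persons = 2000  # hypothetical, large to keep sampling noise small

# Weight-related signal per person (log scale), assumed equal at both times.
weight = [random.gauss(0, 1) for _ in range(n_persons)]

# Two genes of the same hypothetical module, measured at both time points.
gene_a_t1, gene_b_t1 = measure(weight), measure(weight)
gene_a_t2, gene_b_t2 = measure(weight), measure(weight)

# On the log scale the log-ratio is the difference time 2 - time 1.
log_ratio_a = [t2 - t1 for t1, t2 in zip(gene_a_t1, gene_a_t2)]
log_ratio_b = [t2 - t1 for t1, t2 in zip(gene_b_t1, gene_b_t2)]

print("gene A vs weight, time 1:   ", round(pearson(gene_a_t1, weight), 2))
print("gene A vs weight, time 2:   ", round(pearson(gene_a_t2, weight), 2))
print("gene A vs weight, log-ratio:", round(pearson(log_ratio_a, weight), 2))
print("gene A vs gene B, time 1:   ", round(pearson(gene_a_t1, gene_b_t1), 2))
print("gene A vs gene B, log-ratio:", round(pearson(log_ratio_a, log_ratio_b), 2))
```

In this setup the shared signal cancels in the difference, so both the
association with weight and the gene-gene correlation that would
define the module drop to about zero in the log-ratio data.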
Peter
Thank you for a nice explanation!
I have to re-think what my approach should be then. I want to find
weight-related genes and then find out how my intervention affects
weight through these genes.
So if I want to answer the question "Which genes are changed during
intervention and associated with the change in weight?", what would you
suggest as an approach?
Thank you again, this has been very enlightening for me.
On Mon, Aug 11, 2014 at 3:11 PM, Sindre Lee <sindre.lee at medisin.uio.no> wrote:
>
> I have to re-think what my approach should be then. I want to find
> weight-related genes and then find out how my intervention affects weight
> through these genes.
You may want to speak to a local statistician who can offer better
advice than me over email.
>
> So if I want to answer the question "Which genes are changed during
> intervention and associated with the change in weight?", what would you
> suggest as an approach?
I don't know what your experimental design is or what the controls are -
is it time 1, or is time 1 baseline, time 2 after treatment, and you
have cases and controls? Are you interested in genes that relate to
weight in general, or in genes whose change with respect to the
intervention relates to the change in weight with respect to the
intervention (you seem to indicate both in what you wrote, but they are
two different questions)?
Peter