Double mmmmmm, thanks for all the interesting points of discussion; they
are very educational. My heatmaps don't look very informative and,
judging naively by colour, they don't seem to cluster that well with
either of the two methods I tried.
I have a copy of the Bioconductor Case Studies, which has a good section
on distances etc., so I will try some of those too, along with PAM,
k-means etc.
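
A minimal sketch of what trying PAM and k-means on the same correlation distance might look like. The matrix `a` below is simulated stand-in data (an assumption, not my Delta CT values), and `k = 2` is an arbitrary choice made only for illustration.

```r
library(cluster)  # recommended package shipped with R; provides pam()

set.seed(1)
# Simulated stand-in for the expression matrix: 20 genes x 6 samples
a <- matrix(rnorm(120), nrow = 20,
            dimnames = list(paste0("gene", 1:20), paste0("s", 1:6)))

# Gene-by-gene correlation distance, as in the heatmap code
d <- as.dist(1 - cor(t(a)))

# PAM can work directly on a dissimilarity object
fit_pam <- pam(d, k = 2, diss = TRUE)
table(fit_pam$clustering)

# k-means needs coordinates rather than a distance matrix; on row-standardised
# data the squared Euclidean distance is proportional to 1 - cor
a_std <- t(scale(t(a)))
fit_km <- kmeans(a_std, centers = 2, nstart = 10)
table(fit_km$cluster)
```

Standardising the rows first is what keeps the k-means result comparable to the correlation-based clustering; on the raw values k-means would group genes by overall level instead.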
On Wed, May 11, 2011 at 11:52 PM, john herbert
<arraystruggles@gmail.com> wrote:
> Producing some real heatmaps makes me wonder;
> code;
>
> library(gplots)  # provides heatmap.2()
> a <- data  # matrix of Delta CT values
> rowv <- as.dendrogram(hclust(as.dist(1 - cor(t(a)))))
> colv <- as.dendrogram(hclust(as.dist(1 - cor(a))))
> # left map, using correlation as a distance
> heatmap.2(a, scale = "row", Rowv = rowv, Colv = colv)
>
> x11()
> # map on the right, using the default settings
> heatmap.2(a, scale = "row")
>
> The attached heatmaps don't look that great. The first (correlation)
> map does not seem to cluster correctly: from a very basic look, I would
> expect the predominantly yellow/white columns to cluster together.
>
> The second, the default, looks a little better.
>
> This data is Delta CT values, not array data.
>
> > sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
> [2] LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] HTqPCR_1.6.0 limma_3.8.1 RColorBrewer_1.0-2 Biobase_2.12.1
> [5] gplots_2.8.0 caTools_1.11
> [7] bitops_1.0-4.1 gdata_2.8.2 gtools_2.6.2
>
> loaded via a namespace (and not attached):
> [1] affy_1.30.0 affyio_1.20.0 preprocessCore_1.14.0 tools_2.13.0
>
>
>
>
> On Wed, May 11, 2011 at 11:27 PM, john herbert
> <arraystruggles@gmail.com> wrote:
>
>> mmmm, possibly.
>> Correlation of gene expression = comparing all genes vs. all genes.
>>
>> 1) genes A and B, both highly up-regulated = high pos correlation,
>>    e.g. 0.9
>> 2) genes C and D, both highly down-regulated = high pos correlation,
>>    e.g. 0.9
>> 3) gene E highly up and gene F highly down = high neg correlation,
>>    e.g. -0.9
>> 4) gene G highly down and gene H highly up = high neg correlation,
>>    e.g. -0.9
>>
>> With 1-cor, genes A, B, C and D cluster together;
>> genes E, F, G and H cluster together but a long way off from ABCD.
>>
>> With 1-abs(cor), all genes cluster together as they produce an extreme
>> correlation, either pos or neg.
>>
>> To me, 1-cor makes more biological sense; is that agreeable?
>>
>> Table.
>> cor 1-cor 1-abs(cor)
>> 1 0 0
>> 0.9 0.1 0.1
>> 0.8 0.2 0.2
>> 0.7 0.3 0.3
>> 0.6 0.4 0.4
>> 0.5 0.5 0.5
>> 0.4 0.6 0.6
>> 0.3 0.7 0.7
>> 0.2 0.8 0.8
>> 0.1 0.9 0.9
>> 0 1 1
>> -0.1 1.1 0.9
>> -0.2 1.2 0.8
>> -0.3 1.3 0.7
>> -0.4 1.4 0.6
>> -0.5 1.5 0.5
>> -0.6 1.6 0.4
>> -0.7 1.7 0.3
>> -0.8 1.8 0.2
>> -0.9 1.9 0.1
>> -1 2 0
>>
>> On Wed, May 11, 2011 at 10:48 PM, James W. MacDonald
>> <jmacdon@med.umich.edu> wrote:
>>
>>>
>>>
>>> On 5/11/2011 4:54 PM, john herbert wrote:
>>>
>>>> A biological circumstance? Interesting. Nothing immediately pops to
>>>> mind.
>>>>
>>>
>>> You see the pattern, but think about the underlying biology. If you
>>> use 1-cor, as you note, two genes that are both highly up-regulated or
>>> both highly down-regulated will cluster together.
>>>
>>> But what if the first gene's product has a negative feedback effect on
>>> the transcription of the second gene? That implies a relationship that
>>> won't be captured if you use 1-cor, but will be captured if you use
>>> 1-abs(cor).
>>> This is, I believe, what Kevin was getting at.
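
A rough sketch of Jim's negative feedback scenario, using simulated genes rather than real data. The gene names `g1` to `g4`, the sample count and the noise level are all assumptions chosen for illustration.

```r
set.seed(42)
n <- 12                          # hypothetical number of samples
g1 <- rnorm(n)                   # regulator
g2 <- -g1 + rnorm(n, sd = 0.1)   # repressed target: moves opposite to g1
g3 <- g1 + rnorm(n, sd = 0.1)    # co-regulated gene: moves with g1
g4 <- rnorm(n)                   # unrelated gene
a <- rbind(g1, g2, g3, g4)

r <- cor(t(a))                   # gene-by-gene correlation
d_signed <- as.dist(1 - r)
d_abs <- as.dist(1 - abs(r))

# Distance between the regulator and its repressed target under each measure
as.matrix(d_signed)["g1", "g2"]  # close to 2: about as far apart as possible
as.matrix(d_abs)["g1", "g2"]     # close to 0: treated as near neighbours

# Cluster membership when each tree is cut into two groups
cl_signed <- cutree(hclust(d_signed), k = 2)
cl_abs <- cutree(hclust(d_abs), k = 2)
cl_signed
cl_abs
```

With 1-cor the regulator and its target end up in different groups, while with 1-abs(cor) they share one. Which behaviour is preferable depends on whether "related" should include opposite-direction regulation.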
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>>> cor 1-cor 1-abs(cor)
>>>> 1 0 0
>>>> 0.9 0.1 0.1
>>>> 0.8 0.2 0.2
>>>> 0.7 0.3 0.3
>>>> 0.6 0.4 0.4
>>>> 0.5 0.5 0.5
>>>> 0.4 0.6 0.6
>>>> 0.3 0.7 0.7
>>>> 0.2 0.8 0.8
>>>> 0.1 0.9 0.9
>>>> 0 1 1
>>>> -0.1 1.1 0.9
>>>> -0.2 1.2 0.8
>>>> -0.3 1.3 0.7
>>>> -0.4 1.4 0.6
>>>> -0.5 1.5 0.5
>>>> -0.6 1.6 0.4
>>>> -0.7 1.7 0.3
>>>> -0.8 1.8 0.2
>>>> -0.9 1.9 0.1
>>>> -1 2 0
>>>>
>>>> Looking at the numbers, anything that is highly correlated, whether
>>>> positively or negatively, will be close in distance. The heatmaps
>>>> look different and obviously the clustering is different.
>>>>
>>>> Thanks.
>>>>
>>>> On Wed, May 11, 2011 at 9:19 PM, Kevin R. Coombes
>>>> <kevin.r.coombes@gmail.com> wrote:
>>>>
>>>>> This is not really a bioconductor question.... but
>>>>>
>>>>> You need something that behaves like a "distance".
>>>>>
>>>>> By the definition of "distance", two things are close if and only
>>>>> if the distance is near zero. The bigger (more positive) the
>>>>> distance, the further apart things are. And you cannot measure
>>>>> distances as negative; only non-negative values need apply.
>>>>>
>>>>> Using 1-cor, you are taking two things to be close if the
>>>>> correlation is close to 1.
>>>>> And the things that are furthest apart are the ones where the
>>>>> correlation is close to -1.
>>>>>
>>>>> As an exercise, you might want to think about circumstances where
>>>>> the preferred code would be
>>>>> 1 - abs(cor(*))
>>>>> instead of
>>>>> 1 - cor(*)
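
A small sketch checking the distance-like properties Kevin describes, on the same kind of toy matrix as the original example. The seed and the tolerance `1e-12` are arbitrary assumptions.

```r
set.seed(7)
a <- matrix(rnorm(50), ncol = 5)   # same toy shape as the original example

r <- cor(t(a))        # row-by-row correlations, values in [-1, 1]
d_signed <- 1 - r     # values in [0, 2]
d_abs <- 1 - abs(r)   # values in [0, 1]

# Properties a dissimilarity passed to hclust() should have
range(d_signed)                    # never negative
range(d_abs)                       # never negative
isSymmetric(d_signed)              # d(i, j) equals d(j, i)
all(abs(diag(d_signed)) < 1e-12)   # each row is at distance ~0 from itself

# The raw correlation matrix has 1 on its diagonal, so used directly as a
# "distance" it would place every row far away from itself
all(abs(diag(r) - 1) < 1e-12)
```

Neither transformation is guaranteed to satisfy the triangle inequality, so both are better described as dissimilarities than as true metrics; hierarchical clustering only needs the former.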
>>>>>
>>>>>
>>>>> On 5/11/2011 3:12 PM, john herbert wrote:
>>>>>
>>>>>> Dear bioconductors,
>>>>>>
>>>>>> From a google search, I found the following code that confuses me
>>>>>> a little.
>>>>>> As usual, it is probably something really elementary, but reading
>>>>>> around has not solved it.
>>>>>>
>>>>>> The code was written by James MacDonald
>>>>>> (http://www.mail-archive.com/r-help@r-project.org/msg61514.html)
>>>>>> and is to compute dendrograms based on correlation and plot the
>>>>>> results on a heatmap as follows:
>>>>>>
>>>>>> a<- matrix(rnorm(50), ncol=5)
>>>>>> rowv<- as.dendrogram(hclust(as.dist(1-cor(t(a)))))
>>>>>> colv<- as.dendrogram(hclust(as.dist(1-cor(a))))
>>>>>> heatmap.2(a, scale="row", Rowv=rowv, Colv=colv)
>>>>>>
>>>>>>
>>>>>> Why the *1*-cor(a)?
>>>>>>
>>>>>>
>>>>>> Orig.cor Adjusted cor
>>>>>> 1 0
>>>>>> 0.9 0.1
>>>>>> 0.8 0.2
>>>>>> 0.7 0.3
>>>>>> 0.6 0.4
>>>>>> 0.5 0.5
>>>>>> 0.4 0.6
>>>>>> 0.3 0.7
>>>>>> 0.2 0.8
>>>>>> 0.1 0.9
>>>>>> 0 1
>>>>>> -0.1 1.1
>>>>>> -0.2 1.2
>>>>>> -0.3 1.3
>>>>>> -0.4 1.4
>>>>>> -0.5 1.5
>>>>>> -0.6 1.6
>>>>>> -0.7 1.7
>>>>>> -0.8 1.8
>>>>>> -0.9 1.9
>>>>>> -1 2
>>>>>>
>>>>>>
>>>>>> This removes negative numbers? What is the reason for doing this?
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor@r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> Douglas Lab
>>> University of Michigan
>>> Department of Human Genetics
>>> 5912 Buhl
>>> 1241 E. Catherine St.
>>> Ann Arbor MI 48109-5618
>>> 734-615-7826
>>> **********************************************************
>>> Electronic Mail is not secure, may not be read every day, and should
>>> not be used for urgent or sensitive issues
>>>
>>
>>
>