Donna Toleno
Hello list.
When I make an R cluster dendrogram, it looks very different from the
clustering in the Newick file displayed in TreeView (Rod Page's program).
I tried a simple example with 12 probes and 3 samples, and I computed the
Euclidean distances both manually and with R.
> library(ctc)
> data
V1 V2 V3
1 4.184499 4.142575 4.017366
2 3.459849 3.455023 3.732115
3 8.287278 4.887692 5.007794
4 4.137224 4.523774 4.191996
5 4.431768 4.356945 4.570331
6 3.867442 3.931225 3.967566
7 3.480681 3.609997 3.522618
8 3.460785 3.966638 3.708675
9 4.306729 4.480724 4.399165
10 4.290001 4.036634 4.078688
11 6.707544 7.179901 9.475103
12 6.837264 6.845438 7.364477
> hc <- hcluster(t(data), link = "ave")
> write(hc2Newick(hc),file='hclust_12_probes_newick')
> plot (hc)
> hc
Call:
hcluster(x = t(data), link = "ave")
Cluster method : average
Distance : euclidean
Number of objects: 3
'hclust_12_probes_newick' file contains:
(V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):0.752346233726435);
I can see that the above Newick formatted tree shows that sample 2 and
sample 3 are the appropriate distance apart, about 2.4, but where does
the 0.7523... come from? How do I interpret "Height" on the y-axis of
this dendrogram? I would like a tree that represents the expression
difference. The Newick tree viewed in TreeView (Rod Page's Treeview)
looks different from the dendrogram produced by hcluster, but the
branch lengths still do not reflect the Euclidean distances. In my
example, the Newick tree shows all three samples about equidistant
from each other. Perhaps I should be using phylogenetic tree drawing
to get the appropriate branch lengths from the Euclidean distances? I
also experimented with hclust2treeview but this seems to refer to
Michael Eisen's Treeview. I am not familiar with this program or the
file formats it uses.
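On the 0.7523 question, here is a minimal sketch that may help trace the number. It assumes that base R's hclust(method = "average") is a fair stand-in for hcluster(link = "ave"); both report average linkage on Euclidean distances for this example, but that equivalence is an assumption and has not been checked against the ctc source. Under that assumption, the first merge height is the V2-V3 distance, the second is the mean of the V1-V2 and V1-V3 distances, and half the gap between the two heights matches the 0.7523... in the Newick file.

```r
# Data copied from the post: 12 probes (rows) x 3 samples (columns).
data <- matrix(c(
  4.184499, 4.142575, 4.017366,
  3.459849, 3.455023, 3.732115,
  8.287278, 4.887692, 5.007794,
  4.137224, 4.523774, 4.191996,
  4.431768, 4.356945, 4.570331,
  3.867442, 3.931225, 3.967566,
  3.480681, 3.609997, 3.522618,
  3.460785, 3.966638, 3.708675,
  4.306729, 4.480724, 4.399165,
  4.290001, 4.036634, 4.078688,
  6.707544, 7.179901, 9.475103,
  6.837264, 6.845438, 7.364477
), ncol = 3, byrow = TRUE,
dimnames = list(NULL, c("V1", "V2", "V3")))

# Euclidean distances between samples (columns), as in the post.
d <- dist(t(data), method = "euclidean")

# Assumption: base R average linkage stands in for hcluster(link = "ave").
hc <- hclust(d, method = "average")

# Merge heights, lowest first: V2 joins V3, then V1 joins (V2, V3).
h <- hc$height

leaf_branch <- h[1] / 2           # half the V2-V3 distance
stem_branch <- (h[2] - h[1]) / 2  # half the gap between the two merge heights

print(h)
print(leaf_branch)  # compare with 1.2128... in the Newick file
print(stem_branch)  # compare with 0.7523... in the Newick file
```

If this reading is right, "Height" on the dendrogram is the average-linkage distance at which two clusters join, and the branch lengths written to the Newick file are consistent with half-differences of those heights rather than raw Euclidean distances.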
Thank you for reading. Any comments will be appreciated.
Euclidean distances manually calculated in Excel for all of the 12
probes:
          V2           V3
V1  3.508320996  4.352360295
V2               2.425648178
> distances.12.probes <- as.matrix(dist(t(data), method = "euclidean",
diag = FALSE))
> distances.12.probes
V1 V2 V3
V1 0.000000 3.508321 4.352360
V2 3.508321 0.000000 2.425648
V3 4.352360 2.425648 0.000000
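As a rough illustration of why a dendrogram cannot show these three distances exactly, the sketch below compares the distances implied by an average-linkage tree (the cophenetic distances) with the input matrix, and then works out the branch lengths of a three-leaf unrooted tree that does reproduce every pairwise distance. It uses base R only; hclust(method = "average") is again an assumed stand-in for hcluster(link = "ave"), and the names D, coph, and b are just illustrative.

```r
# Pairwise Euclidean distances as printed above.
d12 <- 3.508321  # V1-V2
d13 <- 4.352360  # V1-V3
d23 <- 2.425648  # V2-V3

samples <- c("V1", "V2", "V3")
D <- matrix(c(0,   d12, d13,
              d12, 0,   d23,
              d13, d23, 0),
            nrow = 3, byrow = TRUE, dimnames = list(samples, samples))

# Assumption: base R average linkage stands in for hcluster(link = "ave").
hc <- hclust(as.dist(D), method = "average")

# Cophenetic distance = height at which two samples first share a cluster.
# V1-V2 and V1-V3 come out equal, because V1 joins (V2, V3) at one height.
coph <- as.matrix(cophenetic(hc))
print(coph)

# Three-leaf additive tree: one branch per sample meeting at a central node.
# Each branch is chosen so that branch(i) + branch(j) equals d(i, j).
b <- c(V1 = (d12 + d13 - d23) / 2,
       V2 = (d12 + d23 - d13) / 2,
       V3 = (d13 + d23 - d12) / 2)
print(b)
```

The dendrogram forces V1 to sit at a single distance from both V2 and V3, whereas the three-branch tree keeps all three Euclidean distances; with more than three samples an exact fit like this is generally not available and a tree-fitting method would be needed.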
Thank you again.
-Donna