Timothy Hughes
We wish to perform clustering on expression data and are therefore interested in the variance-stabilizing transformation of DESeq. I understand the purpose of the transformation, namely to produce values whose variances are approximately the same. But why is it necessary to do this when computing the distance between two values? Or, put another way, in what way does hierarchical clustering make assumptions about similar variances?

I believe I have the answer, but it would be nice if someone could confirm it.
When clustering, one is often effectively trying to minimize the variance within a cluster, even if this is not explicitly stated as the objective. If we consider the observations being clustered to be random variables, each with its own variance, then we should account for this variance explicitly, and a variance-stabilizing transformation does so before clustering. This avoids the need to account for the variance in the clustering process itself.
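To make "variances that are approximately the same" concrete, here is a minimal sketch. It assumes pure Poisson counts and uses the classical square-root transformation as a stand-in; this is not the transformation DESeq fits, which is derived from an estimated mean-dispersion relationship. The helper names `poisson_pmf` and `variance_of` are made up for this illustration.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability P(X = k), computed in log space to avoid overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def variance_of(transform, mean):
    """Variance of transform(X) for X ~ Poisson(mean), by summing over the pmf.

    The sum is truncated far into the upper tail, where the remaining
    probability mass is negligible.
    """
    upper = int(mean + 12 * math.sqrt(mean) + 30)
    probs = [poisson_pmf(k, mean) for k in range(upper + 1)]
    values = [transform(k) for k in range(upper + 1)]
    first_moment = sum(p * v for p, v in zip(probs, values))
    second_moment = sum(p * v * v for p, v in zip(probs, values))
    return second_moment - first_moment ** 2


if __name__ == "__main__":
    for mean in (10, 100, 1000):
        raw = variance_of(float, mean)             # variance of the raw count
        stabilized = variance_of(math.sqrt, mean)  # variance after sqrt
        print(f"mean={mean:5d}  var(raw)={raw:9.2f}  var(sqrt)={stabilized:.3f}")
```

On the raw scale the variance equals the mean, so highly expressed genes would dominate a Euclidean distance simply because they are noisier. After the square root the variance stays close to 1/4 whatever the mean, so each gene contributes to the distance on a comparable noise scale.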
The intuition would be that, given 3 observations:

A (high var)----------------B-----------------------------------------C (low var)

one may choose to cluster B and C if C's variance is very much lower than A's, even though the observed distance between B and C is greater than the distance between B and A.
Any help much appreciated.
--
Tim Hughes PhD (http://digitised.info)
Medical Genetics Department
Oslo University Hospital Ullevål
Kirkeveien 166
0407 Oslo
Norway