Heike Pospisil
Hello,
I have a question concerning hierarchical clustering and the effect of
group sizes.
I would like to select genes that are differentially expressed between
group A and group B. Afterwards, I wish to cluster the samples by these
genes. In principle, this works fine, but I have a problem if the group
sizes are very unequal. For example:
group A: 53 samples
group B: 12 samples
The resulting clustering brings the group B samples together, but they
are not clearly separated from group A. However, if I randomly take 12
samples from group A (to get equal group sizes), the clustering is
nearly perfect.
I use hclust(dist(t(exprs(sub)), method="euclidean"), method="complete"),
where ncol(sub) is the total number of samples (group A + group B) and
nrow(sub) is the number of significant genes. I also tried other
distance measures, but without improvement.
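To make the setup concrete, here is a minimal sketch of the situation. It is only an illustration: the matrix x is simulated data that I assume as a stand-in for exprs(sub), and the number of genes (200) and the size of the shift between groups are made-up values, not my real data. Only the group sizes (53 and 12) and the hclust() call match what I described.

```r
# Hypothetical simulated data standing in for exprs(sub):
# 200 "significant" genes (rows) and 53 + 12 samples (columns).
set.seed(1)
n_genes <- 200
group <- factor(rep(c("A", "B"), times = c(53, 12)))
x <- matrix(rnorm(n_genes * length(group)), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)),
                            paste0(group, "_", seq_along(group))))

# Shift the group B samples to mimic differential expression
# (the shift of 1 is an arbitrary assumption).
x[, group == "B"] <- x[, group == "B"] + 1

# Same clustering call as in my analysis, applied to the plain matrix.
hc <- hclust(dist(t(x), method = "euclidean"), method = "complete")

# Cut the tree into two clusters and compare with the known groups.
clusters <- cutree(hc, k = 2)
print(table(cluster = clusters, group = group))
```

Calling plot(hc) on the result shows the dendrogram; the cross-table is just a quick way to see how well the two-cluster cut matches the known groups.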
Does anybody have a hint as to which clustering algorithm should be
preferred for such unequal group sizes?
Thanks in advance and best wishes,
Heike
--
Dr. Heike Pospisil | pospisil at zbh.uni-hamburg.de
University of Hamburg | Center for Bioinformatics
Bundesstrasse 43 | 20146 Hamburg, Germany
phone:+49-40-42838-7303 | fax: +49-40-42838-7312