Nathan Harmston
@nathan-harmston-2904
Hi,
I'm currently trying to run some clustering on some expression arrays and I was wondering about the best way of doing it. I have 81 samples on hgu133plus2 (about 55,000 probe sets), which I have filtered down to approximately 10,000 (removing X, Y, low-variability and control probes), and I want to try hierarchical clustering on these by both arrays and genes. I was planning on using hopach, as this seems an easy and obvious choice.

How long would such a lot of comparisons take? I make it something like (81 * 10000)^2 comparisons, and I have a machine with 24 GB of memory. Has anybody done something like this before, and how long did it actually take? Given that it might take a while, are there any suggestions for how I might decrease the running time? I am already creating the distance matrix prior to clustering.
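
For a rough sense of scale, here is a minimal back-of-the-envelope sketch in Python. It counts unordered pairs of items (genes or arrays) rather than the (81 * 10000)^2 figure above, and estimates the size of a full square distance matrix. The 8-byte value size and the full (not triangular) matrix layout are assumptions, not something taken from hopach itself.

```python
# Back-of-the-envelope sizing for pairwise distance matrices.
# Assumptions: one distance per unordered pair of items, and a full
# n x n matrix of 8-byte floating point values held in memory.

def n_pairs(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

def full_matrix_gib(n, bytes_per_value=8):
    """Size in GiB of a full n x n matrix of fixed-width values."""
    return n * n * bytes_per_value / 2**30

n_genes, n_arrays = 10_000, 81

gene_pairs = n_pairs(n_genes)      # distances needed to cluster genes
array_pairs = n_pairs(n_arrays)    # distances needed to cluster arrays
gene_matrix_gib = full_matrix_gib(n_genes)

print(f"gene pairs:  {gene_pairs:,}")
print(f"array pairs: {array_pairs:,}")
print(f"full gene distance matrix: {gene_matrix_gib:.2f} GiB")
```

Under these assumptions the gene-by-gene matrix is the expensive part, and each gene distance is computed over only 81 values, so the count says nothing about how long the clustering step itself takes.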
Also, why is it better to use cosangle for gene clustering and Euclidean distance for arrays? Is there a good reason for this, and why would you use one distance over another?
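
To make the question concrete, here is a small sketch in Python comparing the two kinds of distance on toy profiles. It assumes that cosine-angle distance means 1 minus the cosine of the angle between two vectors; hopach's exact definition of cosangle may differ, and the three "gene" vectors are made up for illustration.

```python
import math

def euclidean(x, y):
    """Straight-line distance; sensitive to the magnitude of the values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosangle(x, y):
    """1 - cos(angle between x and y); ignores overall scale.
    Assumed definition, for illustration only."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

# Hypothetical expression profiles across four arrays.
gene_a = [1.0, -2.0, 3.0, -1.0]
gene_b = [10 * v for v in gene_a]   # same pattern, ten times the amplitude
gene_c = [-v for v in gene_a]       # opposite pattern, same amplitude

for name, other in (("b", gene_b), ("c", gene_c)):
    print(f"a vs {name}: euclidean={euclidean(gene_a, other):.3f}, "
          f"cosangle={cosangle(gene_a, other):.3f}")
```

With these toy vectors, the cosine-angle distance treats the scaled profile as essentially identical to the original and the inverted one as maximally distant, while Euclidean distance puts the inverted profile closer than the scaled one.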
Many thanks in advance,
Nathan