HOPACH clustering of genes
@nathan-harmston-2904
Hi,

I'm currently trying to run some clustering on some expression arrays and I was wondering about the best way of doing it. I have 81 samples on hgu133plus2 (55000), which I have filtered down to approximately 10000 (X, Y, low variability, control probes), and I wanted to try hierarchical clustering on these, both by arrays and by genes. I was planning on using hopach, as this seems an easy and obvious choice.

How long would such a lot of comparisons take? I make it something like (81 * 10000)^2 comparisons, and I have a machine with 24 GB of memory. Has anybody ever done something like this before, and how long did it actually take? Given that it might take a while, are there any suggestions for how I might decrease the running time? I am already creating the distance matrix prior to clustering.

Why is it better to use cosangle for gene clustering and Euclidean distance for arrays? Is there a good reason for this, and why would you use one distance over another?

Many thanks in advance,
Nathan
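To make the last question concrete, here is a minimal sketch of how the two distances can rank the same profiles differently. It assumes cosine-angle distance is defined as 1 minus the cosine of the angle between two vectors; the function names and the three gene profiles are hypothetical, made up purely for illustration, and this is plain Python rather than hopach itself.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosangle(x, y):
    """1 - cos(angle between x and y); an assumed definition, not hopach's code."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

# Hypothetical expression profiles across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same shape as gene_a, ten times the magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]      # same magnitude as gene_a, reversed shape

d_cos_ab = cosangle(gene_a, gene_b)
d_euc_ab = euclidean(gene_a, gene_b)
d_cos_ac = cosangle(gene_a, gene_c)
d_euc_ac = euclidean(gene_a, gene_c)

print(f"a vs b: cosangle={d_cos_ab:.3f}  euclidean={d_euc_ab:.3f}")
print(f"a vs c: cosangle={d_cos_ac:.3f}  euclidean={d_euc_ac:.3f}")
```

Under these assumptions, the cosine-angle distance puts gene_a next to gene_b (same pattern, different scale), while Euclidean distance puts gene_a next to gene_c (similar magnitude, opposite pattern).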
@shannon-william-2930
You may want to look at k-means clustering instead of hierarchical clustering if you are interested in genes with correlated expression patterns across the samples. Imposing a hierarchical structure/model on 10,000 genes is probably incorrect: gene A and gene B may be correlated but independent in terms of function, evolutionary history, pathway, etc.

In terms of how long it takes, you would have to calculate a distance matrix with 10000*(9999)/2 = 49,995,000 elements. My best suggestion is to start the distance calculation and see whether it finishes in a reasonable amount of time.

Bill Shannon, PhD
Associate Professor of Biostatistics in Medicine
Washington University in St Louis
President-elect, Classification Society

________________________________________
From: bioconductor-bounces@stat.math.ethz.ch [bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Nathan Harmston [iwanttobeabadger@googlemail.com]
Sent: Monday, July 21, 2008 8:55 AM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] HOPACH clustering of genes
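The pair count in the reply can be checked, and turned into a rough memory estimate, with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 8 bytes per distance is an assumption (64-bit floats), the helper name `pairwise_count` is made up for illustration, and the real memory footprint of any clustering package may be higher because of working copies.

```python
def pairwise_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

n_genes = 10_000      # probes left after filtering (from the thread)
n_samples = 81        # arrays (from the thread)
bytes_per_value = 8   # assumption: each distance stored as a 64-bit float

gene_pairs = pairwise_count(n_genes)
array_pairs = pairwise_count(n_samples)

# Storage for the gene-by-gene distances: lower triangle only vs. full square matrix.
triangle_mb = gene_pairs * bytes_per_value / 1e6
square_mb = n_genes * n_genes * bytes_per_value / 1e6

# Each gene-gene distance reads every sample once, so work scales with pairs * samples.
gene_terms = gene_pairs * n_samples

print(f"gene pairs:        {gene_pairs:,}")
print(f"array pairs:       {array_pairs:,}")
print(f"triangle storage:  {triangle_mb:.2f} MB")
print(f"square storage:    {square_mb:.2f} MB")
print(f"per-sample terms:  {gene_terms:,}")
```

Under these assumptions the gene-by-gene distances need roughly 400 MB as a lower triangle or 800 MB as a full square matrix, which is small next to 24 GB, and clustering the 81 arrays involves only 3,240 pairs.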