perhaps because they don't add anything beyond the simple and broadly
understood method??
-----Original Message-----
From: cstrato
To: Ramon Diaz-Uriarte
Cc: Prof Brian Ripley; James W. MacDonald;
bioconductor@stat.math.ethz.ch
Sent: 13/09/04 20:26
Subject: Re: [BioC] Re: [S] Error in clustering procedure
Another issue which I do not understand is: why does everybody
use the same hierarchical clustering method and none of the
many newer clustering methods that exist?
To mention a few examples in each clustering category:
Partitioning methods: CLARA or CLARANS
Hierarchical methods: BIRCH or CURE
Density-based methods: DBSCAN, OPTICS or DENCLUE
Grid-based methods: STING, WaveCluster or CLIQUE
Model-based methods: COBWEB or CLASSIT
It would be great to be able to try these newer methods
and to know which method would be especially suitable
for which purpose.
Best regards
Christian
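
A small illustration of the concern quoted below, that the choice of
clustering algorithm can change the groups you get. This is only a sketch
in plain Python under stated assumptions: the seven numbers are made-up
one-dimensional "expression values", and agglomerate() is a hypothetical
helper written for this note, not a function from any package mentioned in
the thread. It merges clusters bottom-up and differs only in the linkage
rule (min = single linkage, max = complete linkage).

```python
from itertools import combinations

def agglomerate(points, k, linkage):
    """Naive agglomerative clustering of 1-D points down to k clusters.

    linkage is the builtin min (single linkage) or max (complete linkage),
    applied to all pairwise distances between two clusters.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(
                abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical data: a chain of values with slowly widening gaps.
points = [0, 10, 21, 33, 46, 60, 80]

single = agglomerate(points, 2, min)    # single linkage
complete = agglomerate(points, 2, max)  # complete linkage

print("single:  ", single)
print("complete:", complete)
```

On this toy data, single linkage chains the first six values together and
leaves 80 on its own, while complete linkage splits the same values into
two more balanced groups. Both answers are "two clusters"; neither rule by
itself says which grouping, if any, is meaningful.
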
Ramon Diaz-Uriarte wrote:
> On Monday 13 September 2004 10:36, michael watson (IAH-C) wrote:
>
>>I guess I'm coming to this late, but I'm pretty sure all biologists use
>>cluster analysis for is finding out which genes are behaving similarly
>>to one another in a large data set. Then if, for example, all genes from a
>
>
> Oh, but that is one problem I was referring to: say you use UPGMA; then, you
> will get a dendrogram; then, you can make up any story. That was one of my
> concerns. Clustering gives you clusters, but most papers I've seen that "use"
> clustering do not seem to be overly concerned about how meaningful and
> repeatable those clusters are.
>
> Related to the above, and to clustering being over-sold, is the fact that
> those papers very rarely discuss how the type of clustering algorithm
> affects the results, and how different clustering algorithms, different
> metrics, etc., relate to prior beliefs about the shape of clusters (or how
> different clustering algorithms are better at detecting different patterns).
>
> And finally, it is not always clear that the distinction between exploratory
> and confirmatory analysis is being made. We can read sentences such as "the
> clustering results show that there are two groups"... Well, in what sense and
> how do the results from some agglomerative clustering algorithm show that
> there are two groups (and not twenty)?
>
> But, again, I do think clustering has a role for certain types of questions.
> I just think it is not the magic bullet to "let the data speak for
> themselves", and similar marketing hype.
>
> Best,
>
> R.
>
>
>>certain pathway are showing a similar expression pattern, we have a
>>hypothesis which can be tested further.
>>
>>If cluster analysis has indeed been "over-sold", please suggest a better
>>algorithm for summarising groups of genes that are behaving similarly
>>across a group of experiments or time-points :-)
>>
>>M
>>
>>-----Original Message-----
>>From: Ramon Diaz-Uriarte [mailto:rdiaz@cnio.es]
>>Sent: 08 September 2004 09:33
>>To: bioconductor@stat.math.ethz.ch
>>Cc: Prof Brian Ripley; cstrato; James W. MacDonald
>>Subject: Re: [BioC] Re: [S] Error in clustering procedure
>>
>>On Tuesday 07 September 2004 21:17, cstrato wrote:
>>
>>>Dear all
>>>
>>>First of all, I want to apologize to Prof. Ripley, since I forgot to
>>>ask him first for permission to publish his comment.
>>>
>>>Personally, I agree that this would be useless, as Prof. Ripley has
>>>already told me some years ago. However, almost everybody still seems
>>>to do it and publish the corresponding results. Companies such as
>>>Spotfire are proud that you can do hierarchical clustering with more
>>>than 20,000 genes. I have never seen a publication where it was done
>>>differently.
>>
>>Part of this could be the result of imitative behavior, beliefs that
>>"unless I put a neat heatmap I won't get it past reviewers", etc., but not
>>evidence that it is the best way to go. If several companies are making an
>>issue out of it in their brochures, maybe it is because customers ask for
>>clustering. As for "publish the corresponding results", I am not sure what
>>the "results" are, since after clustering 7000 genes you can almost always
>>make up a story after the fact; but I would not call that a result.
>>
>>I think clustering (and biclustering) do have a place, but I guess they
>>should be used as a tool to answer some question (e.g., I think I
>>understand what question a t-test is helping to answer; I am not sure about
>>most clustering procedures), or as a guidance for something, not as some
>>kind of magic tool to "let the data speak for themselves" ( = a) get the
>>microarray data; b) run a clustering procedure; c) come up with a question
>>that your cluster "answered".)
>>
>>R.
>>
>>
>>>I think that the bioconductor list would be the best forum to discuss
>>>this issue, and provide solutions (besides the obvious suggestion to
>>>filter non-varying genes).
>>>
>>>Best regards
>>>Christian
>>>
>>>James W. MacDonald wrote:
>>>
>>>>cstrato wrote:
>>>>
>>>>>Sorry, but I cannot resist:
>>>>>
>>>>>Any comments from the microarray community on the usefulness of
>>>>>hierarchical clustering of 7000 genes?
>>>>
>>>>I think this would be almost completely useless. First off,
>>>>clustering is not an inferential technique, so its use has been
>>>>completely oversold IMO to the biological community. Secondly,
>>>>clustering is usually done to produce a 'heat map' to put in a paper
>>>>or flash on the screen during a presentation. How on earth would
>>>>this be of any use? You couldn't even read any of the gene names!
>>>>
>>>>Of course you could use the heatmap to impress friends and
>>>>colleagues with the fact that you rate a computer powerful enough to
>>>>*do* a heatmap with a 7000 x 5 matrix ;-D
>>>>
>>>>Jim
>>>>
>>>>
>>>>>Best regards
>>>>>Christian
>>>>>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>>>>>C.h.r.i.s.t.i.a.n. .S.t.r.a.t.o.w.a
>>>>>V.i.e.n.n.a. .A.u.s.t.r.i.a
>>>>>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>>>
>>>_______________________________________________
>>>Bioconductor mailing list
>>>Bioconductor@stat.math.ethz.ch
>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>