Hello All,
I am using the hclust function.
As the data set is large (2000 genes), I tried to use cutree to cut the
original tree into subtrees and then analyze the data.
I can't figure out how to view a subtree with the labels.
Can anyone help me ASAP?
Thanks in advance,
Madhurima.
On 11/9/05 2:42 AM, "madhurima bhattacharjee" <madhurima_b at persistent.co.in>
wrote:
> Hello All,
>
> I am using the hclust function.
> As the data set is large (2000 genes), I tried to use cutree to cut the
> original tree into subtrees and then analyze the data.
> I can't figure out how to view a subtree with the labels.
> Can anyone help me ASAP?
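As another poster mentioned, it is REALLY worthwhile to try exporting to
Cluster/Treeview for jobs like this. However, cutree gives you the cluster
memberships for your genes, so you can then pull out only the genes of
interest by subsetting. Note that hclust expects a dissimilarity object
(for example, the output of dist) rather than the raw data matrix, which is
why the first hclust call in the session below fails.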
> m <- matrix(rnorm(500),nc=10)
> hclust(m)
Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
argument is of length zero
> hc <- hclust(dist(m))
> plot(hc)
> v <- cutree(hc,k=3)
> v
 [1] 1 2 3 1 3 3 3 1 3 1 3 1 1 2 3 2 1 3 3 2 1 3 1 3 2 2 3 1 3 3 3 3 2 1 1 2 3 1
[39] 1 3 2 3 2 2 1 3 2 3 2 3
> plot(hclust(dist(m[v==1,])))
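To get labels on the subtree, one option is to give the data matrix row names before clustering, since dist and hclust carry them through to the plot. The following is only a sketch of that idea: the simulated data, the seed, and the "gene1", "gene2", ... names are made up for illustration and stand in for your real expression matrix and gene identifiers.

```r
# Sketch only: simulated data and hypothetical gene names.
set.seed(1)                                      # assumption: fixed seed for reproducibility
m <- matrix(rnorm(500), ncol = 10)               # 50 "genes" x 10 "samples"
rownames(m) <- paste0("gene", seq_len(nrow(m)))  # hypothetical labels

hc <- hclust(dist(m))    # cluster all genes
v  <- cutree(hc, k = 3)  # named vector: cluster membership per gene

# Keep only the rows in cluster 1; drop = FALSE keeps a matrix
# (and its row names) even if the cluster has a single member.
sub    <- m[v == 1, , drop = FALSE]
hc_sub <- hclust(dist(sub))

# The leaves of this plot are labelled with the row names of 'sub'.
plot(hc_sub, main = "Cluster 1")
```

If the labels overlap for a large cluster, plot's cex argument (for example cex = 0.6) shrinks the text.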
Hi Madhurima,
I am apparently not able to submit to Bioconductor (I thought I was
registered, but my email yesterday about cutree was not posted and I
haven't had time to straighten it out).
Why do you need to display the tree?
Often the tree is used only to identify clusters, and the internal
hierarchical relationships are not that important. However, if you do need
to view the substructure, then the option of the external program sounds
like it might serve your needs.
If you need to keep it in R, then the following algorithm would work:
1. Use cutree to identify the genes in the subtree you want to examine.
2. Select the pairwise distances for these gene pairs from the distance
matrix used to fit the full tree and generate a dist object. Basically,
turn the dist object for all the genes into a full matrix (there is a
dist2full function available for that), select the rows/columns for the
genes of interest, convert this submatrix back to a dist object, run
hclust on that dist object, and plclust the resulting tree.
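The two steps above can be sketched in base R roughly as follows. This is only an illustration under some assumptions: as.matrix() is used here in place of dist2full, plot() in place of plclust, and the simulated data and "gene" labels are made up.

```r
# Sketch only: simulated data and hypothetical gene names.
set.seed(1)                                      # assumption: fixed seed for reproducibility
m <- matrix(rnorm(500), ncol = 10)
rownames(m) <- paste0("gene", seq_len(nrow(m)))  # hypothetical labels

d  <- dist(m)            # distances used to fit the full tree
hc <- hclust(d)

# Step 1: identify the genes in the cluster of interest.
v     <- cutree(hc, k = 3)
genes <- names(v)[v == 1]

# Step 2: full matrix -> submatrix for those genes -> dist object.
d_full <- as.matrix(d)                   # stands in for dist2full
d_sub  <- as.dist(d_full[genes, genes])

# Re-cluster the subset from the original distances and plot it;
# the leaves are labelled with the gene names.
hc_sub <- hclust(d_sub)
plot(hc_sub, main = "Cluster 1 (from original distances)")
```

Working from the submatrix reuses exactly the distances that went into the full tree, which matters if they were not plain Euclidean distances on the rows (for example, a correlation-based distance).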
But let me know why you need to do this, and perhaps we can come up with an
alternative to viewing the large tree.
Bill
---
Biostatistics Consulting Center
http://ilya.wustl.edu/~shannon/bcc_announcement.pdf
"Statistics is not a discipline like physics, chemistry or biology
where we study a subject to solve problems in the same subject. We
study statistics with the main aim of solving problems in other
disciplines." CR Rao
William D. Shannon, Ph.D.
Associate Professor of Biostatistics in Medicine
Division of General Medical Sciences and Biostatistics
Washington University School of Medicine
Campus Box 8005, 660 S. Euclid
St. Louis, MO 63110
Phone: 314-454-8356
Fax: 314-454-5113
e-mail: wshannon at wustl.edu
web page: http://ilya.wustl.edu/~shannon