> could someone please help with Diana clustering and visualisation.
>
> I would like to do 1-way (genes only) and 2-way (genes and samples)
> clustering and visualise as a heatmap or in Treeview software.
>
>
> Anthony
> --
I've never used Treeview, but I have used heatmap and diana (diana is in
the cluster package). The way I've used diana is to first save the diana
object, convert it to a dendrogram, and define clusters by cutting it at a
certain height. I've used the diana algorithm both on the raw data and on
a precomputed dissimilarity matrix. I've copied some code I have and
modified the names of objects to hopefully be a bit clearer.
## on raw data (matrix of Mvalues, rows = genes, col=arrays)
library(cluster)  ## provides diana()
Mvalues <- matrix(0,nrow=100,ncol=9)
rownames(Mvalues) <- 301:400  ## 100 names for 100 rows
colnames(Mvalues) <- c("a","b","c","d","e","f","g","h","i")
for (i in 1:3) Mvalues[,i] <- rnorm(100)
for (i in 4:6) Mvalues[,i] <- rnorm(100,mean=2,sd=0.5)
for (i in 7:9) Mvalues[,i] <- rnorm(100,mean=-1,sd=0.7)
dianaGenes <- diana(Mvalues)
## or using a precomputed dissimilarity matrix:
## dianaGenes <- diana(dissMatrix,diss=TRUE,keep.diss=FALSE)
dianaDend <- as.dendrogram(as.hclust(dianaGenes))
dianaDendOrder <- order.dendrogram(dianaDend)
## Row names in dendrogram order (handy for listing the clustered genes)
clusteredGeneNames <- rownames(Mvalues)[dianaDendOrder]
## To select the colours use
low <- col2rgb("green")/255
high <- col2rgb("red")/255
heatmapCol <- rgb( seq(low[1],high[1],len=123),
seq(low[2],high[2],len=123),
seq(low[3],high[3],len=123) )
## personally I don't much like the red/green system, and prefer
## heat.colors
heatmapCol <- heat.colors(123)
## If you are just clustering on genes you can colour the arrays
## eg say you had 3 groups of 3
colColours <- c(rep("green",3),rep("red",3),rep("blue",3))
## you can also define clusters by cutting the dendrogram and colouring
## these:
dianaClusters.h2 <- cut(dianaDend,h=2)
nClusters <- length(dianaClusters.h2$lower)
dianaClusters <- numeric(length=dim(Mvalues)[1])
for (i in 1:nClusters)
dianaClusters[order.dendrogram(dianaClusters.h2$lower[[i]])] <- i
## now colour the rows based on clusters.
## I like distinct colours between clusters
rowColChoices <- character(nClusters)
nClusters.2 <- ceiling(nClusters/2)
nClusters.2.min <- min(nClusters.2,floor(nClusters/2))
rowColChoices[1:nClusters.2*2-1] <-
rainbow(nClusters.2,start=0,end=2/6)
rowColChoices[1:nClusters.2.min*2] <-
rev(rainbow(nClusters.2.min,start=3/6,end=5/6))
rowCols <- character(dim(Mvalues)[1])
for (i in 1:length(rowCols)) rowCols[i] <-
rowColChoices[dianaClusters[i]]
## or randomly assign colours
rowColChoices <- rainbow(nClusters)[sample(nClusters,nClusters)]
rowCols <- character(length=dim(Mvalues)[1])
for (i in 1:length(rowCols))
rowCols[i] <- rowColChoices[dianaClusters[i]]
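## To cluster just on genes:
## (heatmap() reorders labRow itself, so pass the names in original row
## order rather than the already reordered clusteredGeneNames)
heatmap(Mvalues, Rowv=dianaDend, Colv=NA, scale="row",
        labRow=rownames(Mvalues),cexRow=.2,
        col=heatmapCol,
        ColSideColors=colColours,RowSideColors=rowCols)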
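
The cluster labels used for the row colours above can probably also be
obtained without the loop over the cut dendrogram. The following is only a
minimal sketch, assuming that cutree() on the hclust version of the diana
object gives the grouping you want; the exampleM, exampleDiana,
exampleClusters and exampleRowCols names and the choice of k = 4 are made up
for illustration. cutree() also takes h= if you would rather cut at a height.

```r
library(cluster)  # provides diana() and as.hclust() for diana objects

set.seed(1)  # toy data only, not real M values
exampleM <- matrix(rnorm(100 * 9), nrow = 100, ncol = 9,
                   dimnames = list(paste0("gene", 1:100), letters[1:9]))
exampleDiana <- diana(exampleM)

# cutree() returns one cluster label per row, in the original row order
exampleClusters <- cutree(as.hclust(exampleDiana), k = 4)
table(exampleClusters)  # cluster sizes

# map the labels straight to colours, one colour per row
exampleRowCols <- rainbow(4)[exampleClusters]
```

exampleRowCols is in the original row order, which is what RowSideColors
expects.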
## To cluster on genes and arrays, I think you just replace Colv
## with a dendrogram object based on clustering over the columns:
dianaArrays <- diana(t(Mvalues))
dianaDendArrays <- as.dendrogram(as.hclust(dianaArrays))
## call heatmap with Colv=dianaDendArrays and drop ColSideColors
heatmap(Mvalues, Rowv=dianaDend, Colv=dianaDendArrays, scale="row",
        labRow=rownames(Mvalues),cexRow=.2,
        col=heatmapCol,
        RowSideColors=rowCols)
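
The commented-out diana(dissMatrix, ...) call further up needs a
dissimilarity matrix from somewhere. Here is a rough sketch of two common
ways to build one, assuming either Euclidean distance or one minus the
Pearson correlation between gene profiles suits your data. The exampleM,
dissEuclid, dissCor and related names are made up for illustration, and
which dissimilarity is appropriate is your call.

```r
library(cluster)  # provides diana()

set.seed(1)  # toy data only, not real M values
exampleM <- matrix(rnorm(100 * 9), nrow = 100, ncol = 9,
                   dimnames = list(paste0("gene", 1:100), letters[1:9]))

# Euclidean distance between rows (genes)
dissEuclid <- dist(exampleM)

# correlation-based dissimilarity: cor() works on columns, so transpose
dissCor <- as.dist(1 - cor(t(exampleM)))

# diss=TRUE tells diana() the input is already a dissimilarity
dianaEuclid <- diana(dissEuclid, diss = TRUE, keep.diss = FALSE)
dianaCor <- diana(dissCor, diss = TRUE, keep.diss = FALSE)

# either result converts to a dendrogram in the same way as before
dendCor <- as.dendrogram(as.hclust(dianaCor))
```

dendCor can then be passed as Rowv in the heatmap calls above.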
Cheers
Chris
Dr Chris Wilkinson
Senior Research Officer (Bioinformatics) | ARC Research Associate
Child Health Research Institute (CHRI) | Microarray Analysis Group
7th floor, Clarence Rieger Building | Room 121
Women's and Children's Hospital | School of Mathematical Sciences
72 King William Rd, North Adelaide, 5006 | The University of Adelaide, 5005
Math's Office (Room 121) Ph: 8303 3714
CHRI Office (CR2 52A) Ph: 8161 6363
Christopher.Wilkinson@adelaide.edu.au
http://mag.maths.adelaide.edu.au/crwilkinson.html