Heatmap with 7120x500 array
Gaston Fiore ▴ 40
@gaston-fiore-4224
Last seen 10.2 years ago
Hello everyone,

I'm trying to produce a heat map that clusters 7120 genes into 6 groups based on 500 conditions. I'm using kmeans and then image, but I have two problems. The first is that kmeans sometimes doesn't converge even with 10 restarts, and the second is that the image produced is basically all red (I'm using the standard color scheme), not to mention that its size is massive and very hard to deal with. Does anyone have any suggestions on how I could accomplish this task efficiently, or is this data just too big to cluster?

Thanks a lot,
-Gaston
@gerhard-thallinger-1552
Last seen 5 weeks ago
Austria
Hi Gaston,

> I'm trying to produce a heat map that clusters 7120 genes
> into 6 groups based on 500 conditions. I'm using kmeans and
> then image, but I've two problems. The first one is that
> kmeans sometimes doesn't converge even with 10 restarts, and
> the second one is that the image produced is basically all
> red (I'm using the standard color scheme), not to mention
> its size is massive and very hard to deal with. Does anyone
> have any suggestions on how I could accomplish this task
> efficiently, or is this data just too big to cluster?

Genesis should be able to handle datasets that large (http://genome.tugraz.at/genesisclient/genesisclient_description.shtml). Adapting the color scale is very easy.

I can't comment on the convergence of k-means; this could depend on the data.

Regards,
Gerhard
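On the convergence question, one thing that may be worth checking: in R, kmeans() defaults to iter.max = 10 and nstart = 1, so a "did not converge" warning can come from the iteration cap rather than from the number of restarts. The following is only a minimal sketch on simulated data (the matrix, its size, and the six simulated groups are assumptions, not the poster's data) showing how to raise both settings and inspect the result.

```r
# Simulated stand-in for the expression matrix (assumption: genes in rows,
# conditions in columns). Six groups of genes share a mean profile.
set.seed(1)
n_genes <- 1200
n_cond <- 50
k <- 6
group <- rep(1:k, each = n_genes / k)
profiles <- matrix(rnorm(k * n_cond, sd = 2), nrow = k)
x <- profiles[group, ] + matrix(rnorm(n_genes * n_cond), nrow = n_genes)

# nstart sets the number of random restarts; iter.max caps the number of
# iterations within each restart (the defaults are nstart = 1, iter.max = 10).
km <- kmeans(x, centers = k, iter.max = 100, nstart = 10)

# ifault is 0 when the algorithm reported no problem for the returned fit.
km$ifault
table(km$cluster)
```

If the warning persists with a generous iter.max, the data themselves (for example a few extreme values, or no real group structure) are the more likely cause.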
On 08/28/2010 10:52 AM, Gerhard Thallinger wrote:
> Genesis should be able to handle datasets that large
> (http://genome.tugraz.at/genesisclient/genesisclient_description.shtml)
> Adapting the color scale is very easy.
>
> I can't comment on the convergence of k-means, this could depend
> on the data.

Hi Gaston,

I'd guess the 'all red' image is due to a few extreme values driving the color palette. Perhaps you intend to log-transform or otherwise pre-process the data before clustering / display, which might also help convergence? Likewise, consider applying a filter like varFilter in the genefilter package to reduce the number of genes being clustered; most will not be contributing anything meaningful to the clustering algorithm.

I think what you want to do is to separate the steps of clustering, reordering rows / columns, and displaying the image. See ?dendrogram, ?reorder, ?heatmap. heatmap should be doing little more than plotting an image (no sense in printing the dendrograms, as they'll be too dense to make sense of).

Martin

> Regards,
>
> Gerhard
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
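The separation of steps suggested above (pre-process, cluster, reorder, display) could look roughly like the sketch below. It is an illustration under stated assumptions, not a tested recipe for the original data: the matrix is simulated, the clipping limit of 2 and the blue-white-red palette are arbitrary choices, and the output file name is hypothetical. Clipping the scaled values is one simple way to keep a few extremes from saturating the palette, and writing to a PNG device keeps the file size manageable.

```r
# Simulated raw intensities (assumption: genes in rows, conditions in columns).
set.seed(2)
n_genes <- 1200
n_cond <- 40
k <- 6
group <- sample(rep(1:k, length.out = n_genes))
profiles <- matrix(rnorm(k * n_cond, sd = 1.5), nrow = k)
noise <- matrix(rnorm(n_genes * n_cond, sd = 0.5), nrow = n_genes)
raw <- 2^(8 + profiles[group, ] + noise)

# 1. Pre-process: log-transform, then centre and scale each gene (row).
z <- t(scale(t(log2(raw + 1))))

# 2. Cluster, independently of any plotting.
km <- kmeans(z, centers = k, iter.max = 100, nstart = 10)

# 3. Reorder rows so genes in the same cluster are adjacent.
ord <- order(km$cluster)
z_ord <- z[ord, ]

# 4. Display: clip to a symmetric range and draw a raster image to a file.
lim <- 2
z_clip <- pmin(pmax(z_ord, -lim), lim)
pal <- colorRampPalette(c("blue", "white", "red"))(64)
out_file <- file.path(tempdir(), "heatmap_sketch.png")  # hypothetical name
png(out_file, width = 800, height = 1200)
image(x = seq_len(ncol(z_clip)), y = seq_len(nrow(z_clip)), z = t(z_clip),
      col = pal, zlim = c(-lim, lim), xlab = "condition", ylab = "gene",
      useRaster = TRUE)
dev.off()
```

A variance filter (for example varFilter from genefilter, which operates on an ExpressionSet) would slot in between steps 1 and 2.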
Paul Leo ▴ 970
@paul-leo-2092
Last seen 10.2 years ago
Think you need to filter those genes, as you almost certainly have too much noise. Suggest you filter the genes: if you are doing class discovery, perhaps use the ratio of mean/sd, choose the ones with the most variation, and use about 500-1000 or so. The heatmap won't be readable in any case. I would perhaps try principal components / spectral decomposition, like:

    # 'data' is your expression matrix; rows (samples or genes) are the observations
    the.pca <- prcomp(data, scale. = TRUE)   # try attributes(the.pca) to see what it holds
    dim(the.pca$x)

    ### estimate how many principal components you need
    the.pca.var <- round(the.pca$sdev^2 / sum(the.pca$sdev^2) * 100, 2)
    plot(seq_along(the.pca.var), the.pca.var, type = "b",
         xlab = "# components", ylab = "% variance",
         main = "Scree Plot for Hits", col = "red", cex = 1.5, cex.lab = 1.5)
    savePlot("scree plot.jpeg", type = "jpeg")   # only works on some screen devices

    centers <- 15
    the.cl <- kmeans(the.pca$x[, 1:2], centers = centers, iter.max = 1000)   # do kmeans
    colours <- rainbow(centers)

    ## 2D: set up empty axes, then draw the labels coloured by cluster
    plot(the.pca$x[, 1], the.pca$x[, 2], type = "n", xlab = "PCA1", ylab = "PCA2",
         main = "Spectral clustering of differential hits")
    text(the.pca$x[, 1], the.pca$x[, 2], labels = rownames(the.pca$x),
         col = colours[the.cl$cluster], cex = 0.75)

    ### 3D
    library(scatterplot3d)
    s3d <- scatterplot3d(the.pca$x[, 1], the.pca$x[, 2], the.pca$x[, 3], type = "n",
                         xlab = "PCA1", ylab = "PCA2", zlab = "PCA3",
                         main = "Spectral clustering of differential hits", angle = 120)
    text(s3d$xyz.convert(the.pca$x[, 1], the.pca$x[, 2], the.pca$x[, 3]),
         labels = rownames(the.pca$x), col = colours[the.cl$cluster], cex = 0.75)
    # 'wanted' (rows to highlight) and 'color' are not defined in this post;
    # set them yourself before running the next line
    points(s3d$xyz.convert(the.pca$x[wanted, 1], the.pca$x[wanted, 2], the.pca$x[wanted, 3]),
           col = color, cex = 5.0)

Otherwise, if you have class labels, use SAM or PAM.

Hope that helps

Cheers
Paul

-----Original Message-----
From: Gaston Fiore <gaston.fiore@gmail.com>
To: bioconductor@stat.math.ethz.ch
Subject: [BioC] Heatmap with 7120x500 array
Date: Fri, 27 Aug 2010 15:39:37 -0400

Hello everyone, I'm trying to produce a heat map that clusters 7120 genes into 6 groups based on 500 conditions. I'm using kmeans and then image, but I've two problems. The first one is that kmeans sometimes doesn't converge even with 10 restarts, and the second one is that the image produced is basically all red (I'm using the standard color scheme), not to mention its size is massive and very hard to deal with. Does anyone have any suggestions on how I could accomplish this task efficiently, or is this data just too big to cluster? Thanks a lot, -Gaston
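The gene-filtering step Paul describes in words (keep roughly the 500-1000 most variable genes before PCA or clustering) might be sketched as follows. This is an assumption-laden illustration: the matrix is simulated, the cutoff of 500 is taken from the low end of the suggested range, and plain per-gene standard deviation is used as the measure of variation, which is a simplification rather than the exact ratio mentioned in the post.

```r
# Simulated log-scale matrix (assumption: genes in rows, samples in columns).
set.seed(3)
expr <- matrix(rnorm(3000 * 30), nrow = 3000,
               dimnames = list(paste0("gene", 1:3000), paste0("s", 1:30)))
# Make the first 200 genes clearly more variable than the rest.
expr[1:200, ] <- expr[1:200, ] * 4

# Rank genes by standard deviation across samples and keep the top 500.
gene_sd <- apply(expr, 1, sd)
keep <- order(gene_sd, decreasing = TRUE)[1:500]
filtered <- expr[keep, ]

# PCA on the filtered genes, then the percentage of variance per component
# (the quantity shown in a scree plot).
pca <- prcomp(filtered, scale. = TRUE)
pct_var <- round(pca$sdev^2 / sum(pca$sdev^2) * 100, 2)
head(pct_var)
```

The filtered matrix can then be passed to kmeans or to the PCA plots in place of the full 7120-gene matrix.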