On 3/13/06 21:08, "mark salsburg" <mark.salsburg at gmail.com> wrote:
> I am having trouble getting the function heatmap() to work on the
> following gene expression data:
>
>> dim(SAMPLES_log)
> [1] 12626 20
>
>
>           sample1  sample2  ...  sample20
> gen1
> gen2
> gen3
> ...
> gen12626
>
> I have converted SAMPLES_log to a numeric matrix using:
>
> SAMPLES_log <- as.matrix(SAMPLES_log)
>
> when I use the following command:
>
> heatmap(SAMPLES_log)
>
> Error: cannot allocate vector of size 622668 Kb
> In addition: Warning messages:
> 1: Reached total allocation of 1022Mb: see help(memory.size)
> 2: Reached total allocation of 1022Mb: see help(memory.size)
Mark,
In order to do a heatmap on 12000 genes, a triangular distance matrix of
about 12000 x 12000 / 2 entries needs to be calculated. This is large and
will often result in the out-of-memory error that you see. I don't often
find that clustering that many genes is meaningful in any major way,
particularly since you will be including a large number of genes that do
not vary across the samples. If you really need to do this, I would
suggest that you use an external program like Cluster/TreeView, as they
may be somewhat less memory-hungry than R (but I haven't tested that
directly).
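To see why the allocation fails, here is the arithmetic as a small R sketch. It assumes heatmap() clusters the rows with its defaults (dist() followed by hclust()), and that dist() stores one 8-byte double per pair of rows, n(n - 1)/2 values in total. With the 12626 genes reported by dim(SAMPLES_log), the result lines up with the 622668 Kb in the error message.

```r
# Rows (genes) reported by dim(SAMPLES_log) in the original post
n_genes <- 12626

# dist() keeps only the lower triangle: one value per pair of genes
n_pairs <- n_genes * (n_genes - 1) / 2

# Each distance is stored as an 8-byte double
bytes <- n_pairs * 8
kb <- bytes / 1024
mb <- kb / 1024

cat(sprintf("%.0f distances, %.1f Kb, %.1f MB\n", n_pairs, kb, mb))
```

That single vector (roughly 608 MB) is already more than half of the 1022 Mb limit shown in the warnings, before hclust() and heatmap() allocate any further working memory.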
> Is there some library in BioConductor that will allow me to output a
> heatmap? I want to compare the expression of the first 10 samples with
> the last 10 samples.
If you want to do an unsupervised clustering of samples, just use hclust.
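Clustering only the samples is cheap because dist() then works on 20 rows rather than 12626. A minimal sketch, using a randomly generated stand-in for SAMPLES_log (hypothetical data of the same shape) and average linkage as an arbitrary choice:

```r
# Hypothetical stand-in for SAMPLES_log: 12626 genes (rows) x 20 samples (columns)
set.seed(1)
SAMPLES_log <- matrix(rnorm(12626 * 20, mean = 8), nrow = 12626,
                      dimnames = list(paste0("gen", 1:12626),
                                      paste0("sample", 1:20)))

# dist() measures distances between rows, so transpose to make samples the rows
d_samples <- dist(t(SAMPLES_log))   # 20 * 19 / 2 = 190 distances

# Hierarchical clustering of the 20 samples
hc_samples <- hclust(d_samples, method = "average")
plot(hc_samples, main = "Samples clustered on all genes")
```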
If you want to do an unsupervised clustering of samples AND genes, I
would suggest reducing the number of genes using a filter for genes that
show variability (say, by keeping the top 500 genes when sorted by
coefficient of variation). In other words, there is no need to include a
gene in a heatmap if it is the same for all samples.
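The filtering step could look like the following sketch. It follows the coefficient-of-variation suggestion literally (standard deviation divided by mean, per gene) and keeps 500 genes; both choices, and the random stand-in data, are assumptions for illustration. The coefficient of variation only makes sense when the row means are positive, which holds for this stand-in but should be checked on real log-scale data.

```r
# Hypothetical stand-in for SAMPLES_log: 12626 genes x 20 samples
set.seed(2)
SAMPLES_log <- matrix(rnorm(12626 * 20, mean = 8), nrow = 12626,
                      dimnames = list(paste0("gen", 1:12626),
                                      paste0("sample", 1:20)))

# Per-gene coefficient of variation: sd / mean across the 20 samples
gene_sd   <- apply(SAMPLES_log, 1, sd)
gene_mean <- rowMeans(SAMPLES_log)
gene_cv   <- gene_sd / gene_mean

# Keep the 500 most variable genes
top500 <- order(gene_cv, decreasing = TRUE)[1:500]
SAMPLES_top <- SAMPLES_log[top500, ]

# The row distance matrix now holds 500 * 499 / 2 = 124750 values
heatmap(SAMPLES_top)
```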
Ultimately, though, if you want to compare gene expression in two groups
of samples, you are asking a question that is best answered using a
supervised method, like a t-test. There are numerous ways to do a t-test
between two groups, including the limma, siggenes, and multtest packages.
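As a base-R illustration of the supervised approach (not the limma, siggenes or multtest methods themselves), the sketch below runs a Welch t-test per gene comparing samples 1-10 with samples 11-20, then adjusts for multiple testing with Benjamini-Hochberg. The stand-in data are random and hypothetical (1000 genes to keep it quick), with a shift added to the first 50 genes purely so there is something to find. limma's lmFit()/eBayes()/topTable() workflow, which uses moderated t-statistics, is a common alternative.

```r
set.seed(3)
n_genes <- 1000
SAMPLES_log <- matrix(rnorm(n_genes * 20, mean = 8), nrow = n_genes,
                      dimnames = list(paste0("gen", 1:n_genes),
                                      paste0("sample", 1:20)))
# Hypothetical effect: the first 50 genes are higher in the last 10 samples
SAMPLES_log[1:50, 11:20] <- SAMPLES_log[1:50, 11:20] + 2

# First 10 samples versus last 10 samples, as in the original question
group <- factor(rep(c("first10", "last10"), each = 10))

# Welch two-sample t-test for every gene (row)
p_values <- apply(SAMPLES_log, 1, function(x)
  t.test(x[group == "first10"], x[group == "last10"])$p.value)

# Benjamini-Hochberg adjustment across all genes
p_adj <- p.adjust(p_values, method = "BH")
head(sort(p_adj), 10)
```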
Hope that helps.
Sean
Hi,
For large data sets, hcluster (package amap) requires half as much
memory as hclust.
For even larger data sets, you can use the xcluster program from Gavin
Sherlock:
http://genetics.stanford.edu/~sherlock/cluster.html
Package ctc has all the tools needed to exchange data with this [free]
software.
And for visualization, I recommend TreeView or Freeview:
http://magix.fri.uni-lj.si/freeview
But very large trees should be explored carefully, as the branches at
each node can be swapped with one another without changing the tree,
like this:

    --- A      ==      --- A
     +- B               +- C
     +- C               +- B
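This point can be checked in R with a toy example (five hypothetical, randomly generated genes). rev() on a dendrogram reverses the order of its branches, which changes the left-to-right leaf order, but the cophenetic distances (the height at which each pair of leaves is joined) stay the same, so both drawings describe the same tree. Two genes drawn side by side are therefore not necessarily close in the tree.

```r
# Toy data: 5 hypothetical genes (rows) x 4 samples, random values
set.seed(4)
x <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

dd <- as.dendrogram(hclust(dist(x)))
dd_swapped <- rev(dd)   # same tree with the branch order reversed

# Left-to-right leaf order differs between the two drawings...
order.dendrogram(dd)
order.dendrogram(dd_swapped)

# ...but the height at which every pair of genes is joined is identical
labs <- rownames(x)
m1 <- as.matrix(cophenetic(dd))[labs, labs]
m2 <- as.matrix(cophenetic(dd_swapped))[labs, labs]
all.equal(m1, m2)
```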
Regards,
Antoine Lucas.
On Mon, 13 Mar 2006 22:22:53 -0500, Sean Davis <sdavis2 at mail.nih.gov>
wrote:
> [...]
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
--
Antoine Lucas
Centre de génétique Moléculaire, CNRS
91198 Gif sur Yvette Cedex
Tel: (33)1 69 82 38 89
Fax: (33)1 69 82 38 77
On Monday 13 March 2006 21:08, mark salsburg wrote:
> [...]
>
> I have tried running that command in a Linux environment, also with no
> success
>
> thank you,
>
Mark, along with the good stuff Sean Davis mentioned, maybe you could
think about upgrading your computer hardware in the near future. You can
get hardware that supports 64-bit memory addressing and put in 4 GB of
RAM, all for about $3k. That's relatively little compared to what it
costs to run 20 chips. FWIW, I've compared a 32-bit system against a
64-bit system (both with 4 GB of RAM), and can heartily recommend just
going straight for a 64-bit system (hardware _and_ operating system);
just fewer headaches.

The number of chips you run will probably only increase during the next
several years and, as you've discovered, lack of system resources can
make you lose quite a lot of valuable time.
Best of luck,
jon butchar