New Year greetings to all.
I have a problem which I am not sure how best to solve, and hope to
seek advice from the list.
I have 200 oligonucleotide arrays, each measuring about 13,000
transcripts, from approximately 6 different cancer subtypes. Essentially,
I am hoping first to identify "gene modules" of expression that
correspond to a specific cancer subtype or group of subtypes (e.g.
present only in cancers A and B, but not in C, D, E or F). Subsequently,
I wish to label these modules with Gene Ontology terms (e.g. a "T-cell
response" module).
I tried a non-R program (GeneXpress), which has been used for work
published in Nature Genetics, but I ran into quite a few freezes and
glitches with the online cancer data posted alongside the program (I'm
not sure whether it's a Windows issue on my side).
I was thinking of first filtering the transcripts by variability and
minimum expression, then hierarchically clustering the remaining genes.
I would keep only gene clusters above a minimum size (e.g. 20
transcripts), sift through them for "modules" by identifying subclusters
that differentiate between various combinations of cancers A, B, C, D, E
and F at a minimum significance level, and finally use the goCluster
package to identify the relevant annotations for each cluster.
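The filter-then-cluster steps described above can be sketched as follows. This is written in Python rather than R, purely for illustration. The function names, the thresholds and the toy data are all hypothetical. The distance is 1 minus the Pearson correlation, and clusters are joined by average linkage. Both are common choices for expression data, but they are assumptions here, not part of the proposal. In R, the stats functions hclust() and cutree() would do the clustering step efficiently. The naive loop below is only practical for a handful of genes.

```python
import math
import statistics


def filter_genes(expr, min_mean, min_sd):
    """Keep genes whose mean and standard deviation across samples both
    reach the (hypothetical) thresholds."""
    return {g: v for g, v in expr.items()
            if statistics.mean(v) >= min_mean and statistics.stdev(v) >= min_sd}


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage_clusters(expr, cut_height):
    """Naive agglomerative clustering with average linkage on 1 - Pearson.
    Merging stops once the closest pair of clusters is farther apart than
    cut_height, which for average linkage gives the same clusters as
    cutting the dendrogram at that height."""
    genes = list(expr)
    dist = {}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            dist[(a, b)] = dist[(b, a)] = 1 - pearson(expr[a], expr[b])

    def linkage(c1, c2):
        # average of all pairwise gene distances between the two clusters
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > cut_height:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data (made up): gene -> expression across 4 samples
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [2.0, 4.0, 6.0, 8.1],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [4.2, 2.9, 2.1, 0.9],
    "g6": [5.0, 5.0, 5.0, 5.0],  # no variation: filtered out
    "g7": [0.1, 0.2, 0.1, 0.2],  # low expression: filtered out
}
kept = filter_genes(expr, min_mean=0.5, min_sd=0.1)
clusters = average_linkage_clusters(kept, cut_height=0.5)
modules = [c for c in clusters if len(c) >= 3]  # minimum size; 20 in the proposal
print(modules)
```

Here the rising genes g1 to g3 form the one cluster large enough to count as a module. The falling pair g4 and g5 is too small.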
Any advice would be greatly appreciated. Thank you!
Regards,
Min-Han Tan
Van Andel Institute, MI
Why not look for differentially expressed genes between groups using
Limma or some other package? Then, characterize the sets of
differentially expressed genes by their Gene Ontology annotations, using
a package like GOstats and GOHyperG? This sounds like a more
"traditional" analysis than what you are proposing. Is there a reason
not to look for statistically differentially expressed genes?
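GO enrichment tests of the kind GOstats offers are typically based on the hypergeometric distribution. The sketch below is an illustration in Python, not the GOstats code, and the function name is hypothetical. It asks one question: given N genes in total, K of them annotated to a GO term, and a list of n genes, how surprising is it to see k or more annotated genes in the list? The same test applies to a differential-expression list or to a cluster.

```python
from math import comb


def go_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the universe (e.g. all genes on the array)
    K: genes in the universe annotated to the GO term
    n: genes in the selected list (a cluster or a differential-expression list)
    k: selected genes annotated to the term
    """
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


# Hypothetical numbers: a 100-gene list drawn from a 13,000-gene universe,
# with 10 of the 100 genes in a GO term that annotates 200 genes overall.
print(go_enrichment_p(k=10, n=100, K=200, N=13000))
```

With thousands of GO terms tested per list, these p-values would still need a multiple-testing correction.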
Sean
----- Original Message -----
From: "Min-Han Tan" <minhan.science@gmail.com>
To: <bioconductor@stat.math.ethz.ch>
Sent: Saturday, January 01, 2005 6:16 PM
Subject: [BioC] Clustering and gene modules
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
Thanks for the reply.
Using a package to find differentially expressed genes is possible, but
it makes for a large number of preliminary gene lists (41 with 6
subtypes). Dividing the lists into up- and down-regulated genes doubles
the number of lists to go through. But I certainly agree that this is
likely to generate useful results.
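The count of 41 is not broken down in the message. One way to arrive at it, assumed here rather than stated, is to compare every group of one, two or three subtypes against the remaining subtypes. The Python below checks that arithmetic. It also shows that a three-vs-three comparison appears twice under this scheme, so the number of distinct two-way splits is smaller.

```python
from itertools import combinations

subtypes = ["A", "B", "C", "D", "E", "F"]

# Every group of one, two or three subtypes, each compared with the rest
groups = [g for r in (1, 2, 3) for g in combinations(subtypes, r)]
print(len(groups))  # 6 + 15 + 20 = 41

# As unordered two-way splits, "ABC vs DEF" and "DEF vs ABC" coincide
splits = {frozenset([frozenset(g), frozenset(subtypes) - frozenset(g)]) for g in groups}
print(len(splits))  # 31, i.e. 2**5 - 1

print(2 * len(groups))  # 82 lists once each is divided into up and down
```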
However, I wonder (non-statistician here) whether it is possible that
clusters of genes, insignificant individually, may actually be
significant as a group in differentiating subtypes. If so, a strategy
based on differential expression may leave out clusters of genes that
could actually be useful.
For my data, I was hoping that ontological labelling of gene clusters
(only clusters significantly differentiating subtypes) would be a viable
strategy, since there seems to be some work in this direction (goCluster
and GeneXpress). I had the standard heatmap in mind: it depicts gene
clusters (the usual clumps of red and blue/green); some of these clumps
(or "modules") are clearly shared across subtypes, and some are unique
to particular subtypes. Some clusters, of course, are not
subtype-specific (e.g. genes predicting gender). The working assumption
(certainly imperfect) is that these clumps of genes mean something
because they are co-expressed.
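One simple way to treat a co-expressed cluster as a unit is to collapse it into a single "module score" per sample. The score can then be compared across subtypes like any single gene. The Python below is an illustrative sketch: averaging z-scores is one common choice among several, and the function names and toy data are hypothetical. Averaging also bears on the question above. If measurement noise is roughly independent between member genes, the average is less noisy than any one gene, so a group can separate subtypes when its members individually fall short of significance.

```python
import statistics


def zscore(values):
    """Standardize one gene across samples (assumes it is not constant)."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


def module_score(expr, module_genes):
    """Per-sample module score: the mean of the member genes' z-scores."""
    z = [zscore(expr[g]) for g in module_genes]
    return [statistics.mean(sample) for sample in zip(*z)]


def mean_by_subtype(scores, labels):
    """Average module score within each subtype."""
    groups = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)
    return {label: statistics.mean(v) for label, v in groups.items()}


# Toy data (made up): two co-expressed genes over six samples, three subtypes
labels = ["A", "A", "B", "B", "C", "C"]
expr = {"m1": [5.0, 6.0, 1.0, 2.0, 1.0, 2.0],
        "m2": [7.0, 8.0, 3.0, 3.0, 2.0, 4.0]}
scores = module_score(expr, ["m1", "m2"])
print(mean_by_subtype(scores, labels))  # the module is highest in subtype A
```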
Thank you! I appreciate your advice on this.
Min-han
On Sat, 1 Jan 2005 19:14:42 -0500, Sean Davis <sdavis2@mail.nih.gov> wrote:
> Why not look for differentially expressed genes between groups using
> Limma or some other package? Then, characterize the sets of
> differentially expressed genes by their Gene Ontology annotations, using
> a package like GOstats and GOHyperG? This sounds like a more
> "traditional" analysis than what you are proposing. Is there a reason
> not to look for statistically differentially expressed genes?
>
> Sean
----- Original Message -----
From: "Min-Han Tan" <minhan.science@gmail.com>
To: <bioconductor@stat.math.ethz.ch>; <sdavis2@mail.nih.gov>
Sent: Saturday, January 01, 2005 8:03 PM
Subject: Re: [BioC] Clustering and gene modules
> Thanks for the reply.
>
> Using a package to find differentially expressed genes is possible, but
> it makes for a large number of preliminary gene lists (41 with 6
> subtypes). Dividing the lists into up- and down-regulated genes doubles
> the number of lists to go through. But I certainly agree that this is
> likely to generate useful results.
You might want to think about using F-statistics (i.e. a one-way ANOVA).
This lets you look for differential expression in a general sense across
all groups at once. You can then use something like Limma's decideTests
to determine in which tumor types each gene is differentially expressed.
If you pick the genes with the highest F-statistics, you will likely end
up with your envisioned patchwork of genes upregulated in one or several
groups. You can then look at each of those clusters.
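Here is a minimal sketch of the one-way ANOVA idea: compute one F-statistic per gene across all subtypes at once, then rank genes by it. It uses the ordinary F-statistic, written in Python for illustration, with hypothetical names and toy data. Limma's moderated F-statistic additionally shrinks gene-wise variances (empirical Bayes), so its values and rankings will differ. This sketch does not reproduce decideTests.

```python
import statistics


def one_way_f(values, labels):
    """Ordinary one-way ANOVA F-statistic for one gene."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    grand = statistics.mean(values)
    k, n = len(groups), len(values)
    # variation of group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups.values())
    # variation of samples around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups.values() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data (made up): one gene shifted in subtype A, one unchanged
labels = ["A", "A", "A", "B", "B", "B"]
expr = {"flat": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
        "up_in_A": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]}
ranked = sorted(expr, key=lambda g: one_way_f(expr[g], labels), reverse=True)
print(ranked)  # ['up_in_A', 'flat']
```

With six subtypes, the same function applies unchanged. The top-ranked genes would then be the candidates to cluster or to split by subtype.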
> However, I wonder (non-statistician here) whether it is possible that
> clusters of genes, insignificant individually, may actually be
> significant as a group in differentiating subtypes. If so, a strategy
> based on differential expression may leave out clusters of genes that
> could actually be useful.
There is no question that this can be the case. Solutions that deal
naturally with this issue are the many different methods for
classification. Some classification techniques will allow you to
determine the "weight" of each gene's contribution to the
classification. Classification tries to find the gene or group of genes
that best distinguishes the classes from each other. Note that this is
NOT the same set of genes that you find when looking for differential
expression (although there will often be a good deal of overlap).
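To make the idea of per-gene "weights" in a classifier concrete, here is a plain nearest-centroid classifier, chosen as one simple example among many. It is loosely modelled on, but simpler than, the nearest shrunken centroid method in the pamr package, and it applies no shrinkage. Each gene's weight is the largest standardized distance between a class mean and the overall mean. New samples are assigned by a distance summed over all genes, so many modest genes can jointly drive a classification. The code is illustrative Python, and the function names and toy data are hypothetical.

```python
import math
import statistics


def train_centroids(expr, labels):
    """For each gene: class means, pooled within-class SD, and a 'weight'
    = largest |class mean - overall mean| / pooled SD."""
    classes = sorted(set(labels))
    n, k = len(labels), len(classes)
    model = {}
    for gene, vals in expr.items():
        by_class = {c: [v for v, lab in zip(vals, labels) if lab == c] for c in classes}
        means = {c: statistics.mean(by_class[c]) for c in classes}
        ss = sum((v - means[c]) ** 2 for c in classes for v in by_class[c])
        sd = math.sqrt(ss / (n - k)) or 1.0  # fall back to 1.0 if no within-class spread
        overall = statistics.mean(vals)
        weight = max(abs(means[c] - overall) / sd for c in classes)
        model[gene] = (means, sd, weight)
    return model


def predict(model, sample):
    """Assign a new sample (gene -> value) to the nearest class centroid,
    using squared distances standardized by each gene's pooled SD."""
    classes = next(iter(model.values()))[0]

    def distance(c):
        return sum(((sample[g] - means[c]) / sd) ** 2 for g, (means, sd, _) in model.items())

    return min(classes, key=distance)


# Toy data (made up): g1 separates A from B, g2 does not
labels = ["A", "A", "B", "B"]
expr = {"g1": [5.0, 5.2, 1.0, 1.2],
        "g2": [3.0, 3.5, 3.2, 3.3]}
model = train_centroids(expr, labels)
print({g: round(w, 2) for g, (_, _, w) in model.items()})
print(predict(model, {"g1": 4.8, "g2": 3.1}))  # 'A'
```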
> For my data, I was hoping that ontological labelling of gene clusters
> (only clusters significantly differentiating subtypes) would be a viable
> strategy, since there seems to be some work in this direction (goCluster
> and GeneXpress). I had the standard heatmap in mind: it depicts gene
> clusters (the usual clumps of red and blue/green); some of these clumps
> (or "modules") are clearly shared across subtypes, and some are unique
> to particular subtypes. Some clusters, of course, are not
> subtype-specific (e.g. genes predicting gender). The working assumption
> (certainly imperfect) is that these clumps of genes mean something
> because they are co-expressed.
I haven't used goCluster, so I'm not sure where it fits in. You are
probably quite right; my notes above are meant to point out two
"standard" techniques for finding genes that characterize sample
classes. Let's see what other input you get.
Sean