Clustering and gene modules


Min-Han Tan ▴ 40

@min-han-tan-1063

Last seen 10.6 years ago

New Year greetings to all.

I have a problem which I am not sure how best to solve, and hope to seek advice from the list.

I have 200 oligonucleotide arrays of about 13000 transcripts, belonging to approximately 6 different cancer subtypes. Essentially, I am hoping to first identify "gene modules" of gene expression corresponding to a specific cancer subtype, or to groups of subtypes (e.g. present only in A and B cancer, but not in C, D, E or F). Subsequently, I wish to label these modules by gene ontology (e.g. a "T-cell response" module).

I tried a non-R program (GeneXpress) which has been used to publish work in Nature Genetics, but I ran into quite a few freezes and glitches with the online cancer data posted alongside the program (not sure if it's a Windows issue on my side).

I was thinking of first filtering the transcripts by variation and minimum expression, then performing hierarchical clustering on the final gene set and choosing gene clusters by a minimum cluster size (e.g. 20 transcripts). I would then sift through these clusters to find "modules" by identifying subclusters that differentiate between various combinations of cancers A, B, C, D, E and F at a minimum significance level, and finally use the package goCluster to identify the relevant annotations for each of these clusters.

Any advice would be greatly appreciated. Thank you!

Regards,
Min-Han Tan
Van Andel Institute, MI
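As a rough illustration of the first step proposed above (filtering transcripts by variation and minimum expression), here is a minimal Python sketch. The thresholds `min_mean` and `min_var`, the probe names and the toy values are all hypothetical, and the function is a stand-in for illustration, not part of any Bioconductor package.

```python
from statistics import mean, variance


def filter_transcripts(expr, min_mean=7.0, min_var=0.5):
    """Keep transcripts whose mean expression and sample variance across
    arrays both reach the (hypothetical) thresholds."""
    kept = {}
    for transcript, values in expr.items():
        if mean(values) >= min_mean and variance(values) >= min_var:
            kept[transcript] = values
    return kept


# Toy log2-scale data: three made-up transcripts measured on six arrays.
expr = {
    "probe_a": [9.0, 9.1, 12.0, 12.2, 9.0, 8.9],  # expressed and variable
    "probe_b": [8.0, 8.1, 8.0, 7.9, 8.0, 8.1],    # expressed but flat
    "probe_c": [2.0, 5.0, 2.0, 5.0, 2.0, 5.0],    # variable but low
}

filtered = filter_transcripts(expr)
print(sorted(filtered))
```

Only `probe_a` passes both filters here. On real data the cut-offs would need to be chosen from the distribution of the arrays rather than fixed in advance.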

Genetics Clustering Cancer goCluster • 1.6k views

Updated 20.3 years ago by Sean Davis • written 20.3 years ago by Min-Han Tan


Sean Davis 21k

@sean-davis-490

Last seen 9 weeks ago

United States

Why not look for differentially expressed genes between groups using limma or some other package? You could then characterize the sets of differentially expressed genes by gene ontology, using a package like GOstats and its GOHyperG function. This sounds like a more "traditional" analysis than what you are proposing. Is there a reason not to look for statistically differentially expressed genes?

Sean

----- Original Message -----
From: "Min-Han Tan" <minhan.science@gmail.com>
To: <bioconductor@stat.math.ethz.ch>
Sent: Saturday, January 01, 2005 6:16 PM
Subject: [BioC] Clustering and gene modules
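To make the gene ontology step in this reply more concrete: this kind of over-representation analysis typically comes down to a hypergeometric test on how many genes in a list carry a given GO term. The Python sketch below shows that calculation only. The function name and all the counts are hypothetical, and it is not the GOstats implementation.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value: the chance of seeing at least
    `overlap` annotated genes when `selected` genes are drawn without
    replacement from `universe` genes, `annotated` of which carry the term."""
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 13000 transcripts on the array, 200 annotated to a
# term, 100 differentially expressed genes, 10 of which carry the term.
p = hypergeom_enrichment_p(universe=13000, annotated=200, selected=100, overlap=10)
print(p)
```

With an expected overlap of about 1.5 genes by chance, 10 hits gives a very small p-value. In practice one such test is run per GO term, so some correction for multiple testing is needed.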

Written 20.3 years ago by Sean Davis


Thanks for the reply.

Using a package to generate differentially expressed genes is possible, but it makes for a large number of preliminary gene lists (41, with 6 subtypes). Dividing the lists into up- and down-expressed genes doubles the number of lists to go through. But I certainly agree that this is likely to generate useful results.

However, I wonder (non-statistician here) whether it is possible that clusters of genes, insignificant individually, may actually be significant in differentiating subtypes as a group. If so, a strategy based on differential expression may leave out clusters of genes that may actually be useful.

For my data, I was hoping that ontological labelling of gene clusters (only clusters significantly differentiating subtypes) would be a viable strategy, since there seems to be some work in this direction (goCluster and GeneXpress). I conceptualized the standard heatmap: heatmaps depict gene clusters (the usual clumps of red and blue/green); some of these clumps (or "modules") are clearly shared across subtypes and some are unique to particular subtypes. Some clusters of course are not subtype-specific (e.g. genes predicting gender). The working strategy (which is certainly imperfect) is that these clumps of genes mean something since they are co-expressed.

Thank you! I appreciate your advice on this.

Min-han

On Sat, 1 Jan 2005 19:14:42 -0500, Sean Davis <sdavis2@mail.nih.gov> wrote:
> Why not look for differentially expressed genes between groups using Limma
> or some other package? Then, characterize the sets of differentially
> expressed genes using gene ontology using a package like GOstats and
> GOHyperG? This sounds like a more "traditional" analysis than what you are
> proposing. Is there a reason not to look for statistically differentially
> expressed genes?
>
> Sean
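The figure of 41 preliminary gene lists is not derived in the message above. One reading that reproduces it, offered here only as an assumption, is to compare every group of one, two or three subtypes against the remaining subtypes. The short Python sketch below counts those groupings.

```python
from itertools import combinations

subtypes = "ABCDEF"  # the six subtype labels used in the original post

# Every group of 1, 2 or 3 subtypes, each to be compared against the rest.
groupings = [
    frozenset(group)
    for size in (1, 2, 3)
    for group in combinations(subtypes, size)
]
print(len(groupings))  # 6 + 15 + 20

# A 3-versus-3 comparison appears twice above (ABC vs DEF and DEF vs ABC),
# so count the distinct two-way splits as well.
everything = frozenset(subtypes)
unique_splits = {frozenset({g, everything - g}) for g in groupings}
print(len(unique_splits))
```

This prints 41 and then 31. Under this reading, ten of the 41 lists are mirror images of another list, so 31 distinct comparisons would cover the same ground.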

Written 20.3 years ago by Min-Han Tan


Sean Davis 21k

@sean-davis-490

Last seen 9 weeks ago

United States

----- Original Message -----
From: "Min-Han Tan" <minhan.science@gmail.com>
To: <bioconductor@stat.math.ethz.ch>; <sdavis2@mail.nih.gov>
Sent: Saturday, January 01, 2005 8:03 PM
Subject: Re: [BioC] Clustering and gene modules

> Thanks for the reply.
>
> Using a package to generate differentially expressed genes is
> possible, but it makes for a large number of preliminary gene lists
> (with 6 subtypes, 41). Dividing the lists into up and downexpressed
> doubles the number of lists to go through. But I certainly agree that
> this is likely to generate useful results.

You might want to think about using F-statistics (or one-way ANOVA). This allows one to look for differential expression in a general sense across groups. You can then use something like limma's decideTests to determine which genes belong to which tumors. If you pick genes based on the highest F-statistics, you will likely end up with your envisioned patchwork of genes upregulated in one or several groups. You can then look at each of those clusters.

> However, I wonder if is it possible (non-statistician here) that
> clusters of genes, insignificant by themselves individually, may
> actually be significant in differentiating subtypes as a group? If so,
> a strategy based on differential expression may leave out clusters of
> genes that may actually be useful.

There is no question that this can be the case. The many different classification methods deal naturally with this issue. Some classification techniques will allow you to determine the "weight" of the genes that contribute to the classification. Classification tries to determine a gene or group of genes that best distinguishes classes from each other. Note that this is NOT the same set of genes that you find when looking for differential expression (although there will often be a good deal of overlap).

> For my data, I was hoping ontological labelling of gene clustering
> (only clusters significantly differentiating subtypes) would be a
> viable strategy, since there seem to be some work in this direction
> (goCluster and GeneXpress). I conceptualized the standard heatmap:
> heatmaps depict gene clusters (with the usual clumps of red and
> blue/green), some of these clumps, (or "modules") are clearly shared
> across subtypes and some are unique to particular subtypes. Some
> clusters of course are non-subtype specific (e.g. genes predicting
> gender). The working strategy (which is certainly imperfect) is that
> these clumps of genes mean something since they are co-expressed.

I haven't used goCluster, so I'm not sure where it fits above. You are probably quite right, and my notes above are meant to point out two "standard" techniques for determining genes that characterize sample classes. Let's see what other input you get....

Sean
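The F-statistic suggestion above can be sketched for a single gene as follows. This is the plain one-way ANOVA F-statistic in Python, not limma's moderated version, and the gene names, subtype groups and values are made up for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """Plain one-way ANOVA F-statistic for one gene. `groups` is a list of
    lists holding that gene's expression values, one list per subtype."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: two hypothetical genes measured in three subtypes (3 arrays each).
genes = {
    "gene_x": [[5.0, 5.2, 4.8], [5.1, 4.9, 5.0], [8.0, 8.2, 7.8]],  # up in one subtype
    "gene_y": [[5.0, 6.0, 7.0], [5.5, 6.5, 6.0], [6.0, 5.0, 7.0]],  # no group effect
}

f_stats = {name: one_way_f(groups) for name, groups in genes.items()}
ranked = sorted(f_stats, key=f_stats.get, reverse=True)
print(ranked, f_stats)
```

Ranking genes by this statistic puts `gene_x` first. A large F only says that some subtype differs, so a follow-up step (the role decideTests plays in the reply) is still needed to say which subtypes drive the difference.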

Written 20.3 years ago by Sean Davis
