Overlapping genes in subsets of lists

Heike Pospisil

Hello there,

I have 100 lists of differentially expressed genes, and I am trying to find genes that are overrepresented across these 100 lists (I call such a group a 'cluster of genes'). What's worse, I expect not just one cluster of genes but three, four, or five of them. That is why a simple intersect() will not help. I wish I had a function that selects all genes which appear in 100% of 33 lists (cluster 1), all genes which appear in 100% of another 22 lists (cluster 2), and all genes which appear in 100% of the remaining 45 lists (cluster 3). (I hope my explanation is clear.)

Does anybody know a package or a strategy for defining such clusters?

Thanks and best,
Heike

--
Dr. Heike Pospisil | pospisil at zbh.uni-hamburg.de
University of Hamburg | Center for Bioinformatics
Bundesstrasse 43 | 20146 Hamburg, Germany
phone: +49-40-42838-7303 | fax: +49-40-42838-7312
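To make the problem concrete, here is a minimal R sketch on made-up toy data: six hypothetical lists in two groups, with invented gene IDs. It shows why intersecting all lists at once returns nothing, while intersecting within each group recovers that group's shared genes. Note that it assumes group membership is already known, which is exactly the part the question asks how to find.

```r
# Hypothetical toy data: six gene lists in two groups (A and B) that share
# different "core" genes. All gene IDs are made up for illustration.
gene_lists <- list(
  A1 = c("g1", "g2", "g3", "g7"),
  A2 = c("g1", "g2", "g3", "g8"),
  A3 = c("g1", "g2", "g3"),
  B1 = c("g4", "g5", "g6", "g7"),
  B2 = c("g4", "g5", "g6"),
  B3 = c("g4", "g5", "g6", "g8")
)

# Intersecting all lists at once: no gene is common to every list.
core_all <- Reduce(intersect, gene_lists)
print(core_all)   # character(0)

# Intersecting within each group (membership assumed known here)
# recovers the genes shared by that group.
core_A <- Reduce(intersect, gene_lists[c("A1", "A2", "A3")])
core_B <- Reduce(intersect, gene_lists[c("B1", "B2", "B3")])
print(core_A)     # "g1" "g2" "g3"
print(core_B)     # "g4" "g5" "g6"
```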

Martin Morgan

"Sean Davis" <sdavis2 at mail.nih.gov> writes:

> On Wed, Oct 8, 2008 at 8:34 AM, Heike Pospisil
> <pospisil at zbh.uni-hamburg.de> wrote:
>> Does anybody know a package or a strategy how to define such clusters?
>
> Just a thought, but you could make a matrix with "gene lists" as the
> columns (ie., gene list 1 in column 1, gene list 2 in column 2, etc.)
> and rows with the union of all genes. Put a "1" in each cell for a
> gene that is present in a gene list and "0" elsewhere. Once you have
> this matrix, you can use normal clustering methods to look for
> patterns. For example, you could produce a heatmap of these data and
> look for blocks.

One way of doing this might be...

    library(GSEABase)
    data(sample.ExpressionSet)
    obj = sample.ExpressionSet
    gs1 = GeneSet(obj[200:230,], setName="set1")
    gs2 = GeneSet(obj[210:240,], setName="set2")
    gs3 = GeneSet(obj[220:250,], setName="set3")
    gsc = GeneSetCollection(gs1, gs2, gs3)
    inc = incidence(gsc)
    colnames(inc[,colSums(inc)==3])
     [1] "31459_i_at" "31460_f_at" "31461_at"   "31462_f_at" "31463_s_at"
     [6] "31464_at"   "31465_g_at" "31466_at"   "31467_at"   "31468_f_at"
    [11] "31469_s_at"

(If the gene sets are in a list 'lst', e.g., because they were created in an lapply(), then

    gsc = do.call("GeneSetCollection", lst)

saves some typing / coordination.)

Martin

> Sean
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
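If GSEABase is not at hand, the same "present in every set" test can be sketched in base R. This is an illustrative equivalent, not GSEABase's own implementation: the made-up IDs "id200" to "id250" stand in for the probe IDs, overlapping the way obj[200:230,], obj[210:240,] and obj[220:250,] do. The matrix has sets as rows and genes as columns, so colSums() gives a per-gene count as in the incidence() example.

```r
# Made-up IDs standing in for probe IDs; the index ranges overlap like
# obj[200:230,], obj[210:240,] and obj[220:250,] in the GSEABase example.
gene_sets <- list(
  set1 = paste0("id", 200:230),
  set2 = paste0("id", 210:240),
  set3 = paste0("id", 220:250)
)

# Sets-by-genes 0/1 incidence matrix: rows are sets, columns are genes.
all_genes <- sort(unique(unlist(gene_sets)))
inc <- t(vapply(gene_sets, function(s) as.integer(all_genes %in% s),
                integer(length(all_genes))))
colnames(inc) <- all_genes

# A gene whose column sum equals the number of sets occurs in every set.
in_all <- colnames(inc)[colSums(inc) == length(gene_sets)]
print(in_all)     # "id220" ... "id230" (11 IDs)
```

As in the GSEABase version, comparing colSums(inc) against a smaller number (or a fraction of the sets) relaxes "in every set" to "in most sets".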

Sean Davis

On Wed, Oct 8, 2008 at 8:34 AM, Heike Pospisil <pospisil at zbh.uni-hamburg.de> wrote:
> Does anybody know a package or a strategy how to define such clusters?

Just a thought, but you could make a matrix with "gene lists" as the columns (i.e., gene list 1 in column 1, gene list 2 in column 2, etc.) and the union of all genes as the rows. Put a "1" in each cell for a gene that is present in a gene list and a "0" elsewhere. Once you have this matrix, you can use standard clustering methods to look for patterns. For example, you could produce a heatmap of these data and look for blocks.

Sean
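A minimal base-R sketch of this suggestion, using made-up toy lists (all IDs hypothetical). Rows are the union of all genes and columns are the lists, as described. Using the binary distance inside heatmap() is an assumption of this sketch: for 0/1 presence data it compares only the genes present in at least one of the two vectors, which tends to suit such data better than the default Euclidean distance.

```r
# Hypothetical toy gene lists; in practice these would be the 100 lists of
# differentially expressed genes. All IDs are made up.
gene_lists <- list(
  A1 = c("g1", "g2", "g3", "g7"),
  A2 = c("g1", "g2", "g3", "g8"),
  A3 = c("g1", "g2", "g3"),
  B1 = c("g4", "g5", "g6", "g7"),
  B2 = c("g4", "g5", "g6"),
  B3 = c("g4", "g5", "g6", "g8")
)

# Rows: union of all genes; columns: gene lists; 1 = present, 0 = absent.
all_genes <- sort(unique(unlist(gene_lists)))
m <- vapply(gene_lists, function(s) as.integer(all_genes %in% s),
            integer(length(all_genes)))
rownames(m) <- all_genes

# Cluster rows and columns with the binary distance (an assumption of this
# sketch); scale = "none" keeps the raw 0/1 values. Blocks of 1s in the
# reordered heatmap suggest groups of lists sharing groups of genes.
heatmap(m, distfun = function(x) dist(x, method = "binary"),
        scale = "none", col = c("white", "black"))
```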

Thomas Hampton

I would use the table() function in R, which will tell you how many times gene X appears. If you have 100 lists, the maximum frequency is 100, as long as each gene occurs at most once in any given list (e.g., call unique() on each list first). Then you can sort by frequency to see which genes come up most often.

Another approach I have used is to hierarchically cluster the lists, which will tell you which gene lists have the most genes in common.

Hope this helps,

Tom

On Oct 8, 2008, at 8:34 AM, Heike Pospisil wrote:
> Does anybody know a package or a strategy how to define such clusters?
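Both suggestions can be sketched in base R on made-up toy lists (hypothetical IDs). table() counts genes after unique() is applied to each list, as advised above, and the lists themselves are then clustered with hclust(). The binary distance and average linkage are assumptions of this sketch; the reply does not specify a distance or linkage.

```r
# Hypothetical toy gene lists; all IDs are made up.
gene_lists <- list(
  A1 = c("g1", "g2", "g3", "g7"),
  A2 = c("g1", "g2", "g3", "g8"),
  A3 = c("g1", "g2", "g3"),
  B1 = c("g4", "g5", "g6", "g7"),
  B2 = c("g4", "g5", "g6"),
  B3 = c("g4", "g5", "g6", "g8")
)

# How many lists contain each gene; unique() stops a gene from being
# counted twice within the same list.
freq <- table(unlist(lapply(gene_lists, unique)))
freq <- sort(freq, decreasing = TRUE)
print(freq)

# Lists-by-genes 0/1 matrix, then hierarchical clustering of the lists.
# Lists sharing many genes get a small binary distance and merge early.
all_genes <- sort(names(freq))
inc <- t(vapply(gene_lists, function(s) as.integer(all_genes %in% s),
                integer(length(all_genes))))
colnames(inc) <- all_genes
hc <- hclust(dist(inc, method = "binary"), method = "average")
plot(hc, main = "Gene lists clustered by shared genes")
```

The frequency table alone cannot tell which lists a frequent gene comes from; the dendrogram adds that information by grouping lists with similar contents.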

Heike Pospisil

Hi Thomas,

I guess the latter approach will work. I am going to test it. Thanks to you (and to Sean, who suggested a similar approach),

Heike

On Wednesday 08 October 2008 15:23, Thomas Hampton wrote:
> Another approach I have used is to hierarchically cluster the lists,
> which will tell you which gene lists have the most genes in common.
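Combining the suggestions for the original goal, here is a hedged base-R sketch: cluster the lists, cut the tree into k groups with cutree(), and report, for each group, the genes present in at least a chosen fraction of that group's lists (1 means 100%, as in the question). The helper cluster_core_genes(), its min_frac argument, the binary distance and the average linkage are all hypothetical choices for illustration. k must be chosen by the user, for example after inspecting the dendrogram (the question expects three to five clusters).

```r
# Hypothetical helper (not from any package): cluster the lists, cut the tree
# into k groups, and return for each group the genes found in at least
# `min_frac` of that group's lists (min_frac = 1 means "in 100% of them").
cluster_core_genes <- function(gene_lists, k, min_frac = 1) {
  all_genes <- sort(unique(unlist(gene_lists)))
  # Lists-by-genes 0/1 incidence matrix.
  inc <- t(vapply(gene_lists, function(s) as.integer(all_genes %in% s),
                  integer(length(all_genes))))
  colnames(inc) <- all_genes
  # Average-linkage clustering on binary distances between lists.
  hc <- hclust(dist(inc, method = "binary"), method = "average")
  groups <- cutree(hc, k = k)
  # For each group, keep genes whose fraction of member lists is high enough.
  lapply(split(names(groups), groups), function(members) {
    frac <- colMeans(inc[members, , drop = FALSE])
    names(frac)[frac >= min_frac]
  })
}

# Usage on made-up toy lists forming two groups (all IDs hypothetical).
gene_lists <- list(
  A1 = c("g1", "g2", "g3", "g7"),
  A2 = c("g1", "g2", "g3", "g8"),
  A3 = c("g1", "g2", "g3"),
  B1 = c("g4", "g5", "g6", "g7"),
  B2 = c("g4", "g5", "g6"),
  B3 = c("g4", "g5", "g6", "g8")
)
cores <- cluster_core_genes(gene_lists, k = 2)
print(cores)
```

With real data, requiring a gene in every list of its cluster (min_frac = 1) may be too strict; a lower threshold such as 0.9 is one option to explore.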
