Overlapping genes in subsets of lists
@heike-pospisil-1097
Hello there,

I have 100 lists of differentially expressed genes, and I am trying to find genes that are overrepresented across these 100 lists (I call such a group a 'cluster of genes'). What's worse, I expect not just one cluster of genes but three, four, or five of them. That is why a simple intersect() will not help. I wish I had a function that could select all genes which appear in 100% of 33 lists (cluster 1), all genes which appear in 100% of another 22 lists (cluster 2), and all genes which appear in 100% of the remaining 45 lists (cluster 3). (I hope my explanation is clear.)

Does anybody know a package or a strategy for defining such clusters?

Thanks and best,
Heike
--
Dr. Heike Pospisil | pospisil at zbh.uni-hamburg.de
University of Hamburg | Center for Bioinformatics
Bundesstrasse 43 | 20146 Hamburg, Germany
phone:+49-40-42838-7303 | fax: +49-40-42838-7312
@martin-morgan-1513
"Sean Davis" <sdavis2 at mail.nih.gov> writes:

> On Wed, Oct 8, 2008 at 8:34 AM, Heike Pospisil <pospisil at zbh.uni-hamburg.de> wrote:
>> Does anybody know a package or a strategy how to define such clusters?
>
> Just a thought, but you could make a matrix with "gene lists" as the
> columns (i.e., gene list 1 in column 1, gene list 2 in column 2, etc.)
> and rows with the union of all genes. Put a "1" in each cell for a
> gene that is present in a gene list and "0" elsewhere. Once you have
> this matrix, you can use normal clustering methods to look for
> patterns. For example, you could produce a heatmap of these data and
> look for blocks.

One way of doing this might be:

```r
library(GSEABase)   # also attaches Biobase, which provides sample.ExpressionSet
data(sample.ExpressionSet)
obj <- sample.ExpressionSet

# Three overlapping gene sets built from consecutive probe ranges
gs1 <- GeneSet(obj[200:230, ], setName = "set1")
gs2 <- GeneSet(obj[210:240, ], setName = "set2")
gs3 <- GeneSet(obj[220:250, ], setName = "set3")
gsc <- GeneSetCollection(gs1, gs2, gs3)

# 0/1 incidence matrix: one row per gene set, one column per gene
inc <- incidence(gsc)

# Genes present in every set (column sum equals the number of sets);
# indexing colnames() directly keeps the names even if only one gene matches
colnames(inc)[colSums(inc) == nrow(inc)]
##  [1] "31459_i_at" "31460_f_at" "31461_at"   "31462_f_at" "31463_s_at"
##  [6] "31464_at"   "31465_g_at" "31466_at"   "31467_at"   "31468_f_at"
## [11] "31469_s_at"
```

(If the gene sets are in a list 'lst', e.g., because they were created in an lapply(), then

```r
gsc <- do.call("GeneSetCollection", lst)
```

saves some typing / coordination.)

Martin

> Sean
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
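Martin's colSums() test extends to Heike's case of a subset of lists: a gene belongs to the core of a subset when its column sum over just those rows equals the number of rows in the subset. Below is a base-R sketch with a small, hypothetical hand-built matrix in the same orientation as incidence() (sets in rows, genes in columns); the helper name core_genes() and the toy data are made up for illustration.

```r
# Hypothetical toy incidence matrix: rows = gene sets, columns = genes
# (1 = gene is in the set), same orientation as incidence(gsc)
inc <- rbind(
  set1 = c(g1 = 1, g2 = 1, g3 = 0, g4 = 1),
  set2 = c(g1 = 1, g2 = 1, g3 = 1, g4 = 0),
  set3 = c(g1 = 0, g2 = 1, g3 = 1, g4 = 1)
)

# Hypothetical helper: genes present in every set of a chosen subset of rows.
# drop = FALSE keeps a one-row subset as a matrix so colSums() still works.
core_genes <- function(inc, sets) {
  sub <- inc[sets, , drop = FALSE]
  colnames(inc)[colSums(sub) == nrow(sub)]
}

core_genes(inc, c("set1", "set2"))  # "g1" "g2"
core_genes(inc, rownames(inc))      # "g2" (the only gene in all three sets)
```

With 100 real lists, `sets` would be the rows assigned to one cluster (for example the 33 lists of Heike's cluster 1); finding that assignment is the clustering problem the other replies address.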
@sean-davis-490
On Wed, Oct 8, 2008 at 8:34 AM, Heike Pospisil <pospisil at zbh.uni-hamburg.de> wrote:
> Does anybody know a package or a strategy how to define such clusters?

Just a thought, but you could make a matrix with "gene lists" as the columns (i.e., gene list 1 in column 1, gene list 2 in column 2, etc.) and the union of all genes as the rows. Put a "1" in each cell for a gene that is present in a gene list and a "0" elsewhere. Once you have this matrix, you can use standard clustering methods to look for patterns. For example, you could produce a heatmap of these data and look for blocks.

Sean
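Sean's 0/1 matrix can be built in base R with `%in%`. The sketch below uses hypothetical toy lists `l1` to `l4`, and it swaps in `hclust()` on a "binary" (Jaccard-type) distance between columns as one possible "normal clustering method"; the choice of `k = 2` in `cutree()` is an assumption you would have to tune (Heike expects three to five clusters).

```r
# Hypothetical toy input: each element is one list of gene IDs
lists <- list(
  l1 = c("A", "B", "C"),
  l2 = c("A", "B", "C", "D"),
  l3 = c("E", "F", "G"),
  l4 = c("E", "F", "G", "A")
)

# Rows: union of all genes; columns: gene lists; 1 = gene is on that list
genes <- sort(unique(unlist(lists)))
m <- vapply(lists, function(x) as.integer(genes %in% x), integer(length(genes)))
rownames(m) <- genes

# Cluster the lists (columns) by shared genes; "binary" distance ignores
# genes that are absent from both lists being compared
hc <- hclust(dist(t(m), method = "binary"))
groups <- cutree(hc, k = 2)
groups
# l1 l2 l3 l4
#  1  1  2  2

# Visual check for the blocks Sean mentions (opens a graphics device):
# heatmap(m, scale = "none")
```

The binary distance is used here because two lists should not look similar merely because they both lack the same genes, which dominates a 0/1 matrix built from the union of 100 lists.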
@thomas-hampton-2820
I would use the table() function in R, which will tell you how many times gene X appears. If you have 100 lists, the maximum frequency is 100, as long as each gene appears at most once on any given list. Then you can sort by frequency to see which genes come up most often.

Another approach I have used is to hierarchically cluster the lists, which will tell you which gene lists have the most genes in common.

Hope this helps,
Tom

On Oct 8, 2008, at 8:34 AM, Heike Pospisil wrote:
> Does anybody know a package or a strategy how to define such clusters?
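Tom's table() idea takes only a few lines of base R. This sketch uses hypothetical toy lists; unique() is applied within each list first, which is the "each gene at most once per list" step, so no count can exceed the number of lists.

```r
# Hypothetical toy input; "A" appears twice in l1
lists <- list(
  l1 = c("A", "B", "C", "A"),
  l2 = c("A", "B", "D"),
  l3 = c("A", "E")
)

# For each gene, count the number of lists it appears on
freq <- table(unlist(lapply(lists, unique)))

# Most frequent genes first
sort(freq, decreasing = TRUE)

# Genes present on every list (frequency equals the number of lists)
names(freq)[freq == length(lists)]
# [1] "A"
```

Note that frequencies alone do not reveal which lists a gene co-occurs on, so they rank genes globally rather than separating Heike's 33/22/45 groups; that is what the clustering suggestions are for.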
Hi Thomas,

I guess the latter approach will work; I am going to test it. Thanks to you (and to Sean, who suggested a similar approach),
Heike

On Wednesday 08 October 2008 15:23, Thomas Hampton wrote:
> I would use the table function in R, which will tell you how many
> times gene X appears. If you have 100 lists, the maximum frequency
> is 100, as long as you make each gene unique on any given list.
>
> Then you can sort by frequency to see which genes come up most often.
>
> Another approach I have used is to hierarchically cluster the lists,
> which will tell you which gene lists have the most genes in common.