GO term enrichment

Assa Yeroslaviz

Germany

Hello everybody,

I have a table with the results of some microarray experiments that looks like this (my "genelist.txt"):

    Probe Id      NAME      FC_set1    FC_set2    FC_set3    FC_set4
    A_51_P100021  Hivep3    1.048368  -1.085207  -1.013457   1.032816
    A_51_P100034  Mif4gd   -1.049719  -1.077773  -1.084012  -1.004941
    A_51_P100052  Slitrk2   1.339832   1.063053  -1.157675  -1.003128
    A_51_P100063  Lnx1      1.073604   1.010892  -1.058375   1.063377
    A_51_P100084  Unknown   1.084544  -1.258876  -1.092571  -1.058791
    ...

The probe IDs are from the Agilent expression arrays. I extracted the gene names using biomaRt, and now I would like to find out whether some gene sets are overrepresented among the differentially regulated genes.

First, I would like to see whether any GO terms are overrepresented in these gene lists, for each of the columns (gene sets). Second, I would like to search these lists for accumulations of other gene sets among the differentially regulated genes (for example kinases and transcription factors, but also localization, protein domains, etc.).

I would like your help in creating the gene sets, either from GO terms or from the other parameters. I know I can extract the data for each and every gene from biomaRt, for example:

    library(biomaRt)
    # Mouse gene dataset from Ensembl
    mart <- useMart("ensembl")
    ensembl <- useDataset("mmusculus_gene_ensembl", mart = mart)
    # Full fold-change table, plus the probe IDs of gene set 1 (first column)
    test <- read.delim("genelist.txt")
    geneset1 <- read.delim("geneset1_all_signal.txt")
    genes <- as.character(geneset1[, 1])
    # One row per probe / GO biological process combination
    geneNames <- getBM(attributes = c("go_biological_process_id", "name_1006",
                                      "agilent_wholegenome", "external_gene_id",
                                      "ensembl_gene_id", "entrezgene"),
                       filters = "agilent_wholegenome",
                       values = genes, mart = ensembl)

    > geneNames
      go_biological_process_id name_1006
    1               GO:0007409 axonogenesis
    2               GO:0006511 ubiquitin-dependent protein catabolic process
    3               GO:0051260 protein homooligomerization
    4               GO:0042787 protein ubiquitination during ubiquitin-dependent protein catabolic process
    5               GO:0006417 regulation of translation
    6               GO:0016070 RNA metabolic process
    7               GO:0016070 RNA metabolic process
      agilent_wholegenome external_gene_id    ensembl_gene_id entrezgene
    1        A_51_P100052          Slitrk2 ENSMUSG00000036790     245450
    2        A_51_P100063             Lnx1 ENSMUSG00000029228      16924
    3        A_51_P100063             Lnx1 ENSMUSG00000029228      16924
    4        A_51_P100063             Lnx1 ENSMUSG00000029228      16924
    5        A_51_P100034           Mif4gd ENSMUSG00000020743      69674
    6        A_51_P100034           Mif4gd ENSMUSG00000020743      69674
    7        A_51_P100034           Mif4gd ENSMUSG00000020743         NA

Now I would like to create the gene sets according to these GO categories. I would like to get something like this:

    GO:0007409 A_51_P100052 ... the rest of the genes from this category in the list, on one line
    GO:0016070 A_51_P100034 ...
    GO:0006417 A_51_P100034 ...

Thanks for the help,
Assa
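Once getBM() has returned its table, the GO-to-probe grouping asked for above can be done in base R. A minimal sketch, assuming the table has the go_biological_process_id and agilent_wholegenome columns shown in the question; the small data frame below just re-creates the seven printed rows so the example runs on its own:

```r
# Minimal sketch: turn a getBM()-style table into one gene set per GO term.
# 'geneNames' re-creates the seven rows printed above (only the two columns
# needed); in a real session it would be the object returned by getBM().
geneNames <- data.frame(
  go_biological_process_id = c("GO:0007409", "GO:0006511", "GO:0051260",
                               "GO:0042787", "GO:0006417", "GO:0016070",
                               "GO:0016070"),
  agilent_wholegenome = c("A_51_P100052", "A_51_P100063", "A_51_P100063",
                          "A_51_P100063", "A_51_P100034", "A_51_P100034",
                          "A_51_P100034"),
  stringsAsFactors = FALSE
)

# Keep only rows that actually carry a GO ID (missing values may come back
# as NA or as an empty string).
go <- geneNames$go_biological_process_id
annotated <- geneNames[!is.na(go) & go != "", ]

# split() groups the probe IDs by GO ID; unique() drops the repeated rows
# getBM() returns when one probe maps to several Entrez IDs.
goSets <- lapply(split(annotated$agilent_wholegenome,
                       annotated$go_biological_process_id),
                 unique)

# One line per GO term: the GO ID followed by its probe IDs, tab-separated.
goLines <- vapply(names(goSets),
                  function(id) paste(c(id, goSets[[id]]), collapse = "\t"),
                  character(1), USE.NAMES = FALSE)

# Print to the console; writeLines(goLines, "some_file.txt") would save it.
writeLines(goLines)
```

The same split() call could be reused for the other kinds of gene sets mentioned (domains, localization, and so on), assuming the Ensembl dataset offers a matching attribute; listAttributes(ensembl) shows which attributes are available.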

Microarray GO probe biomaRt

Michael Imbeault

Hello,

Have you tried simply using DAVID as an alternative to BioC? http://david.abcc.ncifcrf.gov/summary.jsp

It can do overrepresentation analysis for GO and much more, so you should check it out. Just paste your gene list, choose Agilent gene ID as the identifier type, and you're pretty much set. Of course, it's a web interface, so it doesn't help if you want this as part of a BioC pipeline.

Cheers,
Michael

(In reply to Assa Yeroslaviz, 01/07/2010 5:58 AM)
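DAVID takes a list of gene IDs, so each FC column needs its own list of regulated probes before it can be pasted in. A minimal R sketch, assuming the fold changes are signed (the table has no values between -1 and 1, so -1.26 is read as 1.26-fold down) and using a hypothetical 1.2-fold cutoff that does not come from the thread:

```r
# Minimal sketch: one list of "differentially regulated" probes per FC column,
# e.g. to paste into DAVID or to use as the selected genes in a GO test.
# The five rows are the ones shown in the question; in practice 'fc' would come
# from read.delim("genelist.txt"), with probe IDs in the first column.
fc <- data.frame(
  ProbeId = c("A_51_P100021", "A_51_P100034", "A_51_P100052",
              "A_51_P100063", "A_51_P100084"),
  NAME    = c("Hivep3", "Mif4gd", "Slitrk2", "Lnx1", "Unknown"),
  FC_set1 = c( 1.048368, -1.049719,  1.339832,  1.073604,  1.084544),
  FC_set2 = c(-1.085207, -1.077773,  1.063053,  1.010892, -1.258876),
  FC_set3 = c(-1.013457, -1.084012, -1.157675, -1.058375, -1.092571),
  FC_set4 = c( 1.032816, -1.004941, -1.003128,  1.063377, -1.058791),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff. Assumes signed fold changes (-1.26 = 1.26-fold down),
# so abs() catches both directions.
cutoff <- 1.2

fcCols <- grep("^FC_", names(fc), value = TRUE)
deLists <- lapply(setNames(fcCols, fcCols), function(col) {
  # which() silently skips NA fold changes
  as.character(fc[[1]][which(abs(fc[[col]]) >= cutoff)])
})

# Print each list; writeLines(deLists[[col]], paste0(col, ".txt")) would save it.
for (col in fcCols) {
  cat("##", col, "\n")
  writeLines(deLists[[col]])
}
```

With only the five example rows, FC_set1 and FC_set2 each yield one probe and the other two columns yield none; on the full table the lists would of course be longer.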

Marc Carlson

United States

Hi Assa,

It really sounds like you should look at the vignette titled "Hypergeometric Tests Using GOstats" in the GOstats package:
http://www.bioconductor.org/packages/release/bioc/html/GOstats.html

Marc

(In reply to Assa Yeroslaviz on the Bioconductor mailing list, 07/01/2010 02:58 AM)
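The GOstats vignette builds a GOHyperGParams object and passes it to hyperGTest(), which handles the GO annotation bookkeeping. Underneath, each GO term gets a one-sided hypergeometric test of whether the selected genes fall into that term more often than chance would predict. A base-R sketch of just that calculation, with made-up counts, is below; it is not the GOstats implementation and ignores its conditional test on the GO graph:

```r
# Minimal sketch of the one-sided hypergeometric test used for GO
# over-representation. All counts are hypothetical, for illustration only.
#   N: genes in the universe (e.g. all annotated genes on the array)
#   K: universe genes annotated to the GO term
#   n: selected (differentially regulated) genes
#   k: selected genes annotated to the GO term
hyperOverP <- function(k, K, n, N) {
  # P(X >= k) where X ~ Hypergeometric(K "term" genes, N - K others, n drawn)
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

terms <- data.frame(
  go = c("GO:0007409", "GO:0016070", "GO:0006417"),
  K  = c(150, 2000, 300),  # hypothetical term sizes in the universe
  k  = c(12, 40, 3),       # hypothetical overlaps with the selected genes
  stringsAsFactors = FALSE
)
N <- 20000  # hypothetical universe size
n <- 500    # hypothetical number of selected genes

terms$expected <- n * terms$K / N
terms$pvalue   <- hyperOverP(terms$k, terms$K, n, N)
# Many terms are tested at once, so adjust for multiple testing (Benjamini-Hochberg)
terms$padj     <- p.adjust(terms$pvalue, method = "BH")
print(terms[order(terms$pvalue), ])
```

The choice of universe matters: using the genes measured on the array rather than the whole genome changes N and K, and with them every p-value.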
