Help using ENSMUSG ids in GOstats
John Reid (@john-reid-2792)
Hi,

I have some sets of genes I wish to test for enrichment against a background set of genes. These genes are identified by Ensembl identifiers. I have found it quite straightforward to use the topGO package to do this.

I would also like to use the GOstats package and test for enrichment against KEGG pathways. It looks as if I need to create an annotation package to do this. Is this straightforward? I didn't find out how to do it from the documentation. My guess is that someone has been in this situation before. I should say the genes come from a computational analysis and not from a microarray.

Any help appreciated,
John.
Tags: Microarray, Pathways, GOstats
rgentleman (@rgentleman-7725), United States
John Reid wrote:
> Hi,
>
> I have some sets of genes I wish to test for enrichment against a
> background set of genes. These genes are identified by Ensembl
> identifiers. I have found it quite straightforward to use the topGO
> package to do this.
>
> I would also like to use the GOstats package and test for enrichment
> against KEGG pathways. It looks as if I need to create an annotation
> package to do this. Is this straightforward? I didn't find out how to do
> it from the documentation. My guess is that someone has been in this
> situation before. I should say the genes come from a computational
> analysis and not from a microarray.
>
> Any help appreciated,
> John.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

Hi John,

You don't give us much to work from.

I am guessing you have not looked at either AnnotationDbi or AnnBuilder as packages that you can use to build an annotation package.

I am also guessing you have not searched the email list archives for any of the several previous discussions (that is a good place to start).

You don't tell us what organism, but I suspect that you could simply use one of the organism based annotation packages (without needing to build anything). There are several previous threads on how to do that.

Best wishes
Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
Robert Gentleman wrote:
> I am guessing you have not looked at either AnnotationDbi or AnnBuilder
> as packages that you can use to build an annotation package.

I did look at AnnBuilder but the basic vignette suggested that I should look in the advanced vignette for what I would like to do. I didn't find the advanced vignette at that time. I shall have a look at AnnotationDbi.

> I am also guessing you have not searched the email list archives for
> any of the several previous discussions (that is a good place to start).

I did search the email list archives. Nothing came up. Can you suggest a good search term?

> You don't tell us what organism, but I suspect that you could simply
> use one of the organism based annotation packages (without needing to
> build anything). There are several previous threads on how to do that.

ENSMUSG genes are mouse genes. Is there an annotation package that will work for me? How could I have found it without asking here? Which threads? From a wealth of packages on the annotation packages page it is very hard to determine which might be relevant.

Thanks,
John.
John Reid wrote:
> Robert Gentleman wrote:
>> I am guessing you have not looked at either AnnotationDbi or
>> AnnBuilder as packages that you can use to build an annotation package.
>
> I did look at AnnBuilder but the basic vignette suggested that I should
> look in the advanced vignette for what I would like to do. I didn't find
> the advanced vignette at that time. I shall have a look at AnnotationDbi.

How did you not find it? It should have been included with the package, so one hopes it was not very hard to find. If you did look in specific locations you thought would contain the vignette, we would like to know where, so that we can either correct the impression that something should be there or, if it is missing from some obvious place, put it there.

>> I am also guessing you have not searched the email list archives
>> for any of the several previous discussions (that is a good place to
>> start).
> I did search the email list archives. Nothing came up. Can you suggest a
> good search term?

GOstats seems like a good starting place. Again, you seem not to want to say what you did search on, so I have no idea why nothing came up. The question has been asked quite a few times.

>> You don't tell us what organism, but I suspect that you could simply
>> use one of the organism based annotation packages (without needing to
>> build anything). There are several previous threads on how to do that.
> ENSMUSG genes are mouse genes. Is there an annotation package that will
> work for me? How could I have found it without asking here? Which
> threads? From a wealth of packages on the annotation packages page it is
> very hard to determine which might be relevant.

Given that you have mouse genes, then I think you might be able to rule out most of the annotation packages. The BioC views let you select an organism, which greatly reduces the set you would need to look at. I get to this place with about 3 clicks from the top of the BioC page.

http://www.bioconductor.org/packages/release/Mus_musculus.html

And then since you don't have an array it seems unlikely that any of the array specific packages would be what you want. I hope with a few minutes work you would have ended up at org.Mm.eg.db, which you may be able to adapt to your needs. You may need some other tool (such as biomaRt) to map from whatever identifiers you are using to those in the annotation package (or they might be there already, again you haven't given us much of anything to work with).

best wishes
Robert
Robert Gentleman wrote:
>>> I am also guessing you have not searched the email list archives
>>> for any of the several previous discussions (that is a good place to
>>> start).
>> I did search the email list archives. Nothing came up. Can you
>> suggest a good search term?
>
> GOstats seems like a good starting place. Again, you seem not to
> want to say what you did search on, so I have no idea why nothing came
> up. The question has been asked quite a few times.

I did search on GOstats; that certainly didn't help me find an annotation package. All the GOstats documentation says is that I need an annotation package. It does not help the user determine how to find the correct one. I'm not saying it should, just that this information is not easy to find anywhere else either.

> Given that you have mouse genes, then I think you might be able to
> rule out most of the annotation packages. The BioC views let you
> select an organism, which greatly reduces the set you would need to
> look at.
> I get to this place with about 3 clicks from the top of the BioC page.
>
> http://www.bioconductor.org/packages/release/Mus_musculus.html
>
> And then since you don't have an array it seems unlikely that any of
> the array specific packages would be what you want. I hope with a few
> minutes work you would have ended up at org.Mm.eg.db, which you may be
> able to adapt to your needs. You may need some other tool (such as
> biomaRt) to map from whatever identifiers you are using to those in
> the annotation package (or they might be there already, again you
> haven't given us much of anything to work with).

I don't understand why you keep saying I haven't given you much to work with. The question surely is: are ENSMUSG identifiers mapped in an annotation package so that I can use them in GOstats? This seemed clear to me in the first list post. Perhaps I have misunderstood some of the issues, but at the moment I don't see what. Maybe you could enlighten me?

I did end up at org.Mm.eg.db myself in a few clicks, but it certainly doesn't use Ensembl identifiers; its description clearly states that it is keyed by Entrez Gene IDs. So, as you say, I have extra work to do to map the identifiers.

Thanks for the help,
John.
Hi John,

Perhaps this will help a bit.

> library(org.Mm.eg.db)
Loading required package: AnnotationDbi
Loading required package: Biobase
Loading required package: tools

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

Loading required package: DBI
Loading required package: RSQLite
> ls(2)
 [1] "org.Mm.eg"           "org.Mm.eg_dbconn"    "org.Mm.eg_dbfile"
 [4] "org.Mm.eg_dbInfo"    "org.Mm.eg_dbschema"  "org.Mm.egACCNUM"
 [7] "org.Mm.egACCNUM2EG"  "org.Mm.egALIAS2EG"   "org.Mm.egCHR"
[10] "org.Mm.egCHRLENGTHS" "org.Mm.egCHRLOC"     "org.Mm.egENSEMBL"
[13] "org.Mm.egENSEMBL2EG" "org.Mm.egENZYME"     "org.Mm.egENZYME2EG"
[16] "org.Mm.egGENENAME"   "org.Mm.egGO"         "org.Mm.egGO2ALLEGS"
[19] "org.Mm.egGO2EG"      "org.Mm.egMAP"        "org.Mm.egMAP2EG"
[22] "org.Mm.egMAPCOUNTS"  "org.Mm.egMGI"        "org.Mm.egMGI2EG"
[25] "org.Mm.egORGANISM"   "org.Mm.egPATH"       "org.Mm.egPATH2EG"
[28] "org.Mm.egPFAM"       "org.Mm.egPMID"       "org.Mm.egPMID2EG"
[31] "org.Mm.egPROSITE"    "org.Mm.egREFSEQ"     "org.Mm.egREFSEQ2EG"
[34] "org.Mm.egSYMBOL"     "org.Mm.egSYMBOL2EG"  "org.Mm.egUNIGENE"
[37] "org.Mm.egUNIGENE2EG"
> ?org.Mm.egENSEMBL

You will probably also need to make use of the revmap() function. If we assume here that you have a character vector of Ensembl IDs called ENSMUSG:

gns <- mget(ENSMUSG, revmap(org.Mm.egENSEMBL))

will give you a list of Entrez Gene IDs. For GOstats you need to come up with a character vector of unique Entrez Gene IDs, so you may need to check for multiple Entrez Gene IDs for a particular Ensembl ID (there is no guarantee of a one-to-one mapping), and then get rid of duplicates (e.g., simply wrapping the above in unlist() is not likely what you want to do). The same holds true for the universe, which is the set of genes that could have been selected from your chip. Once you have those things, the procedure is quite straightforward.
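Jim's point about one-to-many mappings and duplicates can be made concrete. The following is a minimal base-R sketch, not part of AnnotationDbi or GOstats: `collapse_entrez()` is a hypothetical helper, and the input is a made-up list shaped like what `mget()` returns (a named list of character vectors, one per Ensembl ID, with `NA` where no Entrez Gene ID was found). As far as we know, `mget()` on an AnnotationDbi map stops on unmapped keys unless you pass `ifnotfound = NA`, so the real call would likely be `mget(ENSMUSG, revmap(org.Mm.egENSEMBL), ifnotfound = NA)`. The sketch assumes R 3.2 or later (for `lengths()`).

```r
# Hypothetical helper (not part of AnnotationDbi or GOstats): collapse the
# named list returned by mget(..., revmap(org.Mm.egENSEMBL), ifnotfound = NA)
# into a character vector of unique Entrez Gene IDs.
collapse_entrez <- function(mapped, drop_ambiguous = FALSE) {
  # Remove NA entries (Ensembl IDs with no Entrez Gene ID)
  mapped <- lapply(mapped, function(ids) ids[!is.na(ids)])
  mapped <- mapped[lengths(mapped) > 0]
  if (drop_ambiguous) {
    # Keep only Ensembl IDs that map to exactly one Entrez Gene ID
    mapped <- mapped[lengths(mapped) == 1]
  }
  # Flatten, then drop duplicates: several Ensembl IDs can share an Entrez ID
  as.character(unique(unlist(mapped, use.names = FALSE)))
}

# Made-up mget()-style result, for illustration only
mapped <- list(
  ENSMUSG_A = "1001",
  ENSMUSG_B = NA,
  ENSMUSG_C = c("1002", "1003"),
  ENSMUSG_D = "1001"
)
collapse_entrez(mapped)                         # keeps all mapped IDs
collapse_entrez(mapped, drop_ambiguous = TRUE)  # drops one-to-many Ensembl IDs
```

Whether to keep or drop Ensembl IDs that map to several Entrez Gene IDs is a judgement call; whichever you choose, building the universe with the same helper keeps the selected genes and the universe consistent.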
An example with fake data. First just get some random IDs:

> gns <- unique(toTable(org.Mm.egENSEMBL)[1:100,1])
> univ <- unique(toTable(org.Mm.egENSEMBL)[1:1000,1])

Now do the analysis:

> param <- new("GOHyperGParams", geneIds = gns, universeGeneIds = univ,
+              ontology = "BP", annotation = "org.Mm.eg.db")
> hyp <- hyperGTest(param)
> head(summary(hyp))
               GOBPID       Pvalue  OddsRatio
GO:0007229 GO:0007229 9.168712e-11 107.987805
GO:0010033 GO:0010033 1.255989e-06  25.192157
GO:0042391 GO:0042391 6.797840e-06   9.590361
GO:0007166 GO:0007166 1.404809e-05   2.941145
GO:0007190 GO:0007190 5.915149e-05  45.738636
GO:0031279 GO:0031279 5.915149e-05  45.738636
             ExpCount Count Size
GO:0007229  1.2413793    11   12
GO:0010033  1.1379310     8   11
GO:0042391  2.0689655    10   20
GO:0007166 15.9310345    32  154
GO:0007190  0.6206897     5    6
GO:0031279  0.6206897     5    6
                                                       Term
GO:0007229              integrin-mediated signaling pathway
GO:0010033                    response to organic substance
GO:0042391                 regulation of membrane potential
GO:0007166 cell surface receptor linked signal transduction
GO:0007190         activation of adenylate cyclase activity
GO:0031279                   regulation of cyclase activity

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
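For readers wondering what `hyperGTest()` computes for each term, here is a minimal base-R sketch of a one-sided hypergeometric test for over-representation of a single category. It assumes the default unconditional test (`conditional = FALSE`, `testDirection = "over"` in `GOHyperGParams`, as we understand the defaults); the function name `hyper_over()` and the numbers in the usage lines are hypothetical. In Jim's output, `ExpCount / Size` is about 0.103 in every row, which is consistent with `ExpCount = Size * selected / universe`.

```r
# Minimal sketch (not GOstats code): one-sided hypergeometric test for
# over-representation of a single category, no conditioning on the GO graph.
#   count:         selected genes annotated to the category ("Count")
#   size:          universe genes annotated to the category ("Size")
#   selected:      number of selected genes
#   universe_size: number of genes in the universe
hyper_over <- function(count, size, selected, universe_size) {
  # Expected count if the selected genes were drawn at random
  exp_count <- size * selected / universe_size
  # P(X >= count), X ~ Hypergeometric: `size` category genes among
  # `universe_size` genes, `selected` genes drawn without replacement
  p <- phyper(count - 1, size, universe_size - size, selected,
              lower.tail = FALSE)
  c(ExpCount = exp_count, Pvalue = p)
}

# Hypothetical numbers, for illustration only
res <- hyper_over(count = 3, size = 5, selected = 10, universe_size = 100)
print(res)

# The same p-value from Fisher's exact test on the 2x2 table
# (rows: in category / not; columns: selected / not selected)
tab <- matrix(c(3, 7, 2, 88), nrow = 2)
fisher.test(tab, alternative = "greater")$p.value
```

The Fisher's exact test line is there only as a cross-check: with `alternative = "greater"`, its p-value on this 2x2 table equals the hypergeometric upper tail.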