GO annotation


KJ Lim ▴ 420

@kj-lim-5288

Last seen 4.2 years ago

Finland

Dear Bioconductor community,

I ran a differential expression analysis on my RNA-Seq data with the edgeR package and now have a list of differentially expressed genes. I would like to find the GO terms for these genes.

I have been reading and searching for the right package, but several of the packages I found are built around model species. Could the community suggest a GO annotation package that I can use for a non-model species (plant RNA-Seq data)?

Thank you very much and have a nice weekend.

Best regards,
KJ Lim

Annotation GO • 1.3k views

ADD COMMENT • link updated 12.2 years ago by Marc Carlson ★ 7.2k • written 12.2 years ago by KJ Lim ▴ 420


Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 8.3 years ago

United States

Hi Lim,

First of all, it depends on what you have for gene identifiers. Most people will have Entrez Gene IDs, so for now I will assume you have those.

    ## Let's further assume you are working with human data and take the first
    ## two Entrez Gene IDs so that we can make a (hopefully meaningful) example
    ids <- c("1", "2")
    ## now load the org package for human
    library(org.Hs.eg.db)
    ## then call select() to extract your GO IDs like this
    ## (the argument is 'columns' in current AnnotationDbi; older releases called it 'cols')
    select(org.Hs.eg.db, keys = ids, columns = "GO", keytype = "ENTREZID")

One thing to notice is that if you have some other kind of identifier, your keytype argument will have to be set to a different value. And hopefully the kind of ID you are using is present in the package you are searching. See the manual page for select for more information.

    ?select

From your question, I also recognize that you may not be able to do this, because it sounds like you are working with a more unusual organism rather than something commonplace like human. Don't give up just yet, because we may be able to help you there too. The manual page for the function makeOrgPackageFromNCBI explains how you can try to generate an org package from just the taxonomy ID (which you can look up on NCBI's website). If the data is available at NCBI, you should be able to generate a package that matches your organism of choice.

    ?makeOrgPackageFromNCBI

Does that answer your question?

Marc
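Marc's point about keytype can be sketched roughly as follows. This is only an illustration, not part of the original reply: `go_for_keys` is a hypothetical helper name, the argument is written as `columns` (the name used by current AnnotationDbi releases), and the usage part assumes org.Hs.eg.db is installed. The symbols A1BG and A2M correspond to Entrez Gene IDs 1 and 2.

```r
## Sketch only: go_for_keys() is a hypothetical helper, not a Bioconductor function.
## It assumes the AnnotationDbi package is installed.
go_for_keys <- function(orgdb, keys, keytype = "ENTREZID") {
  ## Fail early with a readable message if this ID type is not in the package.
  if (!keytype %in% AnnotationDbi::keytypes(orgdb)) {
    stop("keytype '", keytype, "' is not available; see keytypes() for valid values")
  }
  AnnotationDbi::select(orgdb, keys = keys, columns = "GO", keytype = keytype)
}

## Usage, run only if the human org package is installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  hs <- org.Hs.eg.db::org.Hs.eg.db
  ## List the identifier types this package can be queried with.
  print(AnnotationDbi::keytypes(hs))
  ## Same kind of lookup, starting from gene symbols instead of Entrez Gene IDs.
  print(head(go_for_keys(hs, c("A1BG", "A2M"), keytype = "SYMBOL")))
}
```

Checking `keytypes()` first is a design choice for readability of the error; calling `select()` directly with an unsupported keytype would also fail, just with a less specific message.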

ADD COMMENT • link 12.2 years ago Marc Carlson ★ 7.2k


Hi Marc,

Could you suggest a go-to literature reference on annotating genomic data with Bioconductor packages, such as a book or something similar?

Thanks,
Sathish

ADD REPLY • link 12.2 years ago Srinivasan, Sathish K ▴ 10


Dear Marc,

Thanks for your prompt reply. Your information is helpful.

You are right about the organism I'm working on. It is neither human nor a model species; it is a gymnosperm plant species.

I will have a look at the ?makeOrgPackageFromNCBI manual page and hopefully have some luck with it. It is a pity that many nice GO packages are not yet ready for non-model species; please correct me if I am wrong.

You did answer my question. Thanks for your time and advice. Have a nice weekend.

Best regards,
KJ Lim

ADD REPLY • link 12.2 years ago KJ Lim ▴ 420


Hi,

Unfortunately, it is pretty much impossible to make annotation packages for every species. Part of the problem is that not all species are equally well studied, so the information is not always available. When it is available for the less popular species, it is usually learned by inference from another (hopefully not too distant) species.

I wrote the function I recommended to help people in your position, and I really hope it works for you. It all depends on whether there is data at NCBI etc. for your plant. Some species you can pull up will have records for only about three annotated genes, so it really depends on what you are hoping to make a package for.

As for entire books on ONLY this subject, I am somewhat thankful that I don't know of any offhand. But there are vignettes in this project that can help. I would start with this one:

http://www.bioconductor.org/packages/2.11/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf

We have recently made a series of changes to make things easier to use, and most of those are described in the vignette above. The idea was that you should probably NOT have to read a whole book in order to make use of this stuff. ;) So I hope you find our changes helpful.

Marc
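For readers who want to try the function Marc recommends, a minimal sketch of a call might look like the following. It is an illustration under stated assumptions, not Marc's own code: it assumes the AnnotationForge package (where makeOrgPackageFromNCBI lives in current Bioconductor releases) is installed, `build_org_package` is a hypothetical wrapper name, and the maintainer string is a placeholder. The call downloads large files from NCBI, so it is wrapped in a function rather than run directly.

```r
## Sketch only: build_org_package() is a hypothetical wrapper, not a Bioconductor function.
## It assumes AnnotationForge is installed and that NCBI holds gene records for the organism.
build_org_package <- function(tax_id, genus, species,
                              maintainer = "Your Name <you@example.org>",  # placeholder
                              output_dir = ".") {
  AnnotationForge::makeOrgPackageFromNCBI(
    version    = "0.1",        # version number given to the generated package
    author     = maintainer,
    maintainer = maintainer,
    outputDir  = output_dir,   # directory where the package source is written
    tax_id     = tax_id,       # NCBI taxonomy ID, as a character string
    genus      = genus,
    species    = species
  )
}

## Hypothetical usage: substitute the taxonomy ID, genus and species of your organism.
## build_org_package(tax_id = "<taxonomy ID>", genus = "<Genus>", species = "<species>")
```

If the build succeeds, the generated package directory can normally be installed from source with `install.packages(<path>, repos = NULL, type = "source")` and then queried with select() like any other org package. How much GO annotation it contains depends entirely on what NCBI has for that species.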

ADD REPLY • link 12.2 years ago Marc Carlson ★ 7.2k
