advantages of annotation packages


Rameswara Sashi Kiran Challa ▴ 30

@rameswara-sashi-kiran-challa-5931

Last seen 10.2 years ago

Hi All,

Could anyone please elucidate the advantages of having an annotation package for an organism, or point me to any documentation that clearly lays out the thinking behind annotation packages?

Would a data frame in R (with genes as rows and the various annotation types such as GO, KEGG, UniGene, etc. as columns) not suffice? What are the advantages of having AnnDbBimap objects and building a package? Are there any technical benefits, such as faster access to information?

Thanks for your time,
-Sashi

Annotation GO Organism • 1.1k views

updated 11.6 years ago by Martin Morgan 25k • written 11.6 years ago by Rameswara Sashi Kiran Challa ▴ 30


Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 4.2 years ago

United States

The memory overhead for those data.frames you speak of quickly becomes obscene when you start doing things like GO analyses.
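To make the memory point concrete, here is a minimal base R sketch. It assumes the "one data frame" idea means a single flat table with one row per combination of annotations; the gene, GO, and KEGG identifiers are invented for illustration and do not come from any real annotation source.

```r
# Hypothetical toy annotations: all IDs below are made up.
go <- data.frame(
  gene = c("g1", "g1", "g1", "g1", "g2", "g2"),
  go   = c("GO:A", "GO:B", "GO:C", "GO:D", "GO:A", "GO:E"),
  stringsAsFactors = FALSE
)
kegg <- data.frame(
  gene = c("g1", "g1", "g1", "g2"),
  kegg = c("path1", "path2", "path3", "path4"),
  stringsAsFactors = FALSE
)

# A single flat table needs every GO x KEGG combination for each gene,
# so the row count multiplies instead of adding up.
flat <- merge(go, kegg, by = "gene")

rows_flat     <- nrow(flat)              # g1: 4 * 3, g2: 2 * 1
rows_separate <- nrow(go) + nrow(kegg)   # one row per mapping

cat("rows in one flat table: ", rows_flat, "\n")
cat("rows in separate tables:", rows_separate, "\n")
```

Each additional one-to-many annotation type multiplies the flat table again, whereas keeping each mapping in its own table only adds rows.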

written 11.6 years ago by Tim Triche ★ 4.2k

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 21 months ago

United States

Hi Sashi,

> Could anyone please elucidate advantages of having an Annotation package
> for an organism or point me to any documentation that clearly lists all the
> various thoughts behind coming up with an Annotation package.

Read carefully: http://lmgtfy.com/?q=spreadsheet+vs+database

HTH,
-steve
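For readers who want to see what the "database" side looks like from R, the following is a minimal sketch of a lookup through the select interface. It assumes the human annotation package org.Hs.eg.db as an example (the thread does not name an organism) and wraps the call in a hypothetical helper, lookup_go(), that returns NULL when the package is not installed.

```r
# Hypothetical helper: query an annotation package instead of a data frame.
# org.Hs.eg.db is an assumed example package, not one named in this thread.
lookup_go <- function(entrez_ids) {
  if (!requireNamespace("AnnotationDbi", quietly = TRUE) ||
      !requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
    message("org.Hs.eg.db is not installed; skipping the lookup")
    return(NULL)
  }
  # Only the requested keys and columns are pulled from the package's
  # database; the full annotation table is never loaded into memory.
  AnnotationDbi::select(
    org.Hs.eg.db::org.Hs.eg.db,
    keys    = entrez_ids,
    keytype = "ENTREZID",
    columns = c("SYMBOL", "GO")
  )
}

res <- lookup_go(c("1", "10"))
if (!is.null(res)) print(head(res))
```

The result is an ordinary data frame, so downstream code can treat it the same way as a hand-built table while the data itself stays in the versioned package.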

written 11.6 years ago by Steve Lianoglou ★ 13k

Martin Morgan 25k

@martin-morgan-1513

Last seen 4 months ago

United States

> Will not having a data frame in R (with rows as genes and columns as
> various types of annotations like GO, KEGG, Unigene, etc) suffice?

One aspect not mentioned is that one gets to exploit R's packaging system to provide easily distributed and documented versions of the data. Suppose you created the package eight months ago and have forgotten some of the details. Easy: check out the package description and help page.

Say you're working with a couple of colleagues, and you've been relatively disciplined about incrementing the annotation package version when your data changes (or are using a public Bioc annotation package, with versions strictly tied to R / Bioc releases). You can easily spot when unusual results are due to differences in data version (hence the frequent request for the output of 'sessionInfo()' on this mailing list) and adopt / instill 'best practices' that make sure everyone on the team (including yourself, even if your team is only 1) is using the same version.

Martin
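The version check described above can be sketched in a few lines of base R. This is only an illustration: the base packages stats and utils stand in for an annotation package so that the code runs on any installation, record_versions() is a hypothetical helper, and the colleague's record is invented.

```r
# Hypothetical helper: record the installed version of each package.
record_versions <- function(pkgs) {
  vapply(pkgs,
         function(p) as.character(utils::packageVersion(p)),
         character(1))
}

# "Check out the package description": title and version are stored
# with the package itself (stats is a stand-in for an annotation package).
desc <- utils::packageDescription("stats")
cat(desc$Package, desc$Version, "-", desc$Title, "\n")

mine <- record_versions(c("stats", "utils"))

# Invented record from a colleague whose "utils" version differs.
theirs <- mine
theirs["utils"] <- "0.0.1"

# Packages whose versions disagree between the two records.
mismatched <- names(mine)[mine != theirs[names(mine)]]
print(mismatched)
```

In practice the same comparison could be made by eye from two sessionInfo() outputs; the point is that a package carries a version to compare, which a loose data frame does not.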

written 11.6 years ago by Martin Morgan 25k

Just adding to what Martin already said, it's mostly about making your research more easily reproducible by using a consistent and traceable source for your information. This sort of thing is important for doing science, where other people will need to reproduce your results exactly. If all you had was your own personal data.frame, nobody else could really work with it unless you also made it available online. And then, assuming you could serve it up somewhere in perpetuity, you would also have to explain exactly how you made it. In short, when you went to write the methods section for your findings, you would end up making and maintaining your own annotation resource, and thus reinventing the wheel.

There are other advantages too. For example, many different kinds of annotation data are made into packages together, so you can know which version of GO was being used by a large group of people, and also which Entrez Gene IDs were considered valid. So things are overall more standardized for a given version of Bioconductor, which can aid collaborations (since people are basically all working off the same data set).

Marc

written 11.6 years ago by Marc Carlson ★ 7.2k

This is potentially a very important point. However, the lack of an easy way to install previous versions of Bioc packages works against it.

written 11.6 years ago by Malcolm Cook ★ 1.6k

> this is potentially a very important point
>
> however, the lack of easy install availability of previous versions of bioc packages works against it....

I'm not understanding your comment. Bioc versions are released with specific R versions. Install the appropriate R version, and get the corresponding Bioc packages via biocLite(). Challenges occur when trying to install old R on new hardware (e.g., because the old R doesn't compile with new gcc or new libraries), but that's probably not what you mean?

Martin
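A rough sketch of that workflow follows. The reply names biocLite(), the installer of the time; in later Bioconductor releases it was replaced by BiocManager::install(), which is what the sketch calls. The helper name install_for_this_r() is hypothetical, and nothing is installed unless the function is actually called.

```r
# The R version in use determines which Bioconductor release, and hence
# which annotation package versions, the installer will offer.
r_version <- R.version.string
cat("Running", r_version, "\n")

# Hypothetical helper: install a Bioconductor package matched to this R.
# Defined only; calling it needs network access and the BiocManager package.
install_for_this_r <- function(pkg) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    stop("BiocManager is required: install.packages('BiocManager')")
  }
  # BiocManager selects the Bioconductor release that matches the running R.
  BiocManager::install(pkg)
}

# Example call, left commented out (the package name is an assumption):
# install_for_this_r("org.Hs.eg.db")
```

To reproduce an older analysis, the same call would be run from the older R installation, which is the step the reply describes as the potential difficulty on new hardware.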

written 11.6 years ago by Martin Morgan 25k

Thank you Martin, Tim and Steve!!

For future readers, the introduction section of this document (written by Marc Carlson) sheds some light on the purpose of having select interfaces/annotation packages:

http://www.bioconductor.org/packages/2.12/bioc/vignettes/AnnotationForge/inst/doc/MakingNewAnnotationPackages.pdf

-Sashi

written 11.6 years ago by Rameswara Sashi Kiran Challa ▴ 30