Gene Ontology and Annotation
msp.cse • 0

I am having difficulty finding, on http://geneontology.org/, the exact versions of the Gene Ontology and its annotations that were used in GO.db_3.1.2, org.Sc.sgd.db_3.1.2, and org.Hs.eg.db_3.1.2.

So, could someone please provide the following three files as used in those packages:

gene_ontology.obo

gene_association.sgd

gene_association.goa_human

Marc Carlson ★ 7.2k

Hi,

So James is right that the GO.db package is where the gene ontology data is stored. If you are interested in the GO to gene mappings represented in the org.Sc.sgd.db and org.Hs.eg.db packages, their sources are listed in the metadata of each package object. For the org.Hs.eg.db package, those data came from here:

> org.Hs.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: HUMAN_DB
| ORGANISM: Homo sapiens
| SPECIES: Human
| EGSOURCEDATE: 2015-Mar17
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 9606
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 20150314
| GOEGSOURCEDATE: 2015-Mar17
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
| GPSOURCEURL: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19
| GPSOURCEDATE: 2010-Mar22
| ENSOURCEDATE: 2015-Mar13
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Tue Mar 17 18:48:15 2015

 

So basically from this directory here:

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA

And more specifically, the gene to GO data for these 'eg' packages comes from this file:

gene2go.gz
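
As a rough sketch of how that file could be read outside of R: gene2go is a tab-delimited table whose columns are tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed and Category, so the human rows can be pulled out by keeping tax_id 9606. The function name `read_gene2go` and the sample rows below are made up for illustration, and this is not how the org.Hs.eg.db package itself is built.

```python
import io

# Columns of NCBI's gene2go file (tab-delimited):
# tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category

def read_gene2go(lines, tax_id="9606"):
    """Yield (gene_id, go_id, evidence, category) for rows of one taxon."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header line and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[0] != tax_id:
            continue  # skip malformed rows and other organisms
        yield cols[1], cols[2], cols[3], cols[7]

# Hypothetical sample rows, only to show the layout (not real file content).
SAMPLE_GENE2GO = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t1\tGO:0005576\tIDA\t-\textracellular region\t-\tComponent\n"
    "4932\t850289\tGO:0005739\tIDA\t-\tmitochondrion\t-\tComponent\n"
)

human_rows = list(read_gene2go(io.StringIO(SAMPLE_GENE2GO)))
print(human_rows)
```

For the real download, the handle returned by `gzip.open("gene2go.gz", "rt")` could be passed in place of the `io.StringIO` object.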

 

And for yeast the data is also listed in the object like so:

> org.Sc.sgd.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: YEAST_DB
| ORGANISM: Saccharomyces cerevisiae
| SPECIES: Yeast
| YGSOURCENAME: Yeast Genome
| YGSOURCEURL: http://downloads.yeastgenome.org/
| YGSOURCEDATE: 14-Mar-2015
| CENTRALID: ORF
| TAXID: 559292
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 20150314
| EGSOURCEDATE: 2015-Mar17
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| ENSOURCEDATE: 2015-Mar13
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Tue Mar 17 19:16:47 2015

 

So that GO to gene information is basically from here:

http://downloads.yeastgenome.org/

And even more specifically from this file here:

http://downloads.yeastgenome.org/curation/literature/gene_association.sgd.gz
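
For comparison with tools that read this file directly, here is a minimal sketch of parsing it, assuming it follows the standard GO annotation file (GAF) layout: tab-delimited, comment lines starting with "!", with the gene identifier in column 2, the qualifier in column 4, the GO ID in column 5, the evidence code in column 7 and the aspect in column 9. The function name `read_gaf` and the sample records are invented for illustration, and dropping "NOT" annotations is my assumption about what a similarity tool would want, not something taken from the packages.

```python
import io
from collections import defaultdict

def read_gaf(lines):
    """Map each DB_Object_ID to a set of (go_id, evidence, aspect) tuples."""
    annotations = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # GAF comment/header lines start with "!"
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not enough columns to hold the aspect field
        if "NOT" in cols[3].split("|"):
            continue  # assumption: negated annotations are not wanted
        annotations[cols[1]].add((cols[4], cols[6], cols[8]))
    return dict(annotations)

def _record(obj_id, qualifier, go_id, evidence, aspect):
    """Build one hypothetical 15-column GAF line for the sample below."""
    fields = ["SGD", obj_id, "SYMBOL", qualifier, go_id, "PMID:0", evidence,
              "", aspect, "", "", "gene", "taxon:559292", "20150314", "SGD"]
    return "\t".join(fields) + "\n"

# Hypothetical sample content, only to show the layout (not real annotations).
SAMPLE_GAF = (
    "!gaf-version: 2.0\n"
    + _record("S_EXAMPLE1", "", "GO:0005739", "IDA", "C")
    + _record("S_EXAMPLE2", "NOT", "GO:0005634", "IDA", "C")
)

gaf_annotations = read_gaf(io.StringIO(SAMPLE_GAF))
print(gaf_annotations)
```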


Anyhow, having said all of that, the actual files that were used for previous incarnations of these databases are probably not online anymore, as these things update all the time. But I have a copy of the originals that we used here if you really need them (although I am perplexed about what you would need them for).

So...  I hope this helps you,  :)


 Marc


Thanks both of you.

But those directories hold only the updated versions. I need exactly the versions that were used in GO.db_3.1.2, org.Sc.sgd.db_3.1.2, and org.Hs.eg.db_3.1.2.

 


Thank you Marc.

Could you please provide the original copies of the three files: gene_ontology.obo, gene_association.sgd, and gene_association.goa_human?

I need to compare some similarity measures (for protein-protein interaction) based on the Gene Ontology, and I am doing so using the GO.db_3.1.2, org.Sc.sgd.db_3.1.2, and org.Hs.eg.db_3.1.2 packages. Some R packages, such as GOSemSim, already implement several of these measures, and the current version of GOSemSim also uses GO.db_3.1.2, org.Sc.sgd.db_3.1.2, and org.Hs.eg.db_3.1.2. But other measures are implemented (by their authors) without the GO.db, org.Sc.sgd.db, and org.Hs.eg.db packages and instead use data downloaded directly from the Gene Ontology website. So to use those implementations I need the same data set that was used in GO.db_3.1.2, org.Sc.sgd.db_3.1.2, and org.Hs.eg.db_3.1.2.

Waiting for your response...

@james-w-macdonald-5106

Does this help?

> GO_dbInfo()
              name                                                             value
1     GOSOURCENAME                                                     Gene Ontology
2      GOSOURCEURL ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
3     GOSOURCEDATE                                                          20150314
4          Db type                                                              GODb
5          package                                                     AnnotationDbi
6         DBSCHEMA                                                             GO_DB
7   GOEGSOURCEDATE                                                        2015-Mar17
8   GOEGSOURCENAME                                                       Entrez Gene
9    GOEGSOURCEURL                              ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
10 DBSCHEMAVERSION                                                               2.1

 

And please note that neither org.Sc.sgd.db nor org.Hs.eg.db has any data from GO. The data are extracted from the GO.db database using SQL queries.


Thanks James,

Can you tell me which SQL queries are used for that?
