Guido Hooiveld
Wageningen University, Wageningen, the …
Hi,
Triggered by a recent comment of Herve on this list [stating that it
would be relatively easy to create your own org.xx.eg.db annotation
package using the function 'makeOrgPackageFromNCBI'], I decided to
create my own instance of the annotation library org.Ss.eg.db. The
reason is that since the latest BioC release in October 2011, NCBI has
made available a major update of the annotation info for pig, which I
would already like to use (to be precise, Sscrofa10.2 was released
earlier this year:
http://www.ncbi.nlm.nih.gov/mapview/stats/BuildStats.cgi?taxid=9823&build=4&ver=1).
However, in doing so some issues arose:
- my instance of the org.Ss.eg database is apparently incomplete; some
fields are dropped when creating the db, and I also noticed this when
comparing the content of the 'official' BioC-provided org.db with that
of mine (KEGG info seems to be lacking). An error is also reported
when listing the content of my org.db (the REFSEQ2EG mapping is not
found). On the other hand, with respect to e.g. Gene Ontology mappings
my instance of the org.db seems to be more complete, since more genes
now have a GO mapping (33506 out of 33506 vs 5730 out of 34084).
However, I don't fully trust this because of the above-mentioned
dropping of fields. The complete output is below.
- during the creation of the db, some GO terms are apparently too new
for the current GO.db and are dropped. Would it somehow be possible to
also include these 'too new' terms in the org.db?
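To make the second point concrete: the log message suggests that GO IDs from the fresh download are checked against the terms known to the installed GO.db and discarded when absent. The following is a minimal sketch of that idea in base R, not the actual code of 'makeOrgPackageFromNCBI'. Both vectors and the ID GO:9999999 are made up for illustration; with GO.db loaded, `keys(GOTERM)` would presumably supply the real set of known IDs.

```r
# Hypothetical inputs (assumptions, not taken from the real data):
# 'known_go' stands in for the term IDs in the installed GO.db,
# 'annotated_go' for the GO IDs found in freshly downloaded annotation.
known_go     <- c("GO:0000001", "GO:0000002", "GO:0000003")
annotated_go <- c("GO:0000002", "GO:0000003", "GO:9999999")

# IDs that the installed GO.db does not know about ("too new").
too_new <- setdiff(annotated_go, known_go)

# IDs that would survive the filtering step.
kept <- intersect(annotated_go, known_go)

print(too_new)
print(kept)
```

If the filtering works this way, counting `too_new` on the real data would show how many terms are lost because GO.db (dated 20110910 here) is older than the annotation download.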
Any feedback would be appreciated.
Thanks,
Guido
> library(AnnotationDbi)
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation("pkgname")'.
> makeOrgPackageFromNCBI(version = "0.1",
+ author = "Guido Hooiveld <guido.hooiveld@wur.nl>",
+ maintainer = "Guido Hooiveld <guido.hooiveld@wur.nl>",
+ outputDir = ".",
+ tax_id = "9823",
+ genus = "Sus",
+ species = "scrofaGH")
Loading required package: RSQLite
Loading required package: DBI
Loading required package: GO.db
Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
Populating gene2pubmed table:
table gene2pubmed filled
Getting data for gene2accession.gz
Populating gene2accession table:
table gene2accession filled
Getting data for gene2refseq.gz
Populating gene2refseq table:
table gene2refseq filled
Getting data for gene2unigene
Populating gene2unigene table:
table gene2unigene filled
Getting data for gene_info.gz
Populating gene_info table:
table gene_info filled
Getting data for gene2go.gz
Populating gene2go table:
Getting blast2GO data as a substitute for gene2go
table metadata filled
table map_metadata filled
table gene2go filled
table metadata filled
table map_metadata filled
Populating genes table:
genes table filled
Populating gene_info_temp table:
gene_info_temp table filled
Populating alias table:
alias table filled
Populating chromosomes table:
chromosomes table filled
Populating pubmed table:
pubmed table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating unigene table:
unigene table filled
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Populating go_bp table:
go_bp table filled
Populating go_mf table:
go_mf table filled
Populating go_cc table:
go_cc table filled
Populating go_bp_all table:
go_bp_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_cc_all table:
go_cc_all table filled
dropping table gene2pubmeddropping table gene2accessiondropping table
gene2refseqdropping table gene2unigenedropping table gene_infodropping
table gene2go
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE
t._id=g._id AND t.gene_name NOT NULL
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE
t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE
t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g
WHERE t._id=g._id AND t.chromosome NOT NULL
SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE
t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE
t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE
t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g
WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g
WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g
WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE
t._id=g._id AND t.alias_symbol NOT NULL
table map_counts filled
Creating package in ./org.SscrofaGH.eg.db
[1] TRUE
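The count queries printed at the end of this log can also be run by hand, which is one way to check whether every gene really has a GO annotation. Below is a minimal sketch using DBI and RSQLite on a toy in-memory database. The table names `genes` and `go_bp` and the columns `_id` and `gene_id` are taken from the log above; the `go_id` column and all rows are assumptions made for illustration.

```r
library(DBI)

# Toy stand-in for the SQLite file inside the generated package.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

genes <- data.frame(`_id` = 1:4,
                    gene_id = c("101", "102", "103", "104"),
                    check.names = FALSE, stringsAsFactors = FALSE)
# Genes 1 and 3 have biological-process terms; genes 2 and 4 have none.
go_bp <- data.frame(`_id` = c(1L, 1L, 3L),
                    go_id = c("GO:0000001", "GO:0000002", "GO:0000001"),
                    check.names = FALSE, stringsAsFactors = FALSE)
dbWriteTable(con, "genes", genes)
dbWriteTable(con, "go_bp", go_bp)

# Total number of genes in the database.
n_total <- dbGetQuery(con, "SELECT count(*) AS n FROM genes")$n

# Same pattern as the queries in the log: distinct genes with at least one row.
n_with_bp <- dbGetQuery(con, "
  SELECT count(DISTINCT g.gene_id) AS n
  FROM go_bp AS t, genes AS g
  WHERE t._id = g._id AND t.go_id IS NOT NULL")$n

dbDisconnect(con)
print(c(total = n_total, with_bp = n_with_bp))
```

Against the real package, the connection would presumably come from `org.SscrofaGH.eg_dbconn()` instead of the in-memory database, and the same query would need repeating for `go_mf` and `go_cc`.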
<<content of my instance of org.Ss.eg.db>>
> library(org.SscrofaGH.eg.db)
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation("pkgname")'.
Loading required package: DBI
> org.SscrofaGH.eg.db
OrgDb object:
| BL2GOSOURCEDATE: Tue Feb 28 12:50:25 2012
| BL2GOSOURCENAME: blast2GO
| BL2GOSOURCEURL: http://www.blast2go.de/
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: ORGANISM_DB
| ORGANISM: Sus scrofaGH
| SPECIES: Sus ScrofaGH
| CENTRALID: EG
| TAXID: 9823
| EGSOURCEDATE: Tue Feb 28 12:50:27 2012
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| GOSOURCEDATE: 20110910
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godata
| GOEGSOURCEDATE: Tue Feb 28 12:50:27 2012
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| Db type: OrgDb
| package: AnnotationDbi
> org.SscrofaGH.eg()
Quality control information for org.SscrofaGH.eg:
This package has the following mappings:
org.SscrofaGH.egALIAS2EG has 33506 mapped keys (of 33506 keys)
org.SscrofaGH.egCHR has 33506 mapped keys (of 33506 keys)
org.SscrofaGH.egGENENAME has 33506 mapped keys (of 33506 keys)
org.SscrofaGH.egGO has 33506 mapped keys (of 33506 keys)
org.SscrofaGH.egGO2ALLEGS has 33506 mapped keys (of 10755 keys)
org.SscrofaGH.egGO2EG has 33506 mapped keys (of 7256 keys)
org.SscrofaGH.egREFSEQ has 33506 mapped keys (of 33506 keys)
Error in get(mapname) : object 'org.SscrofaGH.egREFSEQ2EG' not found
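To see which maps differ between the two packages, the map names can be compared after stripping the package-specific prefix. This is a minimal base-R sketch; the helper `map_suffixes` is a made-up name, and the two vectors are copied from the quality-control listings in this post. Since my listing stops at the error, the first vector may be incomplete and the result is an upper bound on what is missing.

```r
# Return the map suffix (e.g. "GO2EG") for every name starting with 'prefix'.
map_suffixes <- function(map_names, prefix) {
  has_prefix <- startsWith(map_names, prefix)
  substring(map_names[has_prefix], nchar(prefix) + 1)
}

# Maps printed for my package before the error occurred.
mine <- paste0("org.SscrofaGH.eg",
               c("ALIAS2EG", "CHR", "GENENAME", "GO", "GO2ALLEGS",
                 "GO2EG", "REFSEQ"))

# Maps printed for the BioC-provided org.Ss.eg.db.
reference <- paste0("org.Ss.eg",
                    c("ACCNUM", "ACCNUM2EG", "ALIAS2EG", "CHR", "ENZYME",
                      "ENZYME2EG", "GENENAME", "GO", "GO2ALLEGS", "GO2EG",
                      "PATH", "PATH2EG", "PMID", "PMID2EG", "REFSEQ",
                      "REFSEQ2EG", "SYMBOL", "SYMBOL2EG", "UNIGENE",
                      "UNIGENE2EG", "UNIPROT"))

# Maps listed for the reference package but not (yet) seen in mine.
missing_in_mine <- setdiff(map_suffixes(reference, "org.Ss.eg"),
                           map_suffixes(mine, "org.SscrofaGH.eg"))
print(missing_in_mine)
```

With both packages attached, `ls("package:org.SscrofaGH.eg.db")` and `ls("package:org.Ss.eg.db")` could replace the hand-copied vectors and give the complete picture.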
<<content of original, BioC-provided org.Ss.eg.db>>
> library(org.Ss.eg.db)
> org.Ss.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| package: AnnotationDbi
| DBSCHEMA: PIG_DB
| ORGANISM: Sus scrofa
| SPECIES: Pig
| EGSOURCEDATE: 2011-Sep14
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 9823
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 20110910
| GOEGSOURCEDATE: 2011-Sep14
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| BL2GOSOURCENAME: blast2GO
| BL2GOSOURCEURL: http://www.blast2go.de/
| BL2GOSOURCEDATE: 2011-Mar2
> org.Ss.eg
Quality control information for org.Ss.eg:
This package has the following mappings:
org.Ss.egACCNUM has 24639 mapped keys (of 34084 keys)
org.Ss.egACCNUM2EG has 74012 mapped keys (of 74012 keys)
org.Ss.egALIAS2EG has 29916 mapped keys (of 29916 keys)
org.Ss.egCHR has 33656 mapped keys (of 34084 keys)
org.Ss.egENZYME has 1657 mapped keys (of 34084 keys)
org.Ss.egENZYME2EG has 818 mapped keys (of 818 keys)
org.Ss.egGENENAME has 34084 mapped keys (of 34084 keys)
org.Ss.egGO has 5730 mapped keys (of 34084 keys)
org.Ss.egGO2ALLEGS has 11689 mapped keys (of 11689 keys)
org.Ss.egGO2EG has 8215 mapped keys (of 8215 keys)
org.Ss.egPATH has 4458 mapped keys (of 34084 keys)
org.Ss.egPATH2EG has 225 mapped keys (of 225 keys)
org.Ss.egPMID has 10966 mapped keys (of 34084 keys)
org.Ss.egPMID2EG has 3938 mapped keys (of 3938 keys)
org.Ss.egREFSEQ has 24384 mapped keys (of 34084 keys)
org.Ss.egREFSEQ2EG has 53138 mapped keys (of 53138 keys)
org.Ss.egSYMBOL has 34084 mapped keys (of 34084 keys)
org.Ss.egSYMBOL2EG has 28138 mapped keys (of 28138 keys)
org.Ss.egUNIGENE has 8798 mapped keys (of 34084 keys)
org.Ss.egUNIGENE2EG has 8912 mapped keys (of 8912 keys)
org.Ss.egUNIPROT has 6660 mapped keys (of 34084 keys)
Additional Information about this package:
DB schema: PIG_DB
DB schema version: 2.1
Organism: Sus scrofa
Date for NCBI data: 2011-Sep14
Date for GO data: 20110910
Date for KEGG data: 2011-Mar15
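For reference, the GO coverage of the two packages can be put side by side as percentages. This small base-R calculation uses only the counts printed in the two quality-control listings above.

```r
# Counts copied from the egGO lines of the two listings in this post.
coverage <- data.frame(
  package = c("org.SscrofaGH.eg.db", "org.Ss.eg.db"),
  mapped  = c(33506, 5730),
  total   = c(33506, 34084),
  stringsAsFactors = FALSE
)

# Share of Entrez Gene IDs with at least one GO annotation, in percent.
coverage$percent <- round(100 * coverage$mapped / coverage$total, 1)
print(coverage)
```

A jump from roughly one gene in six to every single gene is the reason I am hesitant to take the new number at face value.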
> sessionInfo() <<session when creating org.db>>
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RCurl_1.9-5 bitops_1.0-4.1 GO.db_2.6.1
[4] RSQLite_0.11.1 DBI_0.2-5 AnnotationDbi_1.16.11
[7] Biobase_2.14.0
loaded via a namespace (and not attached):
[1] IRanges_1.12.5 tools_2.14.0
>
> sessionInfo() <<session when comparing the 2 org.dbs>>
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Ss.eg.db_2.6.4 org.SscrofaGH.eg.db_1.0 RSQLite_0.11.1
[4] DBI_0.2-5 AnnotationDbi_1.16.11 Biobase_2.14.0
loaded via a namespace (and not attached):
[1] IR
Regards, Guido
---------------------------------------------------------
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
email: guido.hooiveld@wur.nl
internet: http://nutrigene.4t.com
http://scholar.google.com/citations?user=qFHaMnoAAAAJ
http://www.researcherid.com/rid/F-4912-2010