makeOrgPackage problems accessing RefSeq table
@fabian-grammes-6591
Last seen 4.3 years ago

Dear All

I generated an OrgDb package for Salmo salar using `AnnotationForge::makeOrgPackage()`. However, the SQLite table REFSEQ (containing RefSeq IDs) is missing when I load the package:

library(org.Ssalar.eg.db)
columns(org.Ssalar.eg.db)
[1] "ACCESSION"    "CIGENE_ID"    "GENENAME"     "GENE_BIOTYPE" "GID"

But looking into the .db I can see the table is there

library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), org.Ssalar.eg_dbfile())
dbListTables(con)
[1] "cigene"       "gene_biotype" "gene_info"    "genes"        "map_counts"  
[6] "map_metadata" "metadata"     "refseq"   

It would be nice if someone could explain why the REFSEQ table is not accessible through AnnotationDbi. I guess it's related to AnnotationDbi filtering the table out... But I would like to get this package onto Bioconductor at some point, so it would be nice to keep the table called REFSEQ.

cheers, F

annotationdbi • 1.1k views
@james-w-macdonald-5106
Last seen 2 days ago
United States

I'm not sure you will ever need to get this package into Bioconductor, as there are already two such packages available via AnnotationHub.

> library(AnnotationHub)

> hub <- AnnotationHub()
updating metadata: retrieving 1 resource
  |======================================================================| 100%
snapshotDate(): 2016-10-11
> query(hub, "Salmo salar")
AnnotationHub with 2 records
# snapshotDate(): 2016-10-11
# $dataprovider: NCBI, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Salmo salar
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH13851"]]'

            title                    
  AH13851 | org.Salmo_salar.eg.sqlite
  AH48629 | org.Salmo_salar.eg.sqlite

Also note that the ACCESSION column should be querying your refseq table. You are confusing the columns available to select or mapIds with the underlying SQL tables that actually get queried. All of the underlying SQL is generated from the DB schema, though, so I can't say for sure, since you don't say exactly how you generated that package.
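
To illustrate the table-versus-column distinction, here is a minimal sketch using an in-memory SQLite database as a stand-in for the package's .db file. The layout (a genes table with `_id` and GID, and a refseq table with `_id` and ACCESSION) and the accession values are assumptions made up for the example, not the actual contents of org.Ssalar.eg.db.

```r
library(DBI)
library(RSQLite)

# In-memory stand-in for the package database (toy data, assumed layout).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Central gene table: internal _id plus the gene identifier GID.
dbWriteTable(con, "genes",
             data.frame(`_id` = 1:2, GID = c("gene1", "gene2"),
                        check.names = FALSE))

# A table *named* refseq whose data column is *named* ACCESSION.
# The accession strings are placeholders, not real RefSeq IDs.
dbWriteTable(con, "refseq",
             data.frame(`_id` = c(1L, 1L, 2L),
                        ACCESSION = c("ACC_A", "ACC_B", "ACC_C"),
                        check.names = FALSE))

# Table names are what dbListTables() reports ...
tables <- dbListTables(con)

# ... while the field names inside a table are what a user would ask for.
fields <- dbListFields(con, "refseq")

# Asking for ACCESSION by GID amounts to a join against the refseq table.
res <- dbGetQuery(con, "
  SELECT g.GID, r.ACCESSION
  FROM genes AS g
  JOIN refseq AS r ON g._id = r._id
")

dbDisconnect(con)
```

With the real package, the analogous checks would be `dbListFields(con, "refseq")` on the connection from the question, and something like `select(org.Ssalar.eg.db, keys = head(keys(org.Ssalar.eg.db)), columns = "ACCESSION", keytype = "GID")`, assuming GID is the central key as the `columns()` output suggests.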


Thanks for the response James. I haven't started using AnnotationHub but clearly I should look more into it.

1. Neither package available from AnnotationHub is up to date, since they do not include the results from the Salmo salar genome / transcriptome (published this summer), so I guess in that sense my work is not redundant.

2. Yes, I was confused. You are correct: ACCESSION is querying the REFSEQ table.

cheers, F

We are finalizing the new OrgDb packages and they should be posted to AnnotationHub within the next week. These are the non-standard organisms - the packages for standard organisms in the Bioconductor repo are up to date:

http://www.bioconductor.org/packages/release/BiocViews.html#___OrgDb

Valerie
