Error: no such table: main.gene2pubmed when running makeOrgPackageFromNCBI() from AnnotationForge
Ozan • 0
@a1a2fcd4
United States

Hello, I am having issues running the makeOrgPackageFromNCBI() function. Since there is no OrgDb package available for Candida albicans, I am trying to build one from the Candida albicans data available in NCBI under tax_id 237561.

library(AnnotationForge)
library(biomaRt)
makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some One <some@one.org>",
                       maintainer = "Some One <some@one.org>",
                       outputDir = ".",
                       tax_id = "237561",
                       genus = "Candida",
                       species = "albicans",
                       rebuildCache = FALSE)

preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz

Error: no such table: main.gene2pubmed

Backtrace:

  1. AnnotationForge::makeOrgPackageFromNCBI(...)
  2. AnnotationForge:::NEW_makeOrgPackageFromNCBI(...)
  3. AnnotationForge:::prepareDataFromNCBI(...)
  4. AnnotationForge:::.makeBaseDBFromDLs(...)
  5. AnnotationForge:::.downloadData(...) ...
  6. DBI::dbExecute(NCBIcon, sql)
    1. DBI::dbSendStatement(conn, statement, ...)
    2. RSQLite::dbSendQuery(conn, statement, ...)
    3. RSQLite (local) .local(conn, statement, ...)
    4. RSQLite:::result_create(conn@ptr, statement)
OrganismDb AnnotationForge • 1.6k views
@james-w-macdonald-5106
United States

Making an OrgDb package is a two-step process. First, all of the data from NCBI is parsed and put into an omnibus SQLite database called 'NCBI.sqlite', and then queries are made on that db to get the subset of data for the OrgDb. I have an NCBI.sqlite hanging around that I use for answering these sorts of questions.

> library(RSQLite)
> con <- dbConnect(SQLite(), "NCBI.sqlite")

## first try your Taxonomic ID
> dbGetQuery(con, "select count(*) from gene2pubmed where tax_id='237561';")
  count(*)
1        0
> dbGetQuery(con, "select count(*) from gene2refseq where tax_id='237561';")
  count(*)
1     6310
> dbGetQuery(con, "select count(*) from gene2accession where tax_id='237561';")
  count(*)
1        0

## that seems suboptimal. The 'main' ID is 5476. Try that one
> dbGetQuery(con, "select count(*) from gene2pubmed where tax_id='5476';")
  count(*)
1       14
> dbGetQuery(con, "select count(*) from gene2refseq where tax_id='5476';")
  count(*)
1       40
> dbGetQuery(con, "select count(*) from gene2accession where tax_id='5476';")
  count(*)
1       89

## Still not great

There appear to be quite a few taxonomic IDs for all the various strains of C. albicans, so you might need to iterate through until you find the one with the fullest annotation set. You should have your own NCBI.sqlite db that you can use for that.
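One way to do that iteration, as a minimal sketch: count the rows per table for several candidate taxonomy IDs in one pass. The helper name `count_by_taxid` is made up for illustration. The table names and the `tax_id` column are the ones used in the queries above, and the commented usage assumes a populated NCBI.sqlite in the working directory.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: for each table, count rows per candidate taxonomy ID.
count_by_taxid <- function(con, tax_ids,
                           tables = c("gene_info", "gene2pubmed",
                                      "gene2refseq", "gene2accession",
                                      "gene2go")) {
  # Only query tables that are actually present in this database.
  tables <- intersect(tables, dbListTables(con))
  out <- data.frame(tax_id = as.character(tax_ids), stringsAsFactors = FALSE)
  placeholders <- paste(rep("?", length(tax_ids)), collapse = ", ")
  for (tbl in tables) {
    sql <- sprintf(
      "select tax_id, count(*) as n from %s where tax_id in (%s) group by tax_id",
      tbl, placeholders)
    res <- dbGetQuery(con, sql, params = as.list(as.character(tax_ids)))
    # IDs with no rows are absent from the result, so fill them with 0.
    out[[tbl]] <- res$n[match(out$tax_id, as.character(res$tax_id))]
    out[[tbl]][is.na(out[[tbl]])] <- 0L
  }
  out
}

# Usage (assumes NCBI.sqlite is in the working directory and is not empty):
# con <- dbConnect(SQLite(), "NCBI.sqlite")
# count_by_taxid(con, c("237561", "5476"))
# dbDisconnect(con)
```

If the result has only the `tax_id` column, none of the expected tables exist in the file, which points to an empty or wrong database and not to a missing species.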


Thanks for your comment! I am struggling to reproduce this code on my side, and I don't know what I am doing wrong. When I run the code, I keep getting the "Error: no such table: gene2pubmed" error. Also, I am interested in running this analysis on the Candida albicans SC5314 strain, and looking at the taxonomic IDs available, I have only found one or two, so I am curious where you saw all the tax_ids you mentioned.


I went to the source.

Also, what code are you talking about? The makeOrgPackageFromNCBI that you originally posted? If so, my previous post was meant to explain to you why it isn't working (and won't ever work) for you - if there are no annotations for your species at NCBI, you cannot make an OrgDb package using makeOrgPackageFromNCBI, because, well, there are no data at NCBI with which to do so.

I mean, in time there may be some annotations on the species you care about, but the way NCBI works is that people identify things they think are interesting and then submit. If many people are working on your strain (or even one dedicated person/lab), then NCBI may end up with a bunch of strain-specific information that they will populate their databases with, but until that happens you won't have any data. Unless you are willing to consider that the genes in the 'main' strain are the same/good enough for your purposes (although there are hardly any for that strain as it is).

However! You might be able to do an end-around, depending on what you are really trying to do (simply having an OrgDb is not likely your end goal). UniProt appears to have lots of data for this strain, so you could either A.) use the UniProt.ws package to get whatever data you want and use makeOrgPackage to make an OrgDb, or B.) just use the UniProt.ws package directly to do whatever annotations you are trying to do.


That's completely fair. I meant to say that I am unable to reproduce the code block you've provided above. I try running this on my system, but I get an error when I get to the dbGetQuery function. The UniProt.ws approach is very interesting. I have briefly skimmed through and tried the package, and there seem to be a bunch of options available for my Candida strain. In the end, I am aiming to run an over-representation analysis on an enriched gene set from my data. I am unsure what you are referring to when you say I can "directly use the package to do whatever annotations I am trying to do", so I'd really appreciate some guidance on that point. Thank you for your time by the way, you have no idea how helpful all this really is!


If the error is that the function can't be found, you need to either load (or install and then load) DBI.


When I run dbGetQuery, it says: "Error: no such table: gene2pubmed", so I don't think it is unable to find the function. I thought maybe my data is incomplete, but at the very least I was hoping to get the same output as you.


What does dbListTables(con) produce? This is on the NCBI.sqlite db, right?


I run the con <- dbConnect(SQLite(), "NCBI.sqlite") line, but dbListTables(con) yields character(0), so I am guessing the database is not populated when I call dbConnect.


You have to do that in the directory that contains your NCBI.sqlite database.

> library(RSQLite)
> con <- dbConnect(SQLite(), "thisfiledoesnotexist.sqlite")
> dbListTables(con)
character(0)
> setwd("c:/Users/jmacdon/Desktop/")
> con2 <- dbConnect(SQLite(), "NCBI.sqlite")
> dbListTables(con2)
 [1] "altGO"              
 [2] "altGO_date"         
 [3] "gene2accession"     
 [4] "gene2accession_date"
 [5] "gene2go"            
 [6] "gene2go_date"       
 [7] "gene2pubmed"        
 [8] "gene2pubmed_date"   
 [9] "gene2refseq"        
[10] "gene2refseq_date"   
[11] "gene_info"          
[12] "gene_info_date"     

> dir(".", "sqlite$")
[1] "GEOmetadb.sqlite"       
[2] "NCBI.sqlite"            
[3] "org.Calbicans.eg.sqlite"
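
Because dbConnect() quietly creates a new, empty database when the file is missing (the character(0) result above), it can help to check the file before connecting. The following is only a sketch: `connect_existing_sqlite` is a made-up helper name, and opening with RSQLite's read-only flag `SQLITE_RO` is one possible way to keep a typo in the path from creating a stray empty file.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: refuse to connect unless the file exists and has content.
connect_existing_sqlite <- function(path) {
  if (!file.exists(path)) {
    stop("No such file: ", normalizePath(path, mustWork = FALSE),
         " (working directory is ", getwd(), ")")
  }
  if (file.size(path) == 0) {
    stop("File is empty (0 bytes): ", path)
  }
  # Read-only mode never creates a new database file.
  dbConnect(SQLite(), path, flags = SQLITE_RO)
}

# Usage (assumes NCBI.sqlite is expected in the working directory):
# con <- connect_existing_sqlite("NCBI.sqlite")
# dbListTables(con)
# dbDisconnect(con)
```

A 0-byte NCBI.sqlite would trigger the second error here, which separates "the download never populated the cache" from "I am in the wrong directory".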

I believe my NCBI.sqlite database is located in the same directory, but for some reason it seems to be empty. Even in the File explorer it shows the file as having 0B size.


Ah, ok. Doesn't really matter though. Try using UniProt.ws.


Sounds good. Thanks again!


You can use UniProt.ws to annotate things. Say you have the Candida IDs and you want to know what the gene symbol is or whatever.

You can use UniProt.ws to download the GO table as well as other annotations and then make an OrgDb using makeOrgPackage.
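
As a rough sketch of the second route: makeOrgPackage takes data frames whose first column is GID, and the GO table needs the columns GID, GO and EVIDENCE. The helper below reshapes a wide table (one row per gene, GO IDs in one ";"-separated string) into that layout. Everything specific here is an assumption for illustration: the helper name `make_go_table`, the toy gene IDs and symbols, the input column names, and the placeholder evidence code "IEA" used because the input carries no evidence codes. The actual UniProt.ws query is left out, since its column names depend on the package version.

```r
# Hypothetical helper: turn a wide table (one row per gene, GO IDs separated
# by ";") into the long GID/GO/EVIDENCE layout.
make_go_table <- function(df, id_col, go_col, evidence = "IEA") {
  ids <- as.character(df[[id_col]])
  go_list <- strsplit(as.character(df[[go_col]]), ";", fixed = TRUE)
  go <- data.frame(GID = rep(ids, lengths(go_list)),
                   GO = trimws(unlist(go_list)),
                   stringsAsFactors = FALSE)
  # Keep well-formed GO IDs only and drop duplicate gene/term pairs.
  go <- go[!is.na(go$GO) & grepl("^GO:[0-9]{7}$", go$GO), , drop = FALSE]
  go <- unique(go)
  go$EVIDENCE <- rep(evidence, nrow(go))  # placeholder evidence code
  rownames(go) <- NULL
  go
}

# Toy input standing in for downloaded annotations; IDs and column names
# are made up for illustration.
annot <- data.frame(
  GID    = c("gene1", "gene2", "gene3"),
  SYMBOL = c("symA", "symB", "symC"),
  GO_IDS = c("GO:0005737; GO:0006412", "GO:0005737", NA),
  stringsAsFactors = FALSE
)
go_tbl <- make_go_table(annot, "GID", "GO_IDS")

# With real data the package build would look roughly like this (not run):
# library(AnnotationForge)
# makeOrgPackage(gene_info = annot[, c("GID", "SYMBOL")], go = go_tbl,
#                version = "0.1",
#                maintainer = "Some One <some@one.org>",
#                author = "Some One <some@one.org>",
#                outputDir = ".", tax_id = "237561",
#                genus = "Candida", species = "albicans",
#                goTable = "go")
```

For an over-representation analysis, the long GID/GO table on its own may already be enough as a gene-to-term mapping, in which case building the OrgDb package is optional.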


Okay, this is a great explanation, thank you. I will use this and share results once I get some.
