Creating a custom organism database with the AnnotationForge package (AnnotationForge::makeOrgPackageFromNCBI)
abhisek001 (India)

I ran makeOrgPackageFromNCBI() from the AnnotationForge package in RStudio with the inputs below, but I couldn't create my own custom database for Acinetobacter baumannii. I want to run gene set enrichment analysis on RNA-seq data from my mutant strain of Acinetobacter baumannii, and no annotation package for this species is available in R for GO analysis.

First I ran the command with rebuildCache=TRUE. It started downloading the NCBI files and then got stuck while reading the gene2accession file. That code is below.

Then I downloaded all the files from the NCBI FTP site (https://ftp.ncbi.nih.gov/gene/DATA/), put them in the working directory, and ran the command again with rebuildCache=FALSE, but I get the error shown after the second call. I even manually renamed the content of the gene2accession.gz file from gene2accession to main.gene2accession, but that didn't work either. That code is also below.

Please guide me. Thanks in advance.

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some one <somone2001@gmail.com>",
                       maintainer = "Some one <somone2001@gmail.com>",
                       outputDir = "/home/omic/analysis/R_studio",
                       NCBIFilesDir = "/home/omic/analysis/R_studio",
                       tax_id = "470",
                       genus = "Acinetobacter",
                       species = "baumannii",
                       rebuildCache=TRUE)

Error:

getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  error reading from the connection
In addition: Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  invalid or incomplete compressed data


makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some one <somone2001@gmail.com>",
                       maintainer = "Some one <somone2001@gmail.com>",
                       outputDir = "/home/omic/analysis/R_studio",
                       NCBIFilesDir = "/home/omic/analysis/R_studio",
                       tax_id = "470",
                       genus = "Acinetobacter",
                       species = "baumannii",
                       rebuildCache=FALSE)

Error:

preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
Error: no such table: main.gene2accession

Answer from James W. MacDonald (@james-w-macdonald-5106), United States

I'm not sure about the first error. It might be due to a timeout, although in that case you should have gotten a timeout error rather than a scan error. The second error is a consequence of the first, though. When you ran the first attempt, a SQLite database (NCBI.sqlite) was created, and in the second attempt, by using rebuildCache = FALSE, you are saying 'just use the existing SQLite database instead of re-downloading'. The tables that never got written to that database are missing, which results in the error you see.
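To see which tables actually made it into the cache before rerunning with rebuildCache = FALSE, one option is to list the tables in NCBI.sqlite. This is a minimal sketch, not part of AnnotationForge: `missing_ncbi_tables()` is a hypothetical helper, it assumes the DBI and RSQLite packages, and it assumes each NCBI file is stored in a table named after the file without `.gz` (consistent with the error `no such table: main.gene2accession`).

```r
library(DBI)
library(RSQLite)

## Hypothetical helper (not an AnnotationForge function): return the expected
## NCBI tables that are absent from the NCBI.sqlite cache.
missing_ncbi_tables <- function(db_path = "NCBI.sqlite",
                                expected = c("gene2pubmed", "gene2accession",
                                             "gene2refseq", "gene_info", "gene2go")) {
    ## dbConnect() would silently create an empty database, so check first
    if (!file.exists(db_path)) return(expected)
    con <- dbConnect(SQLite(), db_path)
    on.exit(dbDisconnect(con), add = TRUE)
    ## keep the expected names that do not appear among the existing tables
    setdiff(expected, dbListTables(con))
}

## usage: character(0) would mean every expected table is present
missing_ncbi_tables("/home/omic/analysis/R_studio/NCBI.sqlite")
```

An empty result only says the tables exist; it does not check that they were filled completely.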

Ideally there would be a facility to hand-download the files from NCBI and then just tell AnnotationForge to use the files you downloaded. I recently tried to add that functionality, but it wasn't quite right, and I have backed it out until I come up with a better solution. In the interim you can create the SQLite file using the following function, and then try your second method again.

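writeFilesToDb <- function(file, file.dir = ".") {
    library(AnnotationForge)
    library(RSQLite)
    tmp <- file.path(file.dir, file)
    ## parsing specs for the NCBI files AnnotationForge knows about (internal helper)
    pfiles <- AnnotationForge:::.primaryFiles()
    ## only the NCBI gene files are valid here; any other file
    ## (e.g. idmapping_selected.tab.gz) would give an NA table name
    if (!file %in% names(pfiles))
        stop(file, " is not one of the NCBI files handled by AnnotationForge")
    file <- pfiles[file]
    ## open (or create) the NCBI.sqlite cache that makeOrgPackageFromNCBI() reads
    NCBIcon <- dbConnect(SQLite(), file.path(file.dir, "NCBI.sqlite"))
    ## always close the connection, even if writing the table fails
    on.exit(dbDisconnect(NCBIcon), add = TRUE)
    ## table name is the file name without ".gz", e.g. "gene2accession"
    tableName <- sub("\\.gz$", "", names(file))
    AnnotationForge:::.writeToNCBIDB(NCBIcon, tableName, filepath = tmp, file)
    AnnotationForge:::.setNCBIDateStamp(NCBIcon, tableName)
}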

## try it out
> fls <- dir(".", "^gene.+gz")
> fls
[1] "gene_info.gz"      "gene2accession.gz" "gene2go.gz"       
[4] "gene2pubmed.gz"    "gene2refseq.gz"
> for(i in fls) writeFilesToDb(i)
> makeOrgPackageFromNCBI("0.1","me <me@mine.org>", "me", ".", "470", "Acinetobacter","baumannii",rebuildCache = FALSE)
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
<snip>
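The warning from the first attempt, "invalid or incomplete compressed data", suggests a damaged or partially downloaded gene2accession.gz, so it may be worth checking hand-downloaded files before loading them with writeFilesToDb(). A minimal sketch, assuming the `gzip` command-line tool is installed and on the PATH; `gz_is_intact()` is a hypothetical helper. `gzip -t` decompresses the file without writing any output and exits with a nonzero status if the archive is truncated or corrupt.

```r
## Hypothetical helper: TRUE if `path` is a complete, valid gzip archive.
## Assumes the `gzip` command-line tool is available on the PATH.
gz_is_intact <- function(path) {
    ## with stdout/stderr = FALSE, system2() returns the exit status
    status <- system2("gzip", c("-t", shQuote(path)),
                      stdout = FALSE, stderr = FALSE)
    identical(as.integer(status), 0L)
}

## check the NCBI files in the working directory before building NCBI.sqlite
fls <- dir(".", "^gene.+gz")
ok <- vapply(fls, gz_is_intact, logical(1))
if (any(!ok)) {
    message("re-download these files: ", paste(fls[!ok], collapse = ", "))
}
```

Testing gene2accession.gz this way can take a while because the whole file has to be decompressed.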
Comment from abhisek001 (India)

Thank you, James W. MacDonald, for your effort on this problem. I followed your suggested two-step process, but I still don't get the desired output. Here is what happens:

preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Error: no such table: altGO_date
>

Thank you, MacDonald sir, for helping me out, but now I get the following errors when I run these commands, first without idmapping_selected.tab.gz in the file list and then with it. Please tell me where the problem is.

First case:

> fls <- dir(".", "^gene.+gz")
> fls
[1] "gene_info.gz"      "gene2accession.gz" "gene2go.gz"       
[4] "gene2pubmed.gz"    "gene2refseq.gz"   
> for(i in fls) writeFilesToDb(i)
Warning messages:
1: In for (i in seq_len(n)) { :
  closing unused connection 3 (./idmapping_selected.tab.gz)
2: call dbDisconnect() when finished working with a connection

Second case:

> fls <- dir(".", "*.+gz")
> fls

[1] "gene_info.gz"              "gene2accession.gz"        
[3] "gene2go.gz"                "gene2pubmed.gz"           
[5] "gene2refseq.gz"            "idmapping_selected.tab.gz"

> for(i in fls) writeFilesToDb(i)

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'dbUnquoteIdentifier': Cannot pass NA to dbQuoteIdentifier()
Called from: h(simpleError(msg, call))

Sorry, I don't think I was clear. You need to use writeFilesToDb only for the files you get from NCBI (not idmapping_selected.tab.gz). So the regexp for finding those files is the one I originally showed you, "^gene.+gz". Once you have generated the NCBI.sqlite file, you can then run makeOrgPackageFromNCBI with rebuildCache = FALSE.

To reiterate, you don't use idmapping_selected.tab.gz directly at all. It's used internally by makeOrgPackageFromNCBI to make the GO tables.
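The difference between the two dir() calls in the first and second case comes down to the regular expression, since dir() treats its pattern as a regex. A small illustration on the file names listed in this thread: the `^` anchor in "^gene.+gz" requires a name to start with "gene", so idmapping_selected.tab.gz is left out. That file presumably has no entry in AnnotationForge's list of NCBI files, which would explain the NA table name in the second case's error.

```r
## File names as listed in this thread
fls <- c("gene_info.gz", "gene2accession.gz", "gene2go.gz",
         "gene2pubmed.gz", "gene2refseq.gz", "idmapping_selected.tab.gz")

## same pattern as dir(".", "^gene.+gz"): keep only names starting with "gene"
ncbi_fls <- grep("^gene.+gz", fls, value = TRUE)
print(ncbi_fls)

## only these NCBI files should be passed to writeFilesToDb()
```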


Thanks, MacDonald sir, the process now completes:

Creating package in ./org.Abaumannii.eg.db 
Now deleting temporary database file
complete!
[1] "org.Abaumannii.eg.sqlite"