I ran makeOrgPackageFromNCBI from the AnnotationForge package in RStudio with the inputs below, but I couldn't create a custom annotation database for Acinetobacter baumannii. I want to run gene set enrichment analysis on RNA-seq data from my mutant strain of Acinetobacter baumannii, and this species has no existing annotation package in R for GO analysis.
First, I ran the command with rebuildCache = TRUE. It started downloading the files but then got stuck on the gene2accession file. The code is below.
Then I downloaded all the files myself from the NCBI FTP site (https://ftp.ncbi.nih.gov/gene/DATA/), put them in the working directory, and ran the command again, but I get the error below. I even manually renamed the content of gene2accession.gz from gene2accession to main.gene2accession, but that didn't work either. This code is also below.
Please guide me. Thanks in advance.
makeOrgPackageFromNCBI(version = "0.1",
author = "Some one <somone2001@gmail.com>",
maintainer = "Some one <somone2001@gmail.com>",
outputDir = "/home/omic/analysis/R_studio",
NCBIFilesDir = "/home/omic/analysis/R_studio",
tax_id = "470",
genus = "Acinetobacter",
species = "baumannii",
rebuildCache=TRUE)
Error output:
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
error reading from the connection
In addition: Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
invalid or incomplete compressed data
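The warning "invalid or incomplete compressed data" suggests that the .gz file on disk is damaged, for example by an interrupted download. One way to check the downloaded files before pointing NCBIFilesDir at them is to stream each one through R's gzfile() connection to the end. This is a minimal sketch: gz_is_complete is a hypothetical helper, not part of AnnotationForge, and it assumes a damaged archive raises a warning or error while being read, as scan() did in the output above. A file that R decompresses silently up to a truncation point may not be caught, so `gzip -t` outside R is a stronger check.

```r
# Hypothetical helper (not part of AnnotationForge): returns TRUE if the whole
# gzip file can be read, FALSE if reading it raises a warning or an error.
gz_is_complete <- function(path, chunk_size = 1e6) {
  con <- gzfile(path, open = "rb")
  on.exit(close(con))
  tryCatch({
    # Read the decompressed stream in chunks until nothing is left.
    repeat {
      bytes <- readBin(con, what = "raw", n = chunk_size)
      if (length(bytes) == 0L) break
    }
    TRUE
  },
  # Assumption: a corrupt archive surfaces as a warning or an error,
  # like the "invalid or incomplete compressed data" message from scan().
  warning = function(w) FALSE,
  error = function(e) FALSE)
}

# Example: check every NCBI gene file in the directory used above.
ncbi_dir <- "/home/omic/analysis/R_studio"
for (f in list.files(ncbi_dir, pattern = "^gene.+gz", full.names = TRUE)) {
  cat(basename(f), if (gz_is_complete(f)) "OK" else "DAMAGED: re-download", "\n")
}
```

Reading all of gene2accession.gz this way takes a while, because it is by far the largest of these files.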
makeOrgPackageFromNCBI(version = "0.1",
author = "Some one <somone2001@gmail.com>",
maintainer = "Some one <somone2001@gmail.com>",
outputDir = "/home/omic/analysis/R_studio",
NCBIFilesDir = "/home/omic/analysis/R_studio",
tax_id = "470",
genus = "Acinetobacter",
species = "baumannii",
rebuildCache=FALSE)
Error output:
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
Error: no such table: main.gene2accession
sessionInfo( )
Oh, I forgot. You need the idmapping file as well, from
https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz
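A minimal sketch for fetching that file with base R's download.file(), assuming it should sit in the same directory used as NCBIFilesDir above. fetch_idmapping is a hypothetical helper name. Because the file is large, the sketch raises R's download timeout (60 seconds by default) and writes to a temporary .part file that is renamed only after a successful download, so an interrupted transfer does not leave a truncated .gz under the final name.

```r
# Hypothetical helper: download UniProt's idmapping_selected.tab.gz into
# dest_dir unless it is already there, and return the local path.
fetch_idmapping <- function(dest_dir) {
  url <- paste0("https://ftp.expasy.org/databases/uniprot/current_release/",
                "knowledgebase/idmapping/idmapping_selected.tab.gz")
  dest <- file.path(dest_dir, basename(url))
  if (file.exists(dest)) return(dest)

  # The file is large: allow up to 2 hours instead of the default timeout.
  old <- options(timeout = max(7200, getOption("timeout")))
  on.exit(options(old))

  # Download under a temporary name first so an interrupted transfer
  # never leaves a truncated file under the final name.
  part <- paste0(dest, ".part")
  status <- download.file(url, part, mode = "wb")
  if (status != 0) stop("download failed with status ", status)
  if (!file.rename(part, dest)) stop("could not rename ", part)
  dest
}

# Usage (assumed: same directory as NCBIFilesDir in the calls above):
# fetch_idmapping("/home/omic/analysis/R_studio")
```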
Thank you, MacDonald sir, for helping me out, but now I get the following error when I run the following commands, both with and without idmapping_selected.tab.gz. Please tell me where the problem is.
Sorry, I don't think I was clear. You need to use writeFilesToDb for just the files you get from NCBI (not idmapping_selected.tab.gz), so the regexp for finding those files is the one I originally showed you, "^gene.+gz". Once you have generated the NCBI.sqlite file, you can then run makeOrgPackageFromNCBI with rebuildCache = FALSE. To reiterate, you don't use idmapping_selected.tab.gz directly at all; makeOrgPackageFromNCBI uses it internally to build the GO tables.
Thanks, MacDonald sir, the process is now completed.
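To see which files the "^gene.+gz" pattern picks up, and that idmapping_selected.tab.gz is left out, base R's list.files() can be tried on a throwaway directory. This sketch only creates empty placeholder files named after the five NCBI downloads listed in the output above plus the UniProt file; it does not call writeFilesToDb, whose arguments are not shown in this thread.

```r
# Throwaway directory with empty placeholder files (illustration only).
demo_dir <- file.path(tempdir(), "ncbi_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c(
  "gene2pubmed.gz", "gene2accession.gz", "gene2refseq.gz",
  "gene_info.gz", "gene2go.gz", "idmapping_selected.tab.gz"
))))

# The regular expression from the thread: names that start with "gene"
# and contain "gz" somewhere after it.
ncbi_files <- list.files(demo_dir, pattern = "^gene.+gz")
print(ncbi_files)  # the five NCBI files; idmapping_selected.tab.gz is not listed
```

Pointing the same list.files() call at the real NCBIFilesDir (with full.names = TRUE if full paths are needed) shows which downloaded files would be picked up.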