I get an error during package generation for sugar beet (and for other organisms too). How can I fix it? Please help!
makeOrgPackageFromNCBI(version = "0.1",
+ author = "G_M <g@gmail.com>",
+ maintainer = "G_M <g@gmail.com>",
+ outputDir = ".",
+ tax_id = "3555",
+ genus = "Beta",
+ species = "sugar_beet",
+ NCBIFilesDir=getwd(),
+ rebuildCache=FALSE)
preparing data from NCBI ...
starting download for 5 data files
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
Error in `[.data.frame`(data, setdiff(names(data), names(field_types))) :
undefined columns selected
In addition: Warning messages:
1: In rsqlite_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
2: call dbDisconnect() when finished working with a connection
3: In rsqlite_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
4: In rsqlite_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
> makeOrgPackageFromNCBI("0.0.1", "me@mine.org","me",".", "3555", "Beta","sugarbeet", rebuildCache = FALSE)
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Please be patient while we work out which organisms can be annotated
with ensembl IDs.
making the OrgDb package ...
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating chromosomes table:
chromosomes table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
table metadata filled
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
'select()' returned many:1 mapping between keys and columns
Populating go_all table:
go_all table filled
Creating package in ./org.Bsugarbeet.eg.db
Now deleting temporary database file
complete!
[1] "org.Bsugarbeet.eg.sqlite"
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /data/oldR/R-3.4.0/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-3.4.0/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] AnnotationForge_1.18.0 AnnotationDbi_1.38.1 IRanges_2.10.2
[4] S4Vectors_0.14.2 Biobase_2.36.2 BiocGenerics_0.22.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 GO.db_3.4.1 XML_3.98-1.7
[4] digest_0.6.12 bitops_1.0-6 GenomeInfoDb_1.12.1
[7] DBI_0.6-1 RSQLite_1.1-2 tools_3.4.0
[10] biomaRt_2.32.0 RCurl_1.95-4.8 compiler_3.4.0
[13] memoise_1.1.0 GenomeInfoDbData_0.99.0
>
Dear James,
I have no idea what is wrong with the packages I use; everything is up to date... :(. If it works on your side, could you possibly send me the result files for sugar beet (tax_id = 3555) and also for carrot (tax_id = 79200)? I would be very grateful.
Best,
Gabi
What's your sessionInfo()? If your packages are updated then you shouldn't be getting those warnings about dbFetch.
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=pl_PL.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pl_PL.UTF-8 LC_COLLATE=pl_PL.UTF-8
[5] LC_MONETARY=pl_PL.UTF-8 LC_MESSAGES=pl_PL.UTF-8
[7] LC_PAPER=pl_PL.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] AnnotationForge_1.16.1 AnnotationDbi_1.38.0 IRanges_2.8.2
[4] S4Vectors_0.12.2 Biobase_2.34.0 BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 XML_3.98-1.9 digest_0.6.12 bitops_1.0-6 DBI_0.7
[6] RSQLite_2.0 rlang_0.1.1 blob_1.1.0 bit64_0.9-7 RCurl_1.95-4.8
[11] bit_1.1-12 memoise_1.1.0 tibble_1.3.3
You are NOT using updated packages. You have an old version of R and Bioconductor. You need to update to R-3.4.1 and the current version of Bioconductor and try again.
OK. I was sure everything was up to date because I installed it last week, and when I tried to update I was told that everything was current. But I used Conda, so maybe that's the reason... Thanks for the help anyway.
Hi, I now have a new version of R and Bioconductor, and it still doesn't work.
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
Error in `[.data.frame`(data, setdiff(names(data), names(field_types))) :
undefined columns selected
In addition: Warning messages:
1: In rsqlite_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
2: call dbDisconnect() when finished working with a connection
3: In rsqlite_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
4: In rsqlite_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Matrix products: default
BLAS: /home/user/programy/R_3.4.1/R-3.4.1/lib/libRblas.so
LAPACK: /home/user/programy/R_3.4.1/R-3.4.1/lib/libRlapack.so
locale:
[1] LC_CTYPE=pl_PL.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pl_PL.UTF-8 LC_COLLATE=pl_PL.UTF-8
[5] LC_MONETARY=pl_PL.UTF-8 LC_MESSAGES=pl_PL.UTF-8
[7] LC_PAPER=pl_PL.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] AnnotationForge_1.18.0 AnnotationDbi_1.38.1 IRanges_2.10.2
[4] S4Vectors_0.14.3 Biobase_2.36.2 BiocGenerics_0.22.0
[7] BiocInstaller_1.26.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 XML_3.98-1.9 digest_0.6.12 bitops_1.0-6
[5] DBI_0.7 RSQLite_2.0 rlang_0.1.1 blob_1.1.0
[9] tools_3.4.1 bit64_0.9-7 RCurl_1.95-4.8 bit_1.1-12
[13] compiler_3.4.1 pkgconfig_2.0.1 memoise_1.1.0 tibble_1.3.3
OK, NCBI has added an extra column to the gene_info file that we need to do something with. The quick fix is to do this at a terminal prompt
and then re-run makeOrgPackageFromNCBI. If it downloads the gene_info.gz file again, you have to let it download everything (or you could pre-emptively just download the files from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ first), then stop the process, cut that last line out and then restart.
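For readers who want to see the general idea, here is a minimal shell sketch of trimming a trailing column from a gzipped, tab-delimited file such as gene_info.gz. It is not the command from the reply above. It assumes that the extra NCBI column is the last field on every line, and `drop_last_column` is a hypothetical helper name chosen for illustration.

```shell
# Hypothetical helper, not the command from the reply above.
# Assumption: the column to discard is the LAST tab-separated field on every line.
# Usage: drop_last_column INPUT.gz OUTPUT.gz
drop_last_column() {
    gzip -dc "$1" |
        awk -F '\t' '{
            # Rebuild the line from fields 1..NF-1, keeping tab delimiters
            # and any empty fields in between.
            line = $1
            for (i = 2; i < NF; i++) line = line "\t" $i
            print line
        }' |
        gzip > "$2"
}
```

It could be called as `drop_last_column gene_info.gz gene_info.trimmed.gz`. Writing to a separate output file keeps the original download intact, so the header line of the trimmed file can be checked (for example with `gzip -dc gene_info.trimmed.gz | head -n 1`) before anything is replaced.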
I'll push a fix soon, but it will take a day or so to propagate to the download server.
I have updated both the release and devel versions of AnnotationForge, and you should be able to get the updated package using biocLite in a day or two.