Hello!
I am gearing up to use rBLAST, and I would like to download the core_nt BLAST database and keep it updated with BiocFileCache.
I have the example from the vignette, but core_nt is split across many tar.gz volumes, numbered 00 to 68. The brace pattern that works with wget does not work with 'blast_db_get'. What would be the proper code?
> tgz_file <- blast_db_get("core_nt.{00..68}.tar.gz")
Error in FUN(X[[i]], ...) :
invalid regular expression 'core_nt.{00..68}.tar.gz', reason 'Invalid contents of {}'
In addition: Warning message:
In FUN(X[[i]], ...) :
TRE pattern compilation error 'Invalid contents of {}'
If it isn't possible to download them with 'blast_db_get', is there a way to import the files into the cache? I am asking because I suspect this has a very easy explanation. I will keep reading. Thank you.
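One possible approach, sketched here and not tested against the real download: the error message shows the file name being compiled as a regular expression, so shell-style brace expansion like {00..68} is never expanded in R. The volume names can instead be built with sprintf() and passed one at a time. The helper name fetch_volumes is hypothetical, and the sketch assumes that blast_db_get() accepts a single file name and returns the path of the cached file.

```r
# Build the 69 volume names explicitly (00 to 68, zero-padded to two digits).
volumes <- sprintf("core_nt.%02d.tar.gz", 0:68)

# Hypothetical helper: call a fetch function once per file name and
# collect the returned paths in a named character vector.
fetch_volumes <- function(files, fetch) {
  vapply(files, function(f) as.character(fetch(f)), character(1))
}

# Usage (not run here, since it would download every volume):
# library(rBLAST)
# tgz_files <- fetch_volumes(volumes, blast_db_get)
```

For volumes that were already downloaded with wget, BiocFileCache::bfcadd() with rtype = "local" can register an existing file in a cache. Whether rBLAST then recognizes such an entry is an assumption that would need checking.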
> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS/LAPACK: /usr/local/lib/libopenblas_zenp-r0.3.18.so; LAPACK version 3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Los_Angeles
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] BiocFileCache_2.14.0 dbplyr_2.5.0 rBLAST_1.2.0
[4] Biostrings_2.74.1 GenomeInfoDb_1.42.3 XVector_0.46.0
[7] IRanges_2.40.1 S4Vectors_0.44.0 BiocGenerics_0.52.0
[10] BiocManager_1.30.25
loaded via a namespace (and not attached):
[1] bit_4.5.0.1 jsonlite_1.9.0 dplyr_1.1.4
[4] compiler_4.4.2 crayon_1.5.3 filelock_1.0.3
[7] tidyselect_1.2.1 blob_1.2.4 fastmap_1.2.0
[10] R6_2.6.1 generics_0.1.3 curl_6.2.1
[13] tibble_3.2.1 GenomeInfoDbData_1.2.13 DBI_1.2.3
[16] pillar_1.10.1 rlang_1.1.5 cachem_1.1.0
[19] bit64_4.6.0-1 RSQLite_2.3.9 memoise_2.0.1
[22] cli_3.6.4 withr_3.0.2 magrittr_2.0.3
[25] zlibbioc_1.52.0 lifecycle_1.0.4 vctrs_0.6.5
[28] glue_1.8.0 purrr_1.0.4 httr_1.4.7
[31] tools_4.4.2 pkgconfig_2.0.3 UCSC.utils_1.2.0
Ok. So I guess core_nt is over 300 GB compressed. I can do it, but maybe it is better to just use ncbi-tools over the internet and loop through queries.