Does anyone have the command to download all of core_nt using blast_db_get()?
@matthew-thornton-5564
USA, Los Angeles, USC

Hello!

I am gearing up to use rBLAST, and I would like to download the core_nt BLAST database and keep it updated with BiocFileCache.

I have the example from the vignette, but core_nt is split across a large number of tar.gz files, numbered 00 to 68. The pattern I use to get them with wget does not work with 'blast_db_get'. What would be the proper code?

> tgz_file <- blast_db_get("core_nt.{00..68}.tar.gz")
Error in FUN(X[[i]], ...) : 
  invalid regular expression 'core_nt.{00..68}.tar.gz', reason 'Invalid contents of {}'
In addition: Warning message:
In FUN(X[[i]], ...) :
  TRE pattern compilation error 'Invalid contents of {}'
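
The error above points at two things: `{00..68}` is shell brace expansion, which R does not perform, and the file name is being handled as a regular expression, where `{}` has to contain a repeat count. A minimal sketch of one workaround is to build the volume names in R and request them one at a time. This assumes `blast_db_get()` accepts a single file name per call, as in the vignette example, and that the volumes are still numbered 00 to 68 on the NCBI server. `fetch_core_nt()` and the `core_nt_db` directory are hypothetical names used only for illustration.

```r
# Build the names that the shell pattern core_nt.{00..68}.tar.gz would expand to.
core_nt_files <- sprintf("core_nt.%02d.tar.gz", 0:68)

# Hypothetical helper: download each volume with blast_db_get() and unpack
# every archive into the same directory, since BLAST expects all volumes of
# a multi-volume database to sit side by side.
fetch_core_nt <- function(files, exdir = "core_nt_db") {
  dir.create(exdir, showWarnings = FALSE)
  for (f in files) {
    tgz_file <- rBLAST::blast_db_get(f)  # assumed: one file name per call
    utils::untar(tgz_file, exdir = exdir)
  }
  invisible(exdir)
}

# Not run here: the full set is very large.
# fetch_core_nt(core_nt_files)
```

Trying the helper on a single volume first, for example `fetch_core_nt(core_nt_files[1])`, would show whether the approach works before committing to the full download.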

If it isn't possible to download them with 'blast_db_get', is there a way to import the files into the cache? I am asking because I think this has a very easy explanation. I will keep reading. Thank you.
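
On the import question, a rough sketch using BiocFileCache directly: `bfcadd()` can register a file that already exists on disk. `import_to_cache()`, the `downloads` directory, and the `blast_cache` directory are hypothetical names. Whether `blast_db_get()` would recognise entries added this way depends on the cache location and resource names it uses internally, which is not shown in this post, so treat that part as an open assumption. Files added with `rtype = "local"` are also not tied to a web URL, so the cache would have no remote source to check them against.

```r
# Hypothetical helper: register already-downloaded archives in a BiocFileCache.
import_to_cache <- function(paths, cache_dir) {
  bfc <- BiocFileCache::BiocFileCache(cache_dir, ask = FALSE)
  for (p in paths) {
    # rname is the name used for later lookups; action = "copy" leaves the
    # original file where it is and stores a copy inside the cache.
    BiocFileCache::bfcadd(bfc, rname = basename(p), fpath = p,
                          rtype = "local", action = "copy")
  }
  bfc
}

# Example call, assuming the archives were fetched with wget into ./downloads:
# bfc <- import_to_cache(
#   list.files("downloads", pattern = "\\.tar\\.gz$", full.names = TRUE),
#   cache_dir = "blast_cache"
# )
# BiocFileCache::bfcquery(bfc, "core_nt")
```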

> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/local/lib/libopenblas_zenp-r0.3.18.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] BiocFileCache_2.14.0 dbplyr_2.5.0         rBLAST_1.2.0        
 [4] Biostrings_2.74.1    GenomeInfoDb_1.42.3  XVector_0.46.0      
 [7] IRanges_2.40.1       S4Vectors_0.44.0     BiocGenerics_0.52.0 
[10] BiocManager_1.30.25 

loaded via a namespace (and not attached):
 [1] bit_4.5.0.1             jsonlite_1.9.0          dplyr_1.1.4            
 [4] compiler_4.4.2          crayon_1.5.3            filelock_1.0.3         
 [7] tidyselect_1.2.1        blob_1.2.4              fastmap_1.2.0          
[10] R6_2.6.1                generics_0.1.3          curl_6.2.1             
[13] tibble_3.2.1            GenomeInfoDbData_1.2.13 DBI_1.2.3              
[16] pillar_1.10.1           rlang_1.1.5             cachem_1.1.0           
[19] bit64_4.6.0-1           RSQLite_2.3.9           memoise_2.0.1          
[22] cli_3.6.4               withr_3.0.2             magrittr_2.0.3         
[25] zlibbioc_1.52.0         lifecycle_1.0.4         vctrs_0.6.5            
[28] glue_1.8.0              purrr_1.0.4             httr_1.4.7             
[31] tools_4.4.2             pkgconfig_2.0.3         UCSC.utils_1.2.0

Ok. So I guess core_nt is over 300 GB compressed. I can do it, but maybe it is better to use ncbi-tools over the internet and loop through the queries.
