I'm unable to forge a custom genome with R 4.0.3 and BSgenome_1.58.0. I've successfully forged many custom genomes over the years, and I'm very confused by the error message.
forgeBSgenomeDataPkg("cws_analysis/BSGenome_NC_035897/BSgenome.Amexicanus.SIMR.astMexNC2-seed")
Error in .make_Seqinfo_from_genome(genome) :
"astMexNC" is not a registered NCBI assembly or UCSC genome (use
registered_NCBI_assemblies() or registered_UCSC_genomes() to list the NCBI or
UCSC assemblies/genomes currently registered in the GenomeInfoDb package)
Here is the seed file:
Package: BSgenome.Amexicanus.SIMR.astMexNC
Title: Astyanax mexicanus (Cave Fish) first chromosome (NCBI version surface)
Description: Washington University School of Medicine chromosome-level assembly of Astyanax_Mexicanus-2.0.
Version: 1.4.2
organism: Astyanax mexicanus
common_name: Cavefish
provider: SIMR
genome: astMex2NC
release_date: April 2017
release_name: Surface chr1 sequence
source_url: https://webfs/cws_analysis/BSGenome_NC_035897
organism_biocview: Astyanax_mexicanus
BSgenomeObjname: Amexicanus
SrcDataFiles: NC_035897.fa from https://webfs/cws_analysis/BSGenome_NC_035897/
PkgExamples: genome$NC_035897.1 # same as genome[["NC_035897.1"]]
seqs_srcdir: /home/cws/jak11/cws_analysis/BSGenome_NC_035897
seqfile_name: NC_035897.2bit
I don't understand why I'm getting an error about registered assemblies. I can successfully forge this genome using R 4.0.0 and BSgenome_1.56.0 by simply swapping out "genome: astMexNC" for "provider_version: astMex2NC" in the seed file; as I understand it, provider_version is now deprecated in favor of genome. But the seed file above fails with BSgenome_1.58.0.
What am I doing wrong? Why is it looking for registered assemblies?
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /n/apps/CentOS7/install/r-4.0.3/lib64/R/lib/libRblas.so
LAPACK: /n/apps/CentOS7/install/r-4.0.3/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets
[9] methods base
other attached packages:
[1] BSgenome_1.58.0 motifbreakR_2.4.0 MotifDb_1.32.0 Biostrings_2.58.0
[5] XVector_0.30.0 rtracklayer_1.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.4
[9] IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.0 pheatmap_1.0.12
[13] knitr_1.31 RColorBrewer_1.1-2
loaded via a namespace (and not attached):
[1] colorspace_2.0-0 ellipsis_0.3.1 biovizBase_1.38.0
[4] htmlTable_2.1.0 base64enc_0.1-3 dichromat_2.0-0
[7] rstudioapi_0.13 bit64_4.0.5 AnnotationDbi_1.52.0
[10] fansi_0.4.2 xml2_1.3.2 splines_4.0.0
[13] motifStack_1.34.0 cachem_1.0.4 ade4_1.7-16
[16] Formula_1.2-4 splitstackshape_1.4.8 Rsamtools_2.6.0
[19] cluster_2.1.1 dbplyr_2.1.0 png_0.1-7
[22] compiler_4.0.0 httr_1.4.2 backports_1.2.1
[25] assertthat_0.2.1 Matrix_1.3-2 fastmap_1.1.0
[28] lazyeval_0.2.2 htmltools_0.5.1.1 prettyunits_1.1.1
[31] tools_4.0.0 TFMPvalue_0.0.8 gtable_0.3.0
[34] glue_1.4.2 GenomeInfoDbData_1.2.4 dplyr_1.0.5
[37] rappdirs_0.3.3 Rcpp_1.0.6 Biobase_2.50.0
[40] vctrs_0.3.6 debugme_1.1.0 xfun_0.22
[43] stringr_1.4.0 lifecycle_1.0.0 ensembldb_2.14.0
[46] XML_3.99-0.6 zlibbioc_1.36.0 MASS_7.3-53.1
[49] scales_1.1.1 VariantAnnotation_1.36.0 ProtGenerics_1.22.0
[52] hms_1.0.0 MatrixGenerics_1.2.1 SummarizedExperiment_1.20.0
[55] AnnotationFilter_1.14.0 yaml_2.2.1 curl_4.3
[58] memoise_2.0.0 gridExtra_2.3 ggplot2_3.3.3
[61] biomaRt_2.46.3 rpart_4.1-15 latticeExtra_0.6-29
[64] stringi_1.5.3 RSQLite_2.2.4 checkmate_2.0.0
[67] GenomicFeatures_1.42.2 BiocParallel_1.24.1 rlang_0.4.10
[70] pkgconfig_2.0.3 matrixStats_0.58.0 bitops_1.0-6
[73] evaluate_0.14 lattice_0.20-41 purrr_0.3.4
[76] GenomicAlignments_1.26.0 htmlwidgets_1.5.3 bit_4.0.4
[79] tidyselect_1.1.0 magrittr_2.0.1 R6_2.5.0
[82] generics_0.1.0 Hmisc_4.5-0 DelayedArray_0.16.2
[85] DBI_1.1.1 pillar_1.5.1 foreign_0.8-81
[88] survival_3.2-10 RCurl_1.98-1.3 nnet_7.3-15
[91] tibble_3.1.0 crayon_1.4.1 utf8_1.2.1
[94] BiocFileCache_1.14.0 rmarkdown_2.7 jpeg_0.1-8.1
[97] progress_1.2.2 data.table_1.14.0 blob_1.2.1
[100] digest_0.6.27 openssl_1.4.3 munsell_0.5.0
[103] Gviz_1.34.1 askpass_1.1
Thanks very much. This was a test genome with a single chromosome, so setting
circ_seqs: ""
solved it. So I guess this means circ_seqs is no longer an optional field (for custom genomes).

Don't use
circ_seqs: ""
If it worked, it was just luck. See the updated vignette for what to use when there are no circular sequences.

I looked at your link for the updated vignette (unrendered), but I wasn't sure I understood it. If there are no circular sequences, should we set it like so?
circ_seqs: character(0)

From the updated vignette (unrendered):
So yes:
circ_seqs: character(0)
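
For anyone wondering why the two spellings differ, here is a minimal sketch in base R. It assumes that the circ_seqs field is treated as an R expression and evaluated, which is how the vignette describes it. The seed lines are a shortened, illustrative copy of the seed file from the question, and the use of read.dcf() and eval(parse()) is only a way to demonstrate the difference, not necessarily how BSgenome reads the seed file internally.

```r
# Shortened, illustrative seed file (DCF format) with the circ_seqs line added.
seed_lines <- c(
  "Package: BSgenome.Amexicanus.SIMR.astMexNC",
  "genome: astMex2NC",
  "circ_seqs: character(0)"
)

# Parse the DCF text with base R; the result is a one-row character matrix.
con <- textConnection(seed_lines)
seed <- read.dcf(con)
close(con)

# Assumption: the field value is evaluated as an R expression.
circ_ok  <- eval(parse(text = seed[1, "circ_seqs"]))  # the recommended value
circ_bad <- eval(parse(text = '""'))                  # the value that "worked" by luck

length(circ_ok)   # 0: no circular sequences
length(circ_bad)  # 1: one sequence name, and that name is the empty string
```

Under that assumption, character(0) says "there are no circular sequences", while "" names a single circular sequence whose name is empty. In the seed file itself this is just the one line circ_seqs: character(0).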