unable to forge custom BSgenome after update
Chris Seidel ▴ 80
@chris-seidel-5840

I'm unable to forge a custom genome with R 4.0.3 and BSgenome_1.58.0. I've successfully forged many custom genomes over the years, and I'm very confused by the error message.

forgeBSgenomeDataPkg("cws_analysis/BSGenome_NC_035897/BSgenome.Amexicanus.SIMR.astMexNC2-seed")
Error in .make_Seqinfo_from_genome(genome) : 
  "astMexNC" is not a registered NCBI assembly or UCSC genome (use
  registered_NCBI_assemblies() or registered_UCSC_genomes() to list the NCBI or
  UCSC assemblies/genomes currently registered in the GenomeInfoDb package)

Here is the seed file:

Package: BSgenome.Amexicanus.SIMR.astMexNC
Title: Astyanax mexicanus (Cave Fish) first chromosome (NCBI version surface)
Description: Washington University School of Medicine chromosome-level assembly of Astyanax_Mexicanus-2.0.
Version: 1.4.2
organism: Astyanax mexicanus
common_name: Cavefish
provider: SIMR
genome: astMex2NC
release_date: April 2017
release_name: Surface chr1 sequence
source_url: https://webfs/cws_analysis/BSGenome_NC_035897
organism_biocview: Astyanax_mexicanus
BSgenomeObjname: Amexicanus
SrcDataFiles: NC_035897.fa from https://webfs/cws_analysis/BSGenome_NC_035897/
PkgExamples: genome$NC_035897.1  # same as genome[["NC_035897.1"]]
seqs_srcdir: /home/cws/jak11/cws_analysis/BSGenome_NC_035897
seqfile_name: NC_035897.2bit

I don't understand why I'm getting an error about registered assemblies. With R 4.0.0 and BSgenome_1.56.0 I can forge this genome successfully by simply swapping "genome: astMexNC" for "provider_version: astMex2NC" in the seed file; as I understand it, provider_version is now deprecated in favor of genome. But the seed file above fails with BSgenome_1.58.0.

What am I doing wrong? Why is it looking for registered assemblies?

 > sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /n/apps/CentOS7/install/r-4.0.3/lib64/R/lib/libRblas.so
LAPACK: /n/apps/CentOS7/install/r-4.0.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets 
 [9] methods   base     

other attached packages:
 [1] BSgenome_1.58.0      motifbreakR_2.4.0    MotifDb_1.32.0       Biostrings_2.58.0   
 [5] XVector_0.30.0       rtracklayer_1.50.0   GenomicRanges_1.42.0 GenomeInfoDb_1.26.4 
 [9] IRanges_2.24.1       S4Vectors_0.28.1     BiocGenerics_0.36.0  pheatmap_1.0.12     
[13] knitr_1.31           RColorBrewer_1.1-2  

loaded via a namespace (and not attached):
  [1] colorspace_2.0-0            ellipsis_0.3.1              biovizBase_1.38.0          
  [4] htmlTable_2.1.0             base64enc_0.1-3             dichromat_2.0-0            
  [7] rstudioapi_0.13             bit64_4.0.5                 AnnotationDbi_1.52.0       
 [10] fansi_0.4.2                 xml2_1.3.2                  splines_4.0.0              
 [13] motifStack_1.34.0           cachem_1.0.4                ade4_1.7-16                
 [16] Formula_1.2-4               splitstackshape_1.4.8       Rsamtools_2.6.0            
 [19] cluster_2.1.1               dbplyr_2.1.0                png_0.1-7                  
 [22] compiler_4.0.0              httr_1.4.2                  backports_1.2.1            
 [25] assertthat_0.2.1            Matrix_1.3-2                fastmap_1.1.0              
 [28] lazyeval_0.2.2              htmltools_0.5.1.1           prettyunits_1.1.1          
 [31] tools_4.0.0                 TFMPvalue_0.0.8             gtable_0.3.0               
 [34] glue_1.4.2                  GenomeInfoDbData_1.2.4      dplyr_1.0.5                
 [37] rappdirs_0.3.3              Rcpp_1.0.6                  Biobase_2.50.0             
 [40] vctrs_0.3.6                 debugme_1.1.0               xfun_0.22                  
 [43] stringr_1.4.0               lifecycle_1.0.0             ensembldb_2.14.0           
 [46] XML_3.99-0.6                zlibbioc_1.36.0             MASS_7.3-53.1              
 [49] scales_1.1.1                VariantAnnotation_1.36.0    ProtGenerics_1.22.0        
 [52] hms_1.0.0                   MatrixGenerics_1.2.1        SummarizedExperiment_1.20.0
 [55] AnnotationFilter_1.14.0     yaml_2.2.1                  curl_4.3                   
 [58] memoise_2.0.0               gridExtra_2.3               ggplot2_3.3.3              
 [61] biomaRt_2.46.3              rpart_4.1-15                latticeExtra_0.6-29        
 [64] stringi_1.5.3               RSQLite_2.2.4               checkmate_2.0.0            
 [67] GenomicFeatures_1.42.2      BiocParallel_1.24.1         rlang_0.4.10               
 [70] pkgconfig_2.0.3             matrixStats_0.58.0          bitops_1.0-6               
 [73] evaluate_0.14               lattice_0.20-41             purrr_0.3.4                
 [76] GenomicAlignments_1.26.0    htmlwidgets_1.5.3           bit_4.0.4                  
 [79] tidyselect_1.1.0            magrittr_2.0.1              R6_2.5.0                   
 [82] generics_0.1.0              Hmisc_4.5-0                 DelayedArray_0.16.2        
 [85] DBI_1.1.1                   pillar_1.5.1                foreign_0.8-81             
 [88] survival_3.2-10             RCurl_1.98-1.3              nnet_7.3-15                
 [91] tibble_3.1.0                crayon_1.4.1                utf8_1.2.1                 
 [94] BiocFileCache_1.14.0        rmarkdown_2.7               jpeg_0.1-8.1               
 [97] progress_1.2.2              data.table_1.14.0           blob_1.2.1                 
[100] digest_0.6.27               openssl_1.4.3               munsell_0.5.0              
[103] Gviz_1.34.1                 askpass_1.1
@herve-pages-1542

Hi,

You need to specify the circ_seqs field in your seed file.

forgeBSgenomeDataPkg() needs to know which sequences are circular and which are not. So if you don't specify the circ_seqs field, it tries to get this information by calling GenomeInfoDb::Seqinfo(genome="astMex2NC"). However, this only works for NCBI assemblies and UCSC genomes that are _registered_ in GenomeInfoDb.
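
As a rough illustration of the point above, here is a minimal base-R sketch that checks whether a seed file declares circ_seqs before forging. The helper name seed_has_circ_seqs and the toy seed contents are hypothetical and not part of BSgenome; the only assumption about the seed file is that it is in DCF format, so read.dcf() can parse it.

```r
# Hypothetical helper (not part of BSgenome): report whether a seed file
# declares the circ_seqs field explicitly.
seed_has_circ_seqs <- function(seed_path) {
  seed <- read.dcf(seed_path)  # character matrix, one column per seed field
  "circ_seqs" %in% colnames(seed)
}

# Toy seed files written to temporary locations, for illustration only.
seed_without <- tempfile(fileext = "-seed")
writeLines(c("Package: BSgenome.Example.Custom.demo1",
             "genome: demo1"),
           seed_without)

seed_with <- tempfile(fileext = "-seed")
writeLines(c("Package: BSgenome.Example.Custom.demo1",
             "genome: demo1",
             "circ_seqs: character(0)"),
           seed_with)

has_without <- seed_has_circ_seqs(seed_without)
has_with <- seed_has_circ_seqs(seed_with)
```

If the check returns FALSE for a custom genome, the forge step would presumably fall back to the registered-assembly lookup described above and fail in the same way as in the question.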

See here for a similar discussion.

I've tried to clarify this in the BSgenomeForge.Rnw vignette in BSgenome 1.59.3 (devel).

Best,

H.


Thanks very much. This was a test genome with a single chromosome, so setting circ_seqs: "" solved it. So I guess this means circ_seqs is no longer an optional field (for custom genomes).


Don't use circ_seqs: "". If it worked, it was just luck. See the updated vignette for what to use when there are no circular sequences.


I looked at your link to the updated vignette, but I wasn't sure I understood it (unrendered). If there are no circular sequences, should we set it like so: circ_seqs: character(0)?


From the updated vignette (unrendered):

If the assembly or genome has no circular sequence, set \code{circ_seqs} to \code{character(0)}.

So yes: circ_seqs: character(0).
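
For reference, a sketch of how the relevant part of the seed file from the question might look with that field added. Every line except circ_seqs is copied from the original seed; where circ_seqs sits among the other fields is an assumption made for illustration.

```
genome: astMex2NC
circ_seqs: character(0)
seqs_srcdir: /home/cws/jak11/cws_analysis/BSGenome_NC_035897
seqfile_name: NC_035897.2bit
```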
