Dear Gabe,
Thanks a lot for making the genbankr package available. Today, I tried to parse a GenBank entry for a synthetic DNA molecule, e.g. KR709867.1.
Importing this file by accession failed:
> id = GBAccession("KR709867.1")
> readGenBank(id)
Error in .normargIsCircular(isCircular, seqnames) :
  length of supplied 'isCircular' must equal the number of sequences
I traced the error to the make_gbrecord function, which raises the error in the following line:
sqinfo = Seqinfo(seqlevels(srcs), width(srcs), circ, genom)
because the srcs GRanges object contains 2 ranges:
GRanges object with 2 ranges and 9 metadata columns:
                   seqnames     ranges strand |        type            organism
                      <Rle>  <IRanges>  <Rle> | <character>         <character>
  [1] synthetic construct:1 [ 1, 1311]      + |      source synthetic construct
  [2]        Homo sapiens:2 [66, 1244]      + |      source        Homo sapiens
         mol_type         db_xref           clone     focus
      <character> <CharacterList>     <character> <logical>
  [1]   other DNA     taxon:32630 CCSBHm_00007040      TRUE
  [2]   other DNA      taxon:9606            <NA>     FALSE
                                                                       note     loctype
                                                                <character> <character>
  [1] vector:pDONR223; derived from parent clone GenBankaccession: KJ897694      normal
  [2]                                                                  <NA>      normal
      temp_grouping_id
             <integer>
  [1]                1
  [2]                2
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
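The mismatch can probably be reproduced without genbankr. The following is only a minimal sketch, assuming that `circ` is a single logical value while `srcs` carries two seqlevels; how make_gbrecord actually computes `circ` has not been checked here, and the names `seq_names` and `seq_lengths` are made up for illustration.

```r
library(GenomeInfoDb)

# Two seqlevels and their widths, copied from the 'srcs' object shown above.
seq_names   <- c("synthetic construct:1", "Homo sapiens:2")
seq_lengths <- c(1311L, 1179L)  # widths of [1, 1311] and [66, 1244]

# Assumption: 'circ' has length one (one flag for the whole record).
circ <- FALSE

# Two sequence names but a single isCircular value: this call is expected
# to fail, so the error message is captured instead of stopping the script.
res <- tryCatch(
    Seqinfo(seq_names, seq_lengths, circ),
    error = function(e) conditionMessage(e)
)
print(res)

# Supplying one isCircular value per seqlevel avoids the length mismatch.
ok <- Seqinfo(seq_names, seq_lengths, rep(circ, length(seq_names)))
print(ok)
```

If the assumption holds, the first call should fail with the same 'isCircular' message as above. Repeating the flag per seqlevel only illustrates the length requirement; it is not necessarily the right way to handle a record with two source features.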
Are you intending the genbankr package to support synthetic constructs (plasmids, clones, etc)? If so, maybe you want to take a look at this example.
Thanks,
Thomas
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] genbankr_1.2.0 BiocInstaller_1.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8 AnnotationDbi_1.36.0 XVector_0.14.0
 [4] GenomicAlignments_1.10.0 GenomicRanges_1.26.1 BiocGenerics_0.20.0
 [7] zlibbioc_1.20.0 IRanges_2.8.1 BiocParallel_1.8.1
[10] BSgenome_1.42.0 lattice_0.20-34 R6_2.2.0
[13] httr_1.2.1 rentrez_1.0.4 GenomeInfoDb_1.10.1
[16] tools_3.3.2 grid_3.3.2 SummarizedExperiment_1.4.0
[19] parallel_3.3.2 Biobase_2.34.0 DBI_0.5-1
[22] digest_0.6.10 Matrix_1.2-7.1 rtracklayer_1.34.1
[25] S4Vectors_0.12.1 bitops_1.0-6 curl_2.3
[28] RCurl_1.95-4.8 biomaRt_2.30.0 memoise_1.0.0
[31] RSQLite_1.1 GenomicFeatures_1.26.0 Biostrings_2.42.1
[34] Rsamtools_1.26.1 stats4_3.3.2 XML_3.98-1.5
[37] jsonlite_1.1 VariantAnnotation_1.20.2