Can BSgenome forge from non-UCSC/NCBI assemblies?
@user-24291

Hi!

Not sure if I've missed something basic, but is it possible to run forgeBSgenomeDataPkg on assemblies that aren't available from NCBI or UCSC? I'm getting an error which suggests it isn't. Is this the case, or have I done something wrong in my seed file?

> library(BSgenome)
> forgeBSgenomeDataPkg("R://Sophie/Pea_RNASeq/genomes/BSGenomes_seed_PisumSativum.txt")
Error in .make_Seqinfo_from_genome(genome) : 
  "Pisum_sativum_v1a" is not a registered NCBI assembly or UCSC genome (use
  registered_NCBI_assemblies() or registered_UCSC_genomes() to list the NCBI or UCSC
  assemblies/genomes currently registered in the GenomeInfoDb package)
In addition: Warning messages:
1: In readLines(infile, n = 25000L) :
  incomplete final line found on 'R://Sophie/Pea_RNASeq/genomes/BSGenomes_seed_PisumSativum.txt'
2: In forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir,  :
  field 'release_name' is deprecated

The seed file reads:

Package: BSgenome.Psativum.URGI.Pisum_sativum_v1a
Title: Full genome sequence for Pisum sativum (URGI; v1a)
Description: Full genome sequence for Pisum sativum (URGI; v1a) see https://urgi.versailles.inra.fr/Species/Pisum/Pea-Genome-project
Version: 1a
organism: Pisum sativum
common_name: Pea
genome: Pisum_sativum_v1a
provider: URGI
provider_version: Pisum_sativum_v1a
release_date: Jan. 2019
release_name: Pisum sativum v1a
source_url: https://urgi.versailles.inra.fr/download/pea/
organism_biocview: Pisum_sativum
BSgenomeObjname: Psativum
SrcDataFiles: Split fasta file from https://urgi.versailles.inra.fr/download/pea/ (only Chr1-Chr7, no scaffolds)
seqs_srcdir: R://Sophie/Pea_RNASeq/genomes
seqnames: paste("chr",c(1:7))
@herve-pages-1542
Seattle, WA, United States

Hi Sophie,

It's always possible to forge a BSgenome data package as long as you have access to the sequences (FASTA file(s) or a 2bit file). However, when the sequences are stored in a FASTA file, it's highly recommended to write them to a 2bit file first and then use the 2bit file to forge the package. This step is also an opportunity to rename and/or reorder the sequences. Then, in your seed file, you should not list the sequences (i.e. no seqnames entry), but you should make sure to list the circular sequences (circ_seqs entry). See "Forging a BSGenome" for an example, and let me know here if you need further help with this.
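
The FASTA-to-2bit step described above could look roughly like the sketch below. It is only an illustration, not necessarily the exact procedure from the linked example: the helper name fasta_to_2bit and the file paths are made up, and the header clean-up assumes each FASTA header starts with the sequence name followed by whitespace. It relies on readDNAStringSet() and replaceAmbiguities() from Biostrings and on export.2bit() from rtracklayer.

```r
library(Biostrings)   # readDNAStringSet(), replaceAmbiguities()
library(rtracklayer)  # export.2bit()

# Hypothetical helper: read a FASTA file, tidy the sequence names,
# and write the sequences to a 2bit file.
fasta_to_2bit <- function(fasta_file, twobit_file) {
  dna <- readDNAStringSet(fasta_file)

  # Keep only the first word of each header as the sequence name.
  # This is the place to rename and/or reorder sequences (assumption:
  # headers look like "Chr1 some description").
  names(dna) <- sub("\\s.*$", "", names(dna))

  # The 2bit format cannot store IUPAC ambiguity codes other than N,
  # so replace any others (R, Y, ...) with N before exporting.
  dna <- replaceAmbiguities(dna)

  export.2bit(dna, twobit_file)
  invisible(names(dna))
}

# Example call (placeholder paths):
# fasta_to_2bit("path/to/genome.fasta", "path/to/genome.2bit")
```

The names written here are the ones the forged package will use, so it is worth checking them before forging.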

Best,

H.

I'm having the same problem - but this doesn't address the issue of NCBI/UCSC registered assemblies.

I figured out what I was doing wrong. The seed file needs to specify:

circ_seqs: character(0)

and point to the 2bit file via seqs_srcdir and seqfile_name:

seqs_srcdir: /path/to/2bit/directory
seqfile_name: file.2bit
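
Putting the pieces of this thread together, a revised seed file for the pea assembly might look like the sketch below. It reuses the fields from the question, drops the deprecated release_name and the seqnames entry, and adds circ_seqs and seqfile_name. The 2bit file name and the directory are placeholders, and Version is changed to 1.0.0 on the assumption that the forged package needs a numeric R package version; that change is mine and was not raised in the thread. The code only writes the seed file and reads it back with read.dcf() as a sanity check; the forge call itself is left commented out because it needs the real 2bit file.

```r
# Seed fields (values from the question unless marked as placeholders).
seed_fields <- c(
  Package           = "BSgenome.Psativum.URGI.Pisum_sativum_v1a",
  Title             = "Full genome sequence for Pisum sativum (URGI; v1a)",
  Description       = "Full genome sequence for Pisum sativum (URGI; v1a)",
  Version           = "1.0.0",                    # assumption, see lead-in
  organism          = "Pisum sativum",
  common_name       = "Pea",
  genome            = "Pisum_sativum_v1a",
  provider          = "URGI",
  release_date      = "Jan. 2019",
  source_url        = "https://urgi.versailles.inra.fr/download/pea/",
  organism_biocview = "Pisum_sativum",
  BSgenomeObjname   = "Psativum",
  circ_seqs         = "character(0)",             # no circular sequences
  seqs_srcdir       = "/path/to/2bit/directory",  # placeholder
  seqfile_name      = "Pisum_sativum_v1a.2bit"    # placeholder
)

# writeLines() ends the file with a newline, which avoids the
# "incomplete final line" warning shown in the question.
seed_file <- file.path(tempdir(), "BSgenome.Psativum.URGI.Pisum_sativum_v1a-seed")
writeLines(paste0(names(seed_fields), ": ", seed_fields), seed_file)

# Read the seed back as DCF to check that the fields parse as intended.
seed <- read.dcf(seed_file)
stopifnot("circ_seqs" %in% colnames(seed), !("seqnames" %in% colnames(seed)))

# With the 2bit file in place, forge as in the question:
# library(BSgenome)
# forgeBSgenomeDataPkg(seed_file)
```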