Hello,
I have an in-house assembly for a non-model organism. I am planning to forge a BSgenome data package. I have 2 questions regarding that:
- As the assembly is not yet published, it is not on any website, so I don't have a "source_url" for the description file. Is there any workaround for that?
- I couldn't really understand whether the built data package will be publicly available (for instance, show up when available.genomes() is called) or whether it will be local only.
Thanks a lot in advance!
Best,
Aybuge
Hi James,
Thanks a lot for your reply! I gave it a try with a fake source_url and it seems to be fine. However, I am having another problem, which I believe originates from the "provider" field of the description file. The error is as below:
As I mentioned, the data is not from any of the conventional providers (like UCSC or NCBI) but in-house. Also, the .fasta file is organized by scaffolds rather than by chromosomes. Could you please help me with that too?
Thanks,
Aybuge
Hi,
Also NA values don't qualify as "single strings" so it could be that this is what the error message is trying to tell you, admittedly in some sort of cryptic way.
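To make the NA point concrete, here is a minimal base-R sketch. The helper is_single_string() is hypothetical and only approximates what a "single string" check needs to verify (character type, length 1, not NA); it is not the function BSgenome itself calls.

```r
# Hypothetical helper approximating a "single string" check:
# a character vector of length 1 whose value is not NA.
is_single_string <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x)
}

is_single_string("in-house")     # a plain string passes
is_single_string(NA_character_)  # right type and length, but missing: fails
is_single_string(NA)             # logical NA is not even character: fails
is_single_string(c("a", "b"))    # length 2: fails
```

So a seed file field set to NA fails the check even though it looks like "one value".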
Anyway I'm surprised that you would get an error about missing PROVIDERVERSION or RELEASENAME. These fields were removed (or renamed) a while ago and are no longer supported. What version of BSgenome are you using? The latest release version is 1.58.0. Note that BSgenome 1.58.0 belongs to Bioconductor 3.12, which requires R 4.0.
Cheers,
H.
Hi Hervé,
Thanks a lot for your reply! Indeed, I was using BSgenome 1.56.0, and as the Bioconductor vignette I was following is based on BSgenome 1.58.0, it did not include PROVIDERVERSION or RELEASENAME, so I ended up giving them as NA, leading to the "single strings" error.
After updating BSgenome to version 1.58.0, forgeBSgenomeDataPkg worked fine!
Just a small suggestion: it might be worth explicitly including a circ_seqs: character(0) line in the 2bit file example in the vignette.
I am aware that this is drifting away from the original question, but I am now having a problem when building the package, specifically at the R CMD check <tarball> step:
I converted my .fasta file to 2bit using UCSC's faToTwoBit before creating the seed file, so I think it should be in the correct format. I am not sure whether it is due to my system. Any suggestions are much appreciated!
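For anyone following along, here is a rough sketch of a seed file for a single 2bit file, written and sanity-checked from R. The field names reflect my understanding of the vignette for BSgenome 1.58.0 and should be checked against the vignette for your installed version. Every value (package name, organism, URL, paths) is a hypothetical placeholder.

```r
# All values are hypothetical placeholders for an in-house assembly.
# Field names are an assumption based on the BSgenome 1.58.0 vignette.
seed_lines <- c(
  "Package: BSgenome.Gspecies.InHouse.v1",
  "Title: Full genome sequences for Genus species (in-house assembly v1)",
  "Description: Full genome sequences for Genus species, in-house assembly v1.",
  "Version: 0.0.1",
  "organism: Genus species",
  "common_name: Example organism",
  "genome: v1",
  "provider: InHouse",
  "release_date: 2020/12",
  "source_url: https://example.org/placeholder",
  "organism_biocview: Genus_species",
  "BSgenomeObjname: Gspecies",
  "circ_seqs: character(0)",
  "seqs_srcdir: /path/to/dir/with/2bit",
  "seqfile_name: assembly.2bit"
)

seed_path <- file.path(tempdir(), "BSgenome.Gspecies.InHouse.v1-seed")
writeLines(seed_lines, seed_path)

# read.dcf() parses the "field: value" layout that seed files use.
seed <- read.dcf(seed_path)

# Flag any field that is NA or empty, since NA is not a single string.
bad <- colnames(seed)[is.na(seed[1, ]) | !nzchar(seed[1, ])]
if (length(bad) > 0) {
  stop("fields without a value: ", paste(bad, collapse = ", "))
}

# With BSgenome installed and the 2bit file in place, forging would be:
# BSgenome::forgeBSgenomeDataPkg(seed_path)
```

Checking the parsed seed before forging catches empty or NA fields early, which is the kind of problem that produced the "single strings" error above.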
Best,
Aybuge
I've never used faToTwoBit() so I don't know what could have gone wrong. FWIW I prefer to use Biostrings::readDNAStringSet() to load the FASTA file in R and then write the sequences back to disk in 2bit format. This allows more control, like reordering the sequences. See this discussion for the topic of converting your FASTA file to the 2bit format, including pointers to scripts located in the BSgenome package that do this type of conversion.
If you have further questions about this, please ask them as a new topic. Thanks!
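A minimal sketch of that FASTA-to-2bit route, assuming the Biostrings and rtracklayer packages are installed and that rtracklayer's export() accepts format = "2bit" for a DNAStringSet. The function name fasta_to_2bit(), the file paths, and the choice to sort by name are all hypothetical illustrations, not the scripts shipped with BSgenome.

```r
# Sketch: FASTA -> 2bit via Biostrings and rtracklayer (both from Bioconductor).
# fasta_to_2bit() is a hypothetical helper; paths are placeholders.
fasta_to_2bit <- function(fasta_path, twobit_path) {
  stopifnot(file.exists(fasta_path))

  # Load all sequences (scaffolds or chromosomes) into a DNAStringSet.
  dna <- Biostrings::readDNAStringSet(fasta_path)

  # Keep only the first word of each FASTA header as the sequence name.
  names(dna) <- sub("\\s.*$", "", names(dna))

  # Reorder the sequences; sorting by name is just one possible choice.
  dna <- dna[order(names(dna))]

  # Write the sequences back to disk in 2bit format.
  rtracklayer::export(dna, twobit_path, format = "2bit")

  invisible(names(dna))
}

# Example call (paths are placeholders):
# fasta_to_2bit("assembly.fasta", "assembly.2bit")
```

The names returned by the function are the sequence names as they will appear in the forged package, so they are worth inspecting before writing the seed file.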
H.
The error says what the error is! The values you have for the provider and provider_version are not single strings. If you are unsure what a 'single string' is, here are some examples:
Hi James,
Of course, I checked all the seed file fields via isSingleString() before posting the question here, but I wasn't aware that NA values don't qualify as "single strings". The issue was related to my BSgenome version (see the comment below) and I solved it with Hervé's reply.
Thanks anyways!