I am working with the MEDIPS Bioconductor package. I was able to successfully create a custom honey bee genome package (I know one is already available for MEDIPS, but I wanted to build a different one myself).
The genome folder contains 16 separate FASTA files, one per chromosome, named chr1, chr2 and so on.
The seed file is given below:
Package: BSgenome.Amellifera.UCSC.apiMe1
Title: Full genome sequence for Apis mellifera (for demo purpose)
Description: Full genome sequences for Apis mellifera (Honey bee) as provided by UCSC (for demo purpose) and stored in Biostrings objects.
Version: 1.0
organism: Apis mellifera
common_name: Bee
provider: UCSC
provider_version: apiMe1
release_date: Nov. 2016
release_name: UCSC apiMe_1.0
source_url: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/195/GCF_000002195.4_Amel_4.5/GCF_000002195.4_Amel_4.5_genomic.fna.gz
organism_biocview: Apis_mellifera
BSgenomeObjname: Amellifera
seqnames: paste("chr",c(1:16), sep="")
circ_seqs: "chr6"
SrcDataFiles: coming from UCSC for demo purpose
PkgExamples: genome$chr1 # same as genome[["chr1"]]
seqs_srcdir: /home/bioinfo11.corp/Desktop/Important/Vijay_Lakhujani/New_explorations/MEDIPS/dataset_by_sir/genome_files
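To make the layout concrete, here is a minimal sketch in R of how I understand the seqnames and seqs_srcdir fields to fit together: one FASTA file per sequence name inside the source directory. The `.fa` suffix, the helper name `missing_seq_files` and the temporary demo directory are my assumptions for illustration only; none of them come from the seed file or from BSgenome.

```r
# Hypothetical helper (not part of BSgenome): list the per-chromosome files
# that are expected but absent, assuming files are named <seqname><suffix>.
missing_seq_files <- function(srcdir, seqnames, suffix = ".fa") {
  expected <- paste0(seqnames, suffix)
  expected[!file.exists(file.path(srcdir, expected))]
}

# Same expression as the seqnames field in the seed file.
seqnames <- paste("chr", c(1:16), sep = "")

# Demo on a throwaway directory holding empty chr1.fa ... chr16.fa files.
demo_dir <- file.path(tempdir(), "genome_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, paste0(seqnames, ".fa"))))

# Check the directory against two naming conventions.
missing_fa <- missing_seq_files(demo_dir, seqnames)
missing_gz <- missing_seq_files(demo_dir, seqnames, suffix = ".fa.gz")
print(missing_fa)
print(missing_gz)
```

On the real data, the seqs_srcdir path from the seed file would replace the temporary directory.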
I ran forgeBSgenomeDataPkg("path_to_my_seed_file") and it worked, creating a package folder in the same directory. That folder contains the following files and folders:
DESCRIPTION inst man NAMESPACE R
Now, when I run available.genomes(), my custom genome is not listed.
Is this expected, or is something wrong?
PS: I referred to the PDF linked below; page 3 says we can use a collection of compressed FASTA files (chrI.fa.gz, chrII.fa.gz, chrIII.fa.gz, ..., chrXXI.fa.gz, chrM.fa.gz and chrUn.fa.gz).
Interestingly, when I gzipped the files, forging did not work, but with the uncompressed files I was able to forge the genome. Could that be the issue?
Tutorial on forging a genome: https://www.bioconductor.org/packages/devel/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf