Help creating BSgenome object for new canine and feline genomes?
Kate • 0
@9deda875
United States

Hello,

I would like to create a new BSgenome object for these two genome builds:

  1. Canine genome: canFam4, also called UU_Cfam_GSD_1.0
  2. Feline genome: fca126, also called F.catus_Fca126_mat1.0

Would someone be able to help me with this?

Best, Kate

BSgenome • 1.2k views
ADD COMMENT
shepherl 4.1k
@lshep
United States

Have you looked at the BSgenomeForge package for creating new BSgenome data packages?

ADD COMMENT

Thank you! I tried following the instructions for forgeBSgenomeDataPkgFromNCBI and ran into two issues. First, the download timed out before the full FASTA file was retrieved:

    forgeBSgenomeDataPkgFromNCBI(assembly_accession = "GCF_018350175.1",
                                 pkg_maintainer = "myname <myemail>",
                                 destdir = "./")

> Error in download.file(file_url, destfile, method, quiet) :
>   download from 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/018/350/175/GCF_018350175.1_F.catus_Fca126_mat1.0/GCF_018350175.1_F.catus_Fca126_mat1.0_genomic.fna.gz' failed
> In addition: Warning messages:
> 1: In download.file(file_url, destfile, method, quiet) :
>   downloaded length 0 != reported length 0
> 2: In download.file(file_url, destfile, method, quiet) :
>   URL 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/018/350/175/GCF_018350175.1_F.catus_Fca126_mat1.0/GCF_018350175.1_F.catus_Fca126_mat1.0_genomic.fna.gz': Timeout of 60 seconds was reached

To get around this, I downloaded the FASTA file myself using wget:

    wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/018/350/175/GCF_018350175.1_F.catus_Fca126_mat1.0/GCF_018350175.1_F.catus_Fca126_mat1.0_genomic.fna.gz

Which worked:

> --2024-02-22 20:17:15--  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/018/350/175/GCF_018350175.1_F.catus_Fca126_mat1.0/GCF_018350175.1_F.catus_Fca126_mat1.0_genomic.fna.gz
> Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.13, 130.14.250.12, 2607:f220:41e:250::10, ...
> Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:443... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 768361105 (733M) [application/x-gzip]
> Saving to: 'GCF_018350175.1_F.catus_Fca126_mat1.0_genomic.fna.gz'
>
> GCF_018350175.1_F.catus_Fca126_mat1.0_genomic.fna.gz  100%[==========================================================================================================================>] 732.77M  4.28MB/s    in 6m 38s
>
> 2024-02-22 20:23:54 (1.84 MB/s) - 'GCF_018350175.1_F.catus_Fca126_mat1.0_genomic.fna.gz' saved [768361105/768361105]

Then I retried the initial command:

    forgeBSgenomeDataPkgFromNCBI(assembly_accession = "GCF_018350175.1",
                                 pkg_maintainer = "myname <myemail>",
                                 destdir = "./")

Second, it now creates the package directory but fails to move the "single_sequences.2bit" file into it, because the file cannot be found:

> Creating package in ./BSgenome.Fcatus.NCBI.F.catusFca126mat1.0
> existing ./BSgenome.Fcatus.NCBI.F.catusFca126mat1.0 was removed.
> Warning message:
> In file.rename(filepath, to) :
>   cannot rename file '/local/scratch/42331553.1.interactive/Rtmpy75ESJ/single_sequences.2bit' to './BSgenome.Fcatus.NCBI.F.catusFca126mat1.0/inst/extdata/single_sequences.2bit', reason 'No such file or directory'

Any suggestions on how to fix this?

Best, Kate

ADD REPLY

You can raise the default download timeout with options(), for example options(timeout=10000).
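
As a rough sketch of how that could look in practice: the timeout can be raised just for the forge call and restored afterwards. The wrapper name forge_with_timeout and its argument names are made up for illustration; only options(timeout = ...) and the forgeBSgenomeDataPkgFromNCBI() arguments already used earlier in this thread are assumed to exist.

```r
# Hypothetical wrapper (not part of BSgenomeForge): raise R's download
# timeout for the duration of the forge call, then restore the old value.
forge_with_timeout <- function(accession, maintainer, destdir = ".",
                               timeout_sec = 10000) {
  old <- options(timeout = timeout_sec)  # options() returns the previous value
  on.exit(options(old), add = TRUE)      # restore even if the call fails
  BSgenomeForge::forgeBSgenomeDataPkgFromNCBI(
    assembly_accession = accession,
    pkg_maintainer = maintainer,
    destdir = destdir
  )
}

# Example call (downloads roughly 733 MB, so it is left commented out):
# forge_with_timeout("GCF_018350175.1", "myname <myemail>", destdir = "./")
```

Setting options(timeout=10000) once at the top of the session works just as well; the wrapper only keeps the change from leaking into later downloads.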

ADD REPLY

Thank you, that was helpful! I was able to create and install the feline genome package. However, the tool that I am using to process my data runs on R version 3.5, and I cannot seem to load a BSgenome package that was created with a newer version of R.

    > library("BSgenome.Fcatus.NCBI.F.catusFca126mat1.0")
    Error in library("BSgenome.Fcatus.NCBI.F.catusFca126mat1.0") :
      there is no package called 'BSgenome.Fcatus.NCBI.F.catusFca126mat1.0'
    > library(devtools)
    > load_all("BSgenome.Fcatus.NCBI.F.catusFca126mat1.0")
    Loading BSgenome.Fcatus.NCBI.F.catusFca126mat1.0
    Error in BSgenome(organism = "Felis catus", common_name = NA, genome = "F.catus_Fca126_mat1.0",  :
      unused argument (genome = "F.catus_Fca126_mat1.0")
    In addition: Warning messages:
    1: In (function (dep_name, dep_ver = NA, dep_compare = NA)  :
      Need GenomeInfoDb >= 1.34.9 but loaded version is 1.16.0
    2: In (function (dep_name, dep_ver = NA, dep_compare = NA)  :
      Need BSgenome >= 1.66.1 but loaded version is 1.48.0

I tried re-installing the package in R 3.5, but got:

    > withr::with_libpaths("./", install_local("BSgenome.Fcatus.NCBI.F.catusFca126mat1.0", force=TRUE))
    ERROR: this R is version 3.5.0, package 'BSgenome.Fcatus.NCBI.F.catusFca126mat1.0' requires R >= 4.2.0

I thought that maybe I could remake the BSgenome package in R 3.5, but when I try to install BSgenomeForge, I get:

    package 'BSgenomeForge' is not available (for R version 3.5.0)

Is there a workaround for this?

Thank you again!

Best, Kate

ADD REPLY

The R version you are using is six(!) years old. You should upgrade to the current versions of R and Bioconductor first.
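
Before upgrading, it can help to see how far the current session is from what the forged package asks for. Here is a minimal sketch using the minimum versions reported in the messages above (R >= 4.2.0, GenomeInfoDb >= 1.34.9, BSgenome >= 1.66.1). The helper name meets_requirement is hypothetical.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer.
meets_requirement <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# Minimum versions taken from the error and warning messages in this thread.
r_ok <- getRversion() >= "4.2.0"
required <- c(GenomeInfoDb = "1.34.9", BSgenome = "1.66.1")
pkgs_ok <- vapply(
  names(required),
  function(pkg) meets_requirement(pkg, required[[pkg]]),
  logical(1)
)

print(r_ok)     # is this R new enough for the forged package?
print(pkgs_ok)  # named logical vector, one entry per required package
```

If any of these come back FALSE, installing a current R and then running BiocManager::install(c("BSgenome", "BSgenomeForge")) in it should pull in matching versions.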

ADD REPLY
@guillaume-devailly-8722
Toulouse, France

Here is an (unrendered, sorry) .qmd file documenting how I did it some time ago: https://forgemia.inra.fr/genepi/analyses/rosepigs/-/blob/master/02_making_sus_scorfa_sensembl_bsgenome.qmd

I wrote it mostly for myself if I need to do it again, but it might be helpful?
