Question

Error in creating BSGenome package

zen

Hello,

I am creating a BSgenome package for Solanum lycopersicum using the seed file below (this package is very new to me):

Package: BSgenome.Slycopersicum.SGN.SL3.00
Title: Full genome sequences for Solanum lycopersicum (SGN version SL3.00)
Description: Full genome sequences for Solanum lycopersicum (tomato) as provided by SGN (v3.0, 2017) and stored in Solanum lycopersicum genome browser
Version: 3.00
organism: Solanum lycopersicum
common_name: Tomato
provider: SGN
provider_version: SL3.00
release_date: Feb. 2017
release_name: SL3.00
source_url: https://solgenomics.net/organism/Solanum_lycopersicum/genome/
organism_biocview: Solanum_lycopersicum
BSgenomeObjname: Slycopersicum
seqnames: paste("chr", c(1:12, "Un", paste(c(1:12, "Un"), "_random", sep=""))
seqs_srcdir:/ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/

But I am getting error:

Error in Biobase::createPackage(x@Package, destdir, template_path, symvals) : directory './BSgenome.Slycopersicum.SGN.SL3.00' exists; use unlink=TRUE to remove it, or choose another destination directory

Thank you for helping me!

Question written 5.4 years ago by zen; last updated 5.4 years ago by James W. MacDonald.

score 2 · Answer 1 · 2019-10-18

Looking at the vignette for this package, I see how you might be confused. So, here's a step-by-step guide.

1.) Download the fasta file. As in, like, download it to your computer.

2.) If you downloaded the tar.gz file, extract it with tar xvfz S_lycopersicum_chromosomes.3.00.fa.tar.gz. Otherwise, download the uncompressed FASTA file to begin with.
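If you'd rather stay inside R for this step, base R's utils::untar() can do the extraction. This is a minimal sketch, not part of BSgenome: extract_fasta() is a hypothetical helper name, and it assumes the FASTA members of the tarball end in .fa or .fasta.

```r
# Hypothetical helper: extract a .tar.gz and return the paths of the FASTA files in it.
# Assumes FASTA members are named *.fa or *.fasta.
extract_fasta <- function(tarball, exdir = ".") {
  # List the archive members first so we know which files were extracted
  members <- utils::untar(tarball, list = TRUE)
  utils::untar(tarball, exdir = exdir)
  fasta <- grep("\\.(fa|fasta)$", members, value = TRUE)
  file.path(exdir, fasta)
}

# Usage (file name from step 2):
# extract_fasta("S_lycopersicum_chromosomes.3.00.fa.tar.gz")
```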

3.) In R, after loading BSgenome do

> fasta.seqlengths("S_lycopersicum_chromosomes.3.00.fa")
 SL3.0ch00 SL3.0ch01  SL3.0ch02  SL3.0ch03  SL3.0ch04  SL3.0ch05  SL3.0ch06  
  20852292   98455869   55977580   72290146   66557038   66723567   49794276 
SL3.0ch07  SL3.0ch08  SL3.0ch09  SL3.0ch10  SL3.0ch11  SL3.0ch12  
  68175699   65987440   72906345   65633393   56597135   68126176
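fasta.seqlengths() comes from Biostrings. As a rough cross-check using only base R, you can compute the same names and lengths by reading the file line by line. This is a sketch: seqlengths_base() is a hypothetical name, it assumes the sequence name is the first word of each ">" header line, and it reads the whole file into memory, so on a full genome it is much slower and heavier than fasta.seqlengths().

```r
# Hypothetical base-R helper: named integer vector of sequence lengths in a FASTA file.
seqlengths_base <- function(path) {
  lines <- readLines(path)
  is_header <- startsWith(lines, ">")
  # Record index for every line: how many headers have been seen so far
  rec <- cumsum(is_header)
  # Assumption: the name is the first whitespace-delimited word after ">"
  ids <- sub("^>\\s*(\\S+).*$", "\\1", lines[is_header])
  seq_lines <- !is_header & rec > 0
  # Sum the residue counts of each record's sequence lines
  lens <- tapply(nchar(trimws(lines[seq_lines])),
                 factor(rec[seq_lines], levels = seq_along(ids)), sum)
  lens[is.na(lens)] <- 0L  # records with no sequence lines
  setNames(as.integer(lens), ids)
}
```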

4.) Note that

> paste0("SL3.0ch", sprintf("%02d", 0:12))
 [1] "SL3.0ch00" "SL3.0ch01" "SL3.0ch02" "SL3.0ch03" "SL3.0ch04" "SL3.0ch05"
 [7] "SL3.0ch06" "SL3.0ch07" "SL3.0ch08" "SL3.0ch09" "SL3.0ch10" "SL3.0ch11"
[13] "SL3.0ch12"

generates the same chromosome names. This is important!
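Why this matters, as I read the vignette: the forging step looks in seqs_srcdir for one file per sequence, named after each entry of seqnames (with a .fa suffix by default), so the seqnames expression in the seed must reproduce the FASTA names exactly. Here is a small base-R sketch for checking that before forging; check_seqnames() is a hypothetical helper, not a BSgenome function.

```r
# Hypothetical helper: compare FASTA sequence names with a seed-file seqnames expression.
check_seqnames <- function(fasta_names, seqnames_expr) {
  # Evaluate the seqnames field exactly as you would type it into the seed file
  expected <- eval(parse(text = seqnames_expr))
  list(
    ok               = setequal(expected, fasta_names),
    missing_in_fasta = setdiff(expected, fasta_names),
    extra_in_fasta   = setdiff(fasta_names, expected)
  )
}

# Usage with the names printed by fasta.seqlengths() in step 3:
# fa_names <- names(fasta.seqlengths("S_lycopersicum_chromosomes.3.00.fa"))
# check_seqnames(fa_names, 'paste0("SL3.0ch", sprintf("%02d", 0:12))')
```

Note that paste() defaults to sep = " ", so an expression like paste("chr", 1:12) produces "chr 1", "chr 2", and so on, which would not match names such as chr1 even in a genome that used that naming.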

5.) For FASTA files, you need one FASTA file per chromosome (it says so in the vignette).

> z <- readDNAStringSet("S_lycopersicum_chromosomes.3.00.fa")
> dir.create("S_lycopersicum_chromosomes.3.00")
> ## write each sequence to its own file, named after its header with whitespace removed
> for (i in seq_along(z)) writeXStringSet(z[i], file.path("S_lycopersicum_chromosomes.3.00", paste0(gsub("\\s+", "", names(z)[i]), ".fa")))
> dir("S_lycopersicum_chromosomes.3.00/")
 [1] "SL3.0ch00.fa" "SL3.0ch01.fa" "SL3.0ch02.fa" "SL3.0ch03.fa" "SL3.0ch04.fa"
 [6] "SL3.0ch05.fa" "SL3.0ch06.fa" "SL3.0ch07.fa" "SL3.0ch08.fa" "SL3.0ch09.fa"
[11] "SL3.0ch10.fa" "SL3.0ch11.fa" "SL3.0ch12.fa"
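Before writing the seed file, it can be worth confirming that the split produced exactly one record per file and that each record's name matches its file name. A base-R sketch; check_split_dir() is hypothetical, and it assumes the .fa suffix used above and that the name is the first word of the ">" header.

```r
# Hypothetical helper: TRUE for each .fa file holding exactly one record named after the file.
check_split_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.fa$", full.names = TRUE)
  res <- vapply(files, function(f) {
    hdr <- grep("^>", readLines(f), value = TRUE)
    id <- sub("^>\\s*(\\S+).*$", "\\1", hdr)
    # One header only, and its name equals the file name without ".fa"
    length(hdr) == 1L && identical(id, sub("\\.fa$", "", basename(f)))
  }, logical(1))
  setNames(res, basename(files))
}

# Usage:
# check_split_dir("S_lycopersicum_chromosomes.3.00")
```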

6.) Now you need a seed file. It should look like this:

Package: BSgenome.Slycopersicum.SGN.SL3
Title: Full genome sequences for Solanum lycopersicum (SGN version 3)
Description: Full genome sequences for Solanum lycopersicum as provided by SGN.
Version: 0.0.1
Suggests: GenomicFeatures
organism: Solanum lycopersicum
common_name: Tomato
provider: SGN
provider_version: SL3.00
release_date: Feb 2017
release_name: SL3.00
source_url: ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/
organism_biocview: Solanum_lycopersicum
BSgenomeObjname: Slycopersicum
SrcDataFiles: S_lycopersicum_chromosomes.3.00.fa from ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/
seqs_srcdir: C:/Users/jmacdon/Desktop/S_lycopersicum_chromosomes.3.00
seqnames: paste0("SL3.0ch", sprintf("%02d", 0:12))

EDIT: Note that the last line uses the same R code that generates the chromosome names I showed in step 4! Also, this is a plain-text file that I saved on my computer as "Slycopersicum-seed".

ALSO NOTE THAT seqs_srcdir has to point to the directory you put your FASTA files in! Mine points to a directory on my computer, so don't use that path.
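The seed is a plain-text DCF file (one "Field: value" line per field), so you can also write it from R and read it back with base R's read.dcf() to catch typos before forging. A sketch: write_seed() is a hypothetical helper, the field values are copied from the seed above, and seqs_srcdir is a placeholder that you must replace with your own local folder.

```r
# Hypothetical helper: write a seed file and parse it back as DCF.
write_seed <- function(fields, file) {
  # fields: named character vector, one entry per seed field
  stopifnot(is.character(fields), !is.null(names(fields)), all(nzchar(names(fields))))
  writeLines(paste0(names(fields), ": ", fields), file)
  # Read the file back as DCF to confirm every field survived intact
  seed <- read.dcf(file)
  stopifnot(nrow(seed) == 1L, setequal(colnames(seed), names(fields)))
  invisible(seed)
}

seed_fields <- c(
  Package           = "BSgenome.Slycopersicum.SGN.SL3",
  Title             = "Full genome sequences for Solanum lycopersicum (SGN version 3)",
  Description       = "Full genome sequences for Solanum lycopersicum as provided by SGN.",
  Version           = "0.0.1",
  Suggests          = "GenomicFeatures",
  organism          = "Solanum lycopersicum",
  common_name       = "Tomato",
  provider          = "SGN",
  provider_version  = "SL3.00",
  release_date      = "Feb 2017",
  release_name      = "SL3.00",
  source_url        = "ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/",
  organism_biocview = "Solanum_lycopersicum",
  BSgenomeObjname   = "Slycopersicum",
  seqs_srcdir       = "path/to/S_lycopersicum_chromosomes.3.00",  # placeholder: your local folder
  seqnames          = 'paste0("SL3.0ch", sprintf("%02d", 0:12))'
)

# Usage:
# write_seed(seed_fields, "Slycopersicum-seed")
```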

7.) Build and install

> forgeBSgenomeDataPkg("Slycopersicum-seed")
Creating package in ./BSgenome.Slycopersicum.SGN.SL3 
Loading 'SL3.0ch00' sequence from FASTA file 'C:/Users/jmacdon/Desktop/S_lycopersicum_chromosomes.3.00/SL3.0ch00.fa' ... DONE
<snip>
Writing all sequences to './BSgenome.Slycopersicum.SGN.SL3/inst/extdata/single_sequences.2bit' ... DONE
> install.packages("BSgenome.Slycopersicum.SGN.SL3/", repos = NULL, type = "source") 
## I'm on Windows so I need to say 'source'
<snip>
* DONE (BSgenome.Slycopersicum.SGN.SL3)
> library(BSgenome.Slycopersicum.SGN.SL3)
> ls(2)
[1] "BSgenome.Slycopersicum.SGN.SL3" "Slycopersicum"                 
> Slycopersicum
Tomato genome:
# organism: Solanum lycopersicum (Tomato)
# provider: SGN
# provider version: SL3.00
# release date: Feb 2017
# release name: SL3.00
# 13 sequences:
#   SL3.0ch00 SL3.0ch01 SL3.0ch02 SL3.0ch03 SL3.0ch04 SL3.0ch05 SL3.0ch06
#   SL3.0ch07 SL3.0ch08 SL3.0ch09 SL3.0ch10 SL3.0ch11 SL3.0ch12          
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)

Et voila!

score 1 · Answer 2 · 2019-10-10

James W. MacDonald

So there are three parts to that error message. The first part tells you which function had the error:

Error in Biobase::createPackage(x@Package, destdir, template_path, symvals) :

And the second part explains what the problem is

directory './BSgenome.Slycopersicum.SGN.SL3.00' exists

And the third part gives you a couple of helpful suggestions

use unlink=TRUE to remove it, or choose another destination directory

The idea is that you would read that and it would be self-explanatory, and you would then make changes and go ahead with what you are doing. But evidently it wasn't self-explanatory? Can you say what was confusing, so perhaps we could improve?

Posted 5.4 years ago by James W. MacDonald

Thank you! I am not sure where this directory is. I checked the available genomes and it is not there.

Posted 5.4 years ago by zen

When you make a BSgenome package, you generate everything required to install the package in your working directory: an actual directory called BSgenome.Slycopersicum.SGN.SL3.00 that contains a bunch of subdirectories and whatnot. You can then install that package and use it. Presumably you have read the vignette?

What R is telling you is that you have already run forgeBSgenomeDataPkg and generated the package, so you can now install it, which is also described in the vignette.

If you don't know where the directory is, it's in your working directory! Or maybe you passed a different directory using the destdir argument? Probably not, in which case you can use getwd() to find out what the current working directory is.
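To locate that directory (and, if you really do want to forge again, remove it first), here is a base-R sketch. It assumes the default destdir, i.e. the working directory; existing_pkg_dir() is a hypothetical helper. Be careful: unlink(recursive = TRUE) deletes the whole folder.

```r
# Hypothetical helper: full path of an already-forged package directory, or NA if absent.
existing_pkg_dir <- function(pkgname, destdir = getwd()) {
  path <- file.path(normalizePath(destdir), pkgname)
  if (dir.exists(path)) path else NA_character_
}

# Usage:
# p <- existing_pkg_dir("BSgenome.Slycopersicum.SGN.SL3.00")
# p                             # where the forged package lives
# unlink(p, recursive = TRUE)   # only if you intend to forge it again
```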

Posted 5.4 years ago by James W. MacDonald

Update: I have got this:

R CMD INSTALL BSgenome.Slycopersicum.SGN.SL3.00_3.00.tar.gz
* installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
* installing *source* package ‘BSgenome.Slycopersicum.SGN.SL3.00’ ...
** using staged installation
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Warning: package ‘S4Vectors’ was built under R version 3.6.1
Warning: package ‘IRanges’ was built under R version 3.6.1
Warning: package ‘GenomicRanges’ was built under R version 3.6.1
Warning: package ‘rtracklayer’ was built under R version 3.6.1
** testing if installed package can be loaded from final location
Warning: package ‘S4Vectors’ was built under R version 3.6.1
Warning: package ‘IRanges’ was built under R version 3.6.1
Warning: package ‘GenomicRanges’ was built under R version 3.6.1
Warning: package ‘rtracklayer’ was built under R version 3.6.1
** testing if installed package keeps a record of temporary installation path
* DONE (BSgenome.Slycopersicum.SGN.SL3.00)

When I tried to load it into R for plotKaryotype, I got:

Error in is(genome, "GRanges") : object 'Slycopersicum' not found

Posted 5.4 years ago by zen

Not sure what's going on exactly, but the fact that you have

seqs_srcdir:/ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/

in your seed file does not look good. As explained in the vignette, the seqs_srcdir folder must be local:

So we assume that you've downloaded the sequence data files and that they are now located in a folder on your machine. From now on, we'll refer to this folder as the seqs_srcdir folder.

So I'm surprised that the forging step (i.e. forgeBSgenomeDataPkg("path/to/your/seed")) worked. Did it?
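A quick sanity check of the seed before forging can catch both problems discussed in this thread: a seqs_srcdir that is a URL rather than a local folder, and sequence files that are missing. A base-R sketch; check_seqs_srcdir() is hypothetical, and it assumes the default layout of one <seqname>.fa file per sequence inside seqs_srcdir.

```r
# Hypothetical helper: validate seqs_srcdir in a seed file and list missing sequence files.
check_seqs_srcdir <- function(seed_file, suffix = ".fa") {
  seed <- read.dcf(seed_file)
  srcdir <- seed[1, "seqs_srcdir"]
  # A URL (even one with a leading "/") is not a local folder
  if (grepl("://", srcdir, fixed = TRUE)) {
    stop("seqs_srcdir looks like a URL; download the files and point it at a local folder")
  }
  if (!dir.exists(srcdir)) stop("seqs_srcdir does not exist: ", srcdir)
  seqnames <- eval(parse(text = seed[1, "seqnames"]))
  expected <- file.path(srcdir, paste0(seqnames, suffix))
  # Expected files that are not there; character(0) means all are present
  expected[!file.exists(expected)]
}

# Usage:
# check_seqs_srcdir("path/to/your/seed")
```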

Posted 5.4 years ago by Hervé Pagès

Yes, forging did not give any error. But it still is not loading.

Posted 5.4 years ago by zen

This most likely means that you haven't loaded the package. OR it may be that the object is actually called Slycopersicum.SGN.SL3 or some such. You can tell by doing

library(BSgenome.Slycopersicum.SGN.SL3.00)
ls(2)

As an example

> library(BSgenome.Scerevisiae.UCSC.sacCer1)
> ls(2)
 [1] "BSgenome.Scerevisiae.UCSC.sacCer1" "Scerevisiae"

So now I know that the nickname for this object is Scerevisiae.
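ls(2) lists whatever sits at position 2 of search(), which is normally, but not necessarily, the package attached most recently. Naming the package's own environment avoids that ambiguity, and library() stops with an error if the package is not installed at all. A sketch; package_objects() is a hypothetical helper.

```r
# Hypothetical helper: attach a package and list the objects it exports.
package_objects <- function(pkg) {
  # Attach the package (errors if it is not installed)
  library(pkg, character.only = TRUE)
  # List the objects in that package's own search-path environment
  ls(paste0("package:", pkg))
}

# Usage: for a forged BSgenome package this should include the BSgenomeObjname
# from the seed file, e.g.
# package_objects("BSgenome.Slycopersicum.SGN.SL3.00")
```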

Posted 5.4 years ago by James W. MacDonald

Thank you! But it does not give any nickname:

ls(2)
character(0)

I think it's not loaded properly.

Posted 5.4 years ago by zen

Upvoting just because of how nicely you wrote the comment. Other people need to do this more often :-)

Posted 5.4 years ago by C T