ensemblVEP: "Ensembl VEP is not installed in your path."
@patrickturko-20153
Last seen 5.7 years ago

I'm installing the Bioconductor wrapper for ensembl-vep in a Docker container, using conda. Here is my Dockerfile:

FROM centos:centos7.2.1511

RUN echo " - Installing development tools ..." \
    && yum install -y yum-plugin-ovl \
    && yum groupinstall -y "Development Tools"

# Install miniconda to /miniconda
RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-4.5.12-Linux-x86_64.sh
RUN bash Miniconda3-4.5.12-Linux-x86_64.sh -p /miniconda -b
RUN rm Miniconda3-4.5.12-Linux-x86_64.sh
ENV PATH=/miniconda/bin:${PATH}

RUN conda update -n base -c defaults conda

RUN conda config --add channels bioconda
RUN conda config --add channels conda-forge

RUN conda install ensembl-vep=94.5 bioconductor-ensemblvep r-base openssl=1.0
ENV PATH "/miniconda/share/ensembl-vep-94.5-0:$PATH"

RUN localedef -i en_US -f UTF-8 en_US.UTF8

Calling vep from the command line works, and I can see that the vep directory is indeed prepended to my $PATH. However, when I start R and load the package with library(ensemblVEP), I get the following output, indicating that R has not found the vep script (4th line from the bottom):

> library(ensemblVEP)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, basename, cbind, colMeans, colSums, colnames,
    dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
    rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: VariantAnnotation
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray
Loading required package: matrixStats

Attaching package: 'matrixStats'

The following objects are masked from 'package:Biobase':

    anyMissing, rowMedians

Loading required package: BiocParallel

Attaching package: 'DelayedArray'

The following objects are masked from 'package:matrixStats':

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from 'package:base':

    aperm, apply

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:DelayedArray':

    type

The following object is masked from 'package:base':

    strsplit


Attaching package: 'VariantAnnotation'

The following object is masked from 'package:base':

    tabulate

variant_effect_predictor.pl or vep script not found. Ensembl VEP is not installed in your path.

Attaching package: 'ensemblVEP'

The following object is masked from 'package:Biobase':

    cache

This is a bit weird, as when I call Sys.getenv("PATH"), R does in fact seem to have my vep path:

PATH           /miniconda/share/ensembl-vep-94.5-0:/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Any hints on what I should do to make the ensemblVEP R function pick up the vep function as installed on the host?


Hi,

Is your goal here to use ensemblVEP as a package inside a Docker image? Or is it to build your own Docker image which uses bioconda to install ensemblVEP?

If it is the first one, then I suggest you use a different base image (this is a temporary suggestion):

FROM bioconductor/bioconductor_full:RELEASE_3_8

RUN R -e "BiocManager::install('ensemblVEP')"

Please note that the bioconductor/bioconductor_full:RELEASE_3_8 image is not stable yet, but if your goal is just to use it for one package, you should be fine.

If your goal is the second one, building your own image with ensemblVEP installed through miniconda, can you try installing just bioconductor-ensemblvep without listing its dependencies? The conda recipe should take care of them, IMO.

RUN conda install -c bioconda bioconductor-ensemblvep

Best,

Nitesh


Hi Nitesh, thanks for the suggestions.

My goal is to use the ensemblVEP package inside of a docker container. I don't care how the container was built or what install system I use.

This package is unusual in that it relies on the ensembl-vep (note spelling difference) software being installed on the host. And this software is unusual in that it has very specific Perl dependencies. Hence my use of conda, which does in fact install the software correctly, although I have not yet found how to make it available to the ensemblVEP R package.

Neither of your suggestions above installs ensembl-vep at all (I confirmed this).

@martin-morgan-1513
Last seen 4 months ago
United States

I looked at

> ensemblVEP:::.onAttach
function (libname, pkgname)
{
    msg <- paste0("variant_effect_predictor.pl or vep script not found. ",
        "Ensembl VEP is not installed in your path.")
    tryCatch(check <- system2("perl", .getVepPath(), stdout = TRUE,
        stderr = TRUE), error = function(e) packageStartupMessage(msg))
}

.onAttach() is run when the ensemblVEP package is attached to the search path (hmm, should probably be .onLoad()...). It looks like it's trying to run perl on .getVepPath() to validate that vep is installed. So I looked at

> ensemblVEP:::.getVepPath
function ()
{
    if (nchar(Sys.getenv("VEP_PATH")))
        return(Sys.getenv("VEP_PATH"))
    loc0 <- unname(Sys.which("variant_effect_predictor.pl"))
...

Stepping through this, I find that the container does not have the system utility which, so this could be installed. Alternatively, it looks like I can set VEP_PATH, e.g.,

[root@a78d5b8a19c0 /]# export VEP_PATH=/miniconda/share/ensembl-vep-94.5-0/vep
[root@a78d5b8a19c0 /]# R

I'm a docker novice, but I think that means adding the line

ENV VEP_PATH /miniconda/share/ensembl-vep-94.5-0/vep

to the Dockerfile.

It looks like the VEP_PATH environment variable is not documented, and that the intended solution is to add which to the image; this should be reported to bioconda so that they can update their recipe.
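For illustration, here is a minimal shell sketch of the lookup order described above. It is not the package's code: the function name resolve_vep is hypothetical, the shell builtin command -v stands in for Sys.which (so it does not itself need the which utility), and the final fallback to a script named vep is an assumption based on the startup message, since the quoted R source is truncated.

```shell
#!/bin/sh
# Hypothetical helper mimicking the lookup order seen in ensemblVEP:::.getVepPath.
# Only the VEP_PATH check and the variant_effect_predictor.pl search appear in
# the quoted R source; the fallback to "vep" is an assumption.
resolve_vep() {
    # 1. An explicit, non-empty VEP_PATH wins.
    if [ -n "${VEP_PATH:-}" ]; then
        printf '%s\n' "$VEP_PATH"
        return 0
    fi
    # 2. Otherwise search PATH for the old and then the new script name.
    for name in variant_effect_predictor.pl vep; do
        if found=$(command -v "$name" 2>/dev/null); then
            printf '%s\n' "$found"
            return 0
        fi
    done
    # 3. Nothing found: signal failure to the caller.
    return 1
}

# Print the resolved location, or a hint if nothing could be found.
resolve_vep || echo "vep not found; consider setting VEP_PATH" >&2
```

Running this inside the container with and without VEP_PATH set shows which of the two routes would locate the script.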


For what it's worth, I also explored the idea of using 'bioconductor_full' as a starting point. I ended up with a Dockerfile

FROM bioconductor/bioconductor_full:devel

# ensembl-vep

ENV VEP_VERSION 95
ADD https://github.com/Ensembl/ensembl-vep/archive/release/${VEP_VERSION}.zip \
        /tmp
RUN unzip /tmp/${VEP_VERSION}.zip -d /tmp && \
        rm /tmp/${VEP_VERSION}.zip && \
        mv /tmp/ensembl-vep-release-${VEP_VERSION} /usr/local && \
        cd /usr/local/ensembl-vep-release-${VEP_VERSION} && \
        perl INSTALL.pl --NO_HTSLIB -a a
ENV MY_VEP /usr/local/ensembl-vep-release-${VEP_VERSION}

# bioc user

RUN useradd --create-home --shell /bin/bash --home-dir /home/bioc bioc

RUN mkdir -p /home/bioc/R/library && \
        echo "R_LIBS=~/R/library" | cat > /home/bioc/.Renviron && \
        echo "PATH=${PATH}:${MY_VEP}" | cat >> /home/bioc/.Renviron

I built this with docker build -t ensembl-vep-bioc-full . and use this with

docker run -it \
    -v <host system path to R libraries>/docker_bioc_full:/home/bioc/R/library \
    --user bioc ensembl-vep-bioc-full R

Inside the container, I install ensemblVEP interactively

BiocManager::install("ensemblVEP")

This uses a really great idea from Levi Waldron; the docker image can build 'all of' Bioconductor, and the library of docker-installed packages is managed on the host system for persistence.


Thanks a lot. This worked for me. I added which and set the $VEP_PATH variable. I don't know if both were necessary, but anyway I can now run ensemblVEP correctly. I also stripped out unneeded OS utilities.

Here is my working Dockerfile:

FROM centos:centos7.2.1511

RUN yum install -y bzip2 which # "bzip2" needed to install conda, "which" needed by bioconductor ensemblVEP

# Install miniconda to /miniconda
RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-4.5.12-Linux-x86_64.sh
RUN bash Miniconda3-4.5.12-Linux-x86_64.sh -p /miniconda -b
RUN rm Miniconda3-4.5.12-Linux-x86_64.sh
ENV PATH=/miniconda/bin:${PATH}

RUN conda update -n base -c defaults conda
RUN conda config --add channels bioconda
RUN conda config --add channels conda-forge

RUN conda install ensembl-vep=94.5 bioconductor-ensemblvep=1.24.0 openssl=1.0 #Need to pin openssl to this version for some R packages. Probably won't need to specify in the future.
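
One way to confirm that a built image has what ensemblVEP looks for is a small shell check run inside the container. This is only a sketch: the function name check_vep_env is hypothetical, and it tests just the two things mentioned in this thread (the which utility and, if it is set, VEP_PATH).

```shell
#!/bin/sh
# Hypothetical smoke test for the container environment.
# Returns 0 if both checks pass, 1 otherwise; problems are reported on stderr.
check_vep_env() {
    status=0
    # The which utility must be available on PATH.
    if ! command -v which >/dev/null 2>&1; then
        echo "missing utility: which" >&2
        status=1
    fi
    # If VEP_PATH is set, it should point to an existing file.
    if [ -n "${VEP_PATH:-}" ] && [ ! -f "$VEP_PATH" ]; then
        echo "VEP_PATH does not point to a file: $VEP_PATH" >&2
        status=1
    fi
    return "$status"
}
```

Calling check_vep_env in a shell inside the container, for example before starting R, reports any missing piece and returns a non-zero status.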