ensemblVEP supported versions, and troubleshooting
1
0
Entering edit mode
Janet Young ▴ 740
@janet-young-2360
Last seen 5.1 years ago
Fred Hutchinson Cancer Research Center,…

Hi there,

 

This page - http://www.bioconductor.org/packages/release/bioc/html/ensemblVEP.html - says that the EnsemblVEP bioconductor package needs v88 of the Ensembl API, but from some of the help pages within ensemblVEP v1.20.0 it looks like maybe v90 is now supported too?  

I'm asking because I'm trying to troubleshoot a problem that probably has to do with my installation of the back-end ensembl perl modules and my PERL5LIB variable, rather than bioconductor itself (not sure). 

I'm a bit confused, as when I issue what I think is an equivalent unix command-line call to the vep script, it seems to work fine.  Does bioconductor make a system call to the vep perl script in order to do the analysis?  If so, is there an option within R to show me the command it's trying to run, so I can replicate that and and test it from the command line?  Where does bioc take PERL5LIB from (on a linux box) - is it simply taken from my environment when R starts up?

thanks very much,

Janet

 

 

ensemblVEP • 1.5k views
ADD COMMENT
0
Entering edit mode
shepherl 4.1k
@lshep
Last seen 14 hours ago
United States

I'm sorry for the oversight in the SystemRequirements documentation - Yes ensembleVEP 1.20.0 supports v90. The old VEPFlags() argument has been replaced with VEPParam()

ensemblVEP does eventually do a system call to vep to do the analysis. There currently isn't an option to see the call that is being made but feel free to open an issue on https://github.com/Bioconductor/ensemblVEP to request this. The best way to see the call is to do a debug on the method and step through the function to see the call - For example:

# In R 
library(ensemblVEP)

# you can see the function code 
selectMethod(ensemblVEP, signature="character")


file <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
myparam <- VEPFlags(flags=list(vcf=TRUE))

#
# This will let you step through the function
# Every time you press Enter the next line is evaluated
# 
debugonce(ensemblVEP, signature="character")
vcf <- ensemblVEP(file, param=myparam)

#
# Before executing this line: 
# Browse[3]> 
# debug: system2("perl", call)
#  
Browse[3]> call
[1] "/home/lori/b/other/ensembl-vep/vep -i \"/home/lori/R/x86_64-pc-linux-gnu-library/3.4-BioC-3.6/VariantAnnotation/extdata/ex2.vcf\" --vcf TRUE  --host useastdb.ensembl.org  --database TRUE --output_file \"/tmp/RtmpgUi8kU/file6f5743c1d49d\""
ADD COMMENT
0
Entering edit mode

Thanks, Lori - that did let me troubleshoot successfully. I did just put in a github issue with a request to see the system call - that was really useful once you showed me how to do it using debugonce.

I should probably update to VEP v90 soon, but perhaps it's also useful to let you know where I stumbled: it seems simple to build in a test within VEPParam that would prevent this mysterious error.

It turns out that the default host used by the bioc module useastdb.ensembl.org) does not support anything other than the current or current minus 1 versions (according to https://www.ensembl.org/info/data/mysql.html), and gives a very uninformative error when you try to use an older version of the vep script, whether on command line or through R/Bioc.  The default host for the command-line vep script is different ensembldb.ensembl.org) - that's why I didn't replicate the error on the command line. Using ensembldb.ensembl.org as host instead fixes the issue.

I've given details below - does that work for you, or should I put this into a github issue?

So, while this fails with a mysterious error (that was what made me suspect my vep/perl installation):

library(ensemblVEP)
file <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
myparam88a <- VEPParam( version=88, dataformat=c(vcf=TRUE)) 
vcf88a <- ensemblVEP(file, param=myparam88a)

#Can't use an undefined value as an ARRAY reference at #/home/jayoung/malik_lab_shared/perl/ensembl/modules/Bio/EnsEMBL/Registry.pm line 2546.
#Error in .io_check_exists(path(con)) : file(s) do not exist:
#  '/tmp/Rtmpi8eJw2/file52c81bd6649'

when I specify a different host it works fine:

myparam88b <- VEPParam( version=88, dataformat=c(vcf=TRUE), database=c(host="ensembldb.ensembl.org")) 
vcf88b <- ensemblVEP(file, param=myparam88b)

 

Would it be possible to change the default host when older versions of VEP are specified?  I didn't expect that to be an issue, because on my real data I was using a local cached database and didn't think I was contacting any host at all.  The same fix works on my data, though, so I'm happy:

tempVEPcacheDir <- "/fh/fast/malik_h/grp/public_databases/Ensembl/VariantEffectPredictorCache"
tempVEPpluginsDir <- "/fh/fast/malik_h/grp/public_databases/Ensembl/VariantEffectPredictorCache/plugins"

## this fails: 
myparamsYeast88a <- VEPParam( version=88,
                            input=c(species="saccharomyces_cerevisiae"), 
                            cache=c(cache=TRUE, dir=tempVEPcacheDir,
                                dir_cache=tempVEPcacheDir, dir_plugins=tempVEPpluginsDir),
                            database=c(database=FALSE),
                            dataformat=c(vcf=TRUE) ) 
tempTest88a <- ensemblVEP(testFile, myparamsYeast88a) 
#Can't use an undefined value as an ARRAY reference at #/home/jayoung/malik_lab_shared/perl/ensembl/modules/Bio/EnsEMBL/Registry.pm line 2546.
#Error in .io_check_exists(path(con)) : file(s) do not exist:
#  '/tmp/RtmpxuFFsr/file560c5748392c'

## this works:
myparamsYeast88b <- VEPParam( version=88,
                            input=c(species="saccharomyces_cerevisiae"), 
                            cache=c(cache=TRUE, dir=tempVEPcacheDir,
                                dir_cache=tempVEPcacheDir, dir_plugins=tempVEPpluginsDir),
                            database=c(database=FALSE, host="ensembldb.ensembl.org"),
                            dataformat=c(vcf=TRUE) ) 
tempTest88b <- ensemblVEP(testFile, myparamsYeast88b) 

 

Session info is below (truncated - does github really only allow 5000 chars in a reply?). Thanks for the quick response - it's nice to have this figured out already.

Janet

---------

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

other attached packages:
[1] ensemblVEP_1.20.1          VariantAnnotation_1.22.3
[3] Rsamtools_1.28.0           Biostrings_2.44.2         
[5] XVector_0.16.0             SummarizedExperiment_1.6.0
[7] DelayedArray_0.2.0         matrixStats_0.52.2        
[9] Biobase_2.36.0             GenomicRanges_1.28.4      
[11] GenomeInfoDb_1.12.0        IRanges_2.10.3            
[13] S4Vectors_0.14.0           BiocGenerics_0.22.0      

ADD REPLY
0
Entering edit mode

Thank you!  also very informative. If you could also open an issue for this.  We do use the github repositories now for tracking issues and that way it stays there until it is fixed and doesn't get lost in the support site issue.  Just please copy the code chunks you provided and if you could include a link to this support site issue that would be wonderful.  

Glad you were able to get it straightened out. 

Cheers, 

Lori 

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6