ensemblVEP, variant_effect_predictor versions and release schedule
@thomas-sandmann-5891
Hi Valerie,

thanks a lot for supporting legacy versions of the ensembl database / variant_effect_predictor.pl script.

> I assume you're still using version 67 and have the data cached.

Yes, that's right. We use ensembl release 67 together with the corresponding variant_effect_predictor.pl script version 2.5.

> How are you calling the script right now?

As a temporary fix, I am using the ensemblVEP method from ensemblVEP version 1.1.3 (BioC svn revision r76970). I think this is the last version that worked with ensembl release 67 for me.

I modified the default parameters in the VEPParam object by creating a temporary "gVEPParam" class for use with our in-house ensembl release 67. This object is passed to the ensemblVEP method with the default parameters listed below. (Please note that our installation of variant_effect_predictor.pl by default connects to our in-house database.)

Formal class 'gVEPParam' with 6 slots
  ..@ basic   :List of 5
  .. ..$ verbose    : logi FALSE
  .. ..$ quiet      : logi FALSE
  .. ..$ no_progress: logi TRUE
  .. ..$ config     : chr(0)
  .. ..$ everything : logi FALSE
  ..@ input   :List of 4
  .. ..$ species        : chr "homo_sapiens"
  .. ..$ format         : chr(0)
  .. ..$ output_file    : chr(0)
  .. ..$ force_overwrite: logi FALSE
  ..@ output  :List of 24
  .. ..$ terms      : chr "so"
  .. ..$ sift       : chr "b"
  .. ..$ polyphen   : chr "b"
  .. ..$ regulatory : logi FALSE
  .. ..$ cell_type  : chr(0)
  .. ..$ hgvs       : logi TRUE
  .. ..$ hgnc       : logi TRUE
  .. ..$ gene       : logi TRUE
  .. ..$ protein    : logi TRUE
  .. ..$ ccds       : logi TRUE
  .. ..$ canonical  : logi TRUE
  .. ..$ xref_refseq: logi FALSE
  .. ..$ numbers    : logi TRUE
  .. ..$ domains    : logi TRUE
  .. ..$ most_severe: logi FALSE
  .. ..$ summary    : logi FALSE
  .. ..$ per_gene   : logi FALSE
  .. ..$ convert    : chr(0)
  .. ..$ fields     : chr(0)
  .. ..$ vcf        : logi FALSE
  .. ..$ gvf        : logi FALSE
  .. ..$ original   : logi FALSE
  .. ..$ custom     : chr(0)
  .. ..$ plugin     : chr "GNECondel,/Plugins/config/Condel/config"
  ..@ filterqc:List of 17
  .. ..$ check_ref        : logi FALSE
  .. ..$ coding_only      : logi FALSE
  .. ..$ check_existing   : logi TRUE
  .. ..$ check_alleles    : logi FALSE
  .. ..$ check_svs        : logi FALSE
  .. ..$ individual       : chr(0)
  .. ..$ chr              : chr(0)
  .. ..$ no_intergenic    : logi FALSE
  .. ..$ filter_common    : logi FALSE
  .. ..$ check_frequency  : logi FALSE
  .. ..$ freq_pop         : chr(0)
  .. ..$ freq_freq        : logi FALSE
  .. ..$ freq_gt_lt       : chr(0)
  .. ..$ freq_filter      : chr(0)
  .. ..$ filter           : chr(0)
  .. ..$ failed           : logi FALSE
  .. ..$ allow_non_variant: logi FALSE
  ..@ database:List of 9
  .. ..$ database  : logi FALSE
  .. ..$ host      : chr "useastdb.ensembl.org"
  .. ..$ user      : chr(0)
  .. ..$ password  : chr(0)
  .. ..$ port      : num(0)
  .. ..$ genomes   : logi FALSE
  .. ..$ refseq    : logi FALSE
  .. ..$ db_version: num(0)
  .. ..$ registry  : chr(0)
  ..@ advanced:List of 4
  .. ..$ no_whole_genome: logi FALSE
  .. ..$ buffer_size    : num 5000
  .. ..$ compress       : chr(0)
  .. ..$ skip_db_check  : logi FALSE

> Do you use the --cache flag or --offline flag?

I am not using the --cache flag right now, because version 2.5 of the variant_effect_predictor.pl script does not allow me to specify the Plugin directory and the cache directory separately. (This was only introduced in a later version of the perl script).

The --offline flag does not seem to be available in variant_effect_predictor.pl version 2.5, at least I cannot find it in the listed arguments (provided below for reference).

version 2.5

Options
=======

--help                 Display this message and quit
--verbose              Display verbose output as the script runs [default: off]
--quiet                Suppress status and warning messages [default: off]
--no_progress          Suppress progress bars [default: off]

--config               Load configuration from file. Any command line options
                       specified overwrite those in the file [default: off]
--everything           Shortcut switch to turn on commonly used options. See web
                       documentation for details [default: off]

-i | --input_file      Input file - if not specified, reads from STDIN. Files
                       may be gzip compressed.
--format               Specify input file format - one of "ensembl", "pileup",
                       "vcf", "hgvs", "id" or "guess" to try and work out format.
-o | --output_file     Output file. Write to STDOUT by specifying -o STDOUT - this
                       will force --quiet [default: "variant_effect_output.txt"]
--force_overwrite      Force overwriting of output file [default: quit if file
                       exists]
--original             Writes output as it was in input - must be used with --filter
                       since no consequence data is added [default: off]
--vcf                  Write output as VCF [default: off]
--gvf                  Write output as GVF [default: off]
--fields [field list]  Define a custom output format by specifying a comma-separated
                       list of field names. Field names normally present in the
                       "Extra" field may also be specified, including those added by
                       plugin modules. Can also be used to configure VCF output
                       columns [default: off]
--species [species]    Species to use [default: "human"]

-t | --terms           Type of consequence terms to output - one of "ensembl", "SO",
                       "NCBI" [default: ensembl]
--sift=[p|s|b]         Add SIFT [p]rediction, [s]core or [b]oth [default: off]
--polyphen=[p|s|b]     Add PolyPhen [p]rediction, [s]core or [b]oth [default: off]
--regulatory           Look for overlaps with regulatory regions. The script can
                       also call if a variant falls in a high information position
                       within a transcription factor binding site. Output lines have
                       a Feature type of RegulatoryFeature or MotifFeature
                       [default: off]
--cell_type [types]    Report only regulatory regions that are found in the given cell
                       type(s). Can be a single cell type or a comma-separated list.
                       The functional type in each cell type is reported under
                       CELL_TYPE in the output. To retrieve a list of cell types, use
                       "--cell_type list" [default: off]
--custom [file list]   Add custom annotations from tabix-indexed files. See
                       documentation for full details [default: off]
--plugin [plugin_name] Use named plugin module [default: off]
--hgnc                 Add HGNC gene identifiers to output [default: off]
--hgvs                 Output HGVS identifiers (coding and protein). Requires database
                       connection [default: off]
--ccds                 Output CCDS transcript identifiers [default: off]
--xref_refseq          Output aligned RefSeq mRNA identifier for transcript. NB: the
                       RefSeq and Ensembl transcripts aligned in this way MAY NOT, AND
                       FREQUENTLY WILL NOT, match exactly in sequence, exon structure
                       and protein product [default: off]
--protein              Output Ensembl protein identifer [default: off]
--gene                 Force output of Ensembl gene identifer - disabled by default
                       unless using --cache or --no_whole_genome [default: off]
--canonical            Indicate if the transcript for this consequence is the canonical
                       transcript for this gene [default: off]
--domains              Include details of any overlapping protein domains [default: off]
--numbers              Include exon & intron numbers [default: off]

--no_intergenic        Excludes intergenic consequences from the output [default: off]
--coding_only          Only return consequences that fall in the coding region of
                       transcripts [default: off]
--most_severe          Ouptut only the most severe consequence per variation.
                       Transcript-specific columns will be left blank. [default: off]
--summary              Output only a comma-separated list of all consequences per
                       variation. Transcript-specific columns will be left blank.
                       [default: off]
--per_gene             Output only the most severe consequence per gene. Where more
                       than one transcript has the same consequence, the transcript
                       chosen is arbitrary. [default: off]

--check_ref            If specified, checks supplied reference allele against stored
                       entry in Ensembl Core database [default: off]
--check_existing       If specified, checks for existing co-located variations in the
                       Ensembl Variation database [default: off]
--failed [0|1]         Include (1) or exclude (0) variants that have been flagged as
                       failed by Ensembl when checking for existing variants.
                       [default: exclude]
--check_alleles        If specified, the alleles of existing co-located variations
                       are compared to the input; an existing variation will only
                       be reported if no novel allele is in the input (strand is
                       accounted for) [default: off]
--check_svs            Report overlapping structural variants [default: off]

--filter [filters]     Filter output by consequence type. Use this to output only
                       variants that have at least one consequence type matching the
                       filter. Multiple filters can be used separated by ",". By
                       combining this with --original it is possible to run the VEP
                       iteratively to progressively filter a set of variants. See
                       documentation for full details [default: off]

--check_frequency      Turns on frequency filtering. Use this to include or exclude
                       variants based on the frequency of co-located existing
                       variants in the Ensembl Variation database. You must also
                       specify all of the following --freq flags [default: off]
--freq_pop [pop]       Name of the population to use e.g. hapmap_ceu for CEU HapMap,
                       1kg_yri for YRI 1000 genomes. See documentation for more
                       details
--freq_freq [freq]     Frequency to use in filter. Must be a number between 0 and 0.5
--freq_gt_lt [gt|lt]   Specify whether the frequency should be greater than (gt) or
                       less than (lt) --freq_freq
--freq_filter          Specify whether variants that pass the above should be included
[exclude|include]      or excluded from analysis

--individual [id]      Consider only alternate alleles present in the genotypes of the
                       specified individual(s). May be a single individual, a comma-
                       separated list or "all" to assess all individuals separately.
                       Each individual and variant combination is given on a separate
                       line of output. Only works with VCF files containing individual
                       genotype data; individual IDs are taken from column headers.
--allow_non_variant    Prints out non-variant lines when using VCF input
--chr [list]           Select a subset of chromosomes to analyse from your file. Any
                       data not on this chromosome in the input will be skipped. The
                       list can be comma separated, with "-" characters representing
                       a range e.g. 1-5,8,15,X [default: off]
--gp                   If specified, tries to read GRCh37 position from GP field in the
                       INFO column of a VCF file. Only applies when VCF is the input
                       format and human is the species [default: off]

--convert              Convert the input file to the output format specified.
[ensembl|vcf|pileup]   Converted output is written to the file specified in
                       --output_file. No consequence calculation is carried out when
                       doing file conversion. [default: off]

--refseq               Use the otherfeatures database to retrieve transcripts - this
                       database contains RefSeq transcripts (as well as CCDS and
                       Ensembl EST alignments) [default: off]
--host                 Manually define database host [default: "ensembldb.ensembl.org"]
-u | --user            Database username [default: "anonymous"]
--port                 Database port [default: 5306]
--password             Database password [default: no password]
--genomes              Sets DB connection params for Ensembl Genomes [default: off]
--registry             Registry file to use defines DB connections [default: off]
                       Defining a registry file overrides above connection settings.
--db_version=[number]  Force script to load DBs from a specific Ensembl version. Not
                       advised due to likely incompatibilities between API and DB

--no_whole_genome      Run in old-style, non-whole genome mode [default: off]
--buffer_size          Sets the number of variants sent in each batch [default: 5000]
                       Increasing buffer size can retrieve results more quickly
                       but requires more memory. Only applies to whole genome mode.

--cache                Enables read-only use of cache [default: off]
--dir [directory]      Specify the base cache directory to use [default: "$HOME/.vep/"]
--write_cache          Enable writing to cache [default: off]
--build [all|list]     Build a complete cache for the selected species. Build for all
                       chromosomes with --build all, or a list of chromosomes (see
                       --chr). DO NOT USE WHEN CONNECTED TO PUBLIC DB SERVERS AS THIS
                       VIOLATES OUR FAIR USAGE POLICY [default: off]

--compress             Specify utility to decompress cache files - may be "gzcat" or
                       "gzip -dc" Only use if default does not work [default: zcat]

--skip_db_check        ADVANCED! Force the script to use a cache built from a different
                       database than specified with --host. Only use this if you are
                       sure the hosts are compatible (e.g. ensembldb.ensembl.org and
                       useastdb.ensembl.org) [default: off]
--cache_region_size    ADVANCED! The size in base-pairs of the region covered by one
                       file in the cache. [default: 1MB]

> Also, please remind me of (point me to) the plug-in you're using so I can
> test that.

We are using a single plugin that returns the Condel scores. The *Condel plugin* can be found on github here:
https://github.com/ensembl-variation/VEP_plugins

Again, thanks a lot for your support. Please let me know if there is anything I can do to help, e.g. with testing the package.

Best,
Thomas
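For readers who want to see how slot values like the ones in the parameter listing relate to the script's command-line options, here is a minimal sketch in base R. The helper `params_to_flags()` is hypothetical and is not part of ensemblVEP. It assumes that a logical TRUE becomes a bare switch, that FALSE and zero-length values (such as chr(0)) are dropped, and that every other value is passed as `--name value`, which the Perl option parser is assumed to accept in place of `--name=value`.

```r
# Hypothetical helper (not part of ensemblVEP): turn a named list of
# option values into command-line flags for variant_effect_predictor.pl.
params_to_flags <- function(params) {
  flags <- character(0)
  for (name in names(params)) {
    value <- params[[name]]
    if (length(value) == 0L) next              # unset option, e.g. chr(0)
    if (is.logical(value)) {
      if (isTRUE(value)) flags <- c(flags, paste0("--", name))
      next                                     # FALSE switches are omitted
    }
    flags <- c(flags, paste0("--", name), as.character(value))
  }
  flags
}

# A handful of the values from the parameter listing in this post.
opts <- list(species = "homo_sapiens", terms = "so", sift = "b",
             hgvs = TRUE, regulatory = FALSE, format = character(0),
             buffer_size = 5000)
flags <- params_to_flags(opts)
cat(paste(flags, collapse = " "), "\n")
# --species homo_sapiens --terms so --sift b --hgvs --buffer_size 5000
```

A vector like `flags` could then be handed to `system2()` together with the path to the script; the path itself depends on the local installation.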
@valerie-obenchain-4275
OK, this is ready to go. Changes are checked into v 1.3.6.

The default creates a VEPParam compatible with the current API.

    param73 <- VEPParam()
    > param73
    class: VEPParam73
    identifier(0):
    colocatedVariants(0):
    dataformat(0):
    basic(0):
    input(1): species
    cache(3): dir, dir_cache, dir_plugins
    output(1): terms
    filterqc(0):
    database(2): host, database
    advanced(1): buffer_size
    version(2): 73, 74
    scriptPath(0):

To create a VEPParam for an archived version supply the version to the constructor.

    param67 <- VEPParam(67)
    > param67
    class: VEPParam67
    basic(0):
    input(1): species
    cache(1): dir
    output(1): terms
    filterqc(0):
    database(1): host
    advanced(1): buffer_size
    version(1): 67
    scriptPath(0):

supportedVEP() lists all classes and supported versions. The idea is to only create a new subclass when a substantial change is made to the API. You can see that VEPParam73 supports both 73 and 74. I'll keep adding versions to this class until the interface requires a major change.

    > supportedVEP()
    $VEPParam67
    [1] 67

    $VEPParam73
    [1] 73 74

To specify a non-standard location of your .pl script use the scriptPath<- setter. This was added to handle the case where multiple versions are installed locally.

    scriptPath(param67) <- "fullPathToScript/variant_effect_predictor.pl"

These examples and more are on ?VEPParam. Let me know how it goes.

Valerie

On 01/07/2014 02:01 AM, Thomas Sandmann wrote:
> [...]

--
Valerie Obenchain
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319
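The version-to-class mapping that supportedVEP() prints can also be used to check up front whether a given release is covered. The following is only a sketch in base R: the `supported` list is typed in by hand from the output quoted in this reply rather than obtained from the package, and `class_for_version()` is a hypothetical helper, not an ensemblVEP function.

```r
# Hand-written copy of the supportedVEP() output quoted in this thread.
supported <- list(VEPParam67 = 67, VEPParam73 = c(73, 74))

# Hypothetical helper: name of the first class that lists `version`,
# or NA if no class supports it.
class_for_version <- function(version, supported) {
  hit <- vapply(supported, function(v) version %in% v, logical(1))
  if (any(hit)) names(supported)[which(hit)[1]] else NA_character_
}

class_for_version(74, supported)  # "VEPParam73"
class_for_version(70, supported)  # NA
```

Returning NA instead of raising an error is a design choice for this sketch; it lets the caller decide whether an unsupported release should stop the analysis.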
One interesting caveat I should mention. The east coast mirror only supports the most current version of the API. When using archived versions you need to use the cache (much preferred) or a live query against the European mirror. I've set the 'host' default to 'ensembldb.ensembl.org' for archived versions. I know you have local data so this isn't an issue for you - just wanted to mention it for the wider audience. Valerie On 01/13/2014 10:41 AM, Valerie Obenchain wrote: > OK, this is ready to go. Changes are checked into v 1.3.6. > > The default creates a VEPParam compatible with the current API. > > param73 <- VEPParam() >>> param73 >> class: VEPParam73 >> identifier(0): >> colocatedVariants(0): >> dataformat(0): >> basic(0): >> input(1): species >> cache(3): dir, dir_cache, dir_plugins >> output(1): terms >> filterqc(0): >> database(2): host, database >> advanced(1): buffer_size >> version(2): 73, 74 >> scriptPath(0): > > To create a VEPParam for an archived version supply the version to the > constructor. > > param67 <- VEPParam(67) >>>> param67 >>> class: VEPParam67 >>> basic(0): >>> input(1): species >>> cache(1): dir >>> output(1): terms >>> filterqc(0): >>> database(1): host >>> advanced(1): buffer_size >>> version(1): 67 >>> scriptPath(0): > > supportedVEP() lists all classes and supported versions. The idea is to > only create a new subclass when a substantial change is made to the API. > You can see that VEPParam73 supports both 73 and 74. I'll keep adding > versions to this class until the interface requires a major change. > > supportedVEP() >>> supportedVEP() >> $VEPParam67 >> [1] 67 >> >> $VEPParam73 >> [1] 73 74 > > > To specify a non-standard location of your .pl script use the > scriptPath<- setter. This was added to handle the case where multiple > versions are installed locally. > > scriptPath(param67) <- "fullPathToScript/variant_effect_predictor.pl" > > These examples and more are on ?VEPParam. Let me know how it goes. 
>
> Valerie

-- 
Valerie Obenchain
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319
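
For readers following along without the package installed, the version-to-class lookup and the host caveat described in this reply can be sketched in base R. This is only an illustration, not the ensemblVEP implementation: the `supported` list copies the shape of the supportedVEP() output quoted in the thread, and `param_class_for()` and `default_host()` are hypothetical helper names. Treating every version below the highest supported one as "archived" is also an assumption.

```r
# Illustrative data only: same shape as the supportedVEP() output in the thread.
supported <- list(VEPParam67 = 67, VEPParam73 = c(73, 74))

# Hypothetical helper: return the name of the class that covers `version`.
param_class_for <- function(version, supported) {
  hit <- vapply(supported, function(v) version %in% v, logical(1))
  if (!any(hit)) stop("unsupported VEP version: ", version)
  names(supported)[hit][1]
}

# Hypothetical helper: archived versions fall back to the European host,
# because the east coast mirror only supports the most current API version.
# Assumption: "current" means the highest version in `supported`.
default_host <- function(version, supported) {
  current <- max(unlist(supported))
  if (version < current) "ensembldb.ensembl.org" else "useastdb.ensembl.org"
}

param_class_for(67, supported)
default_host(67, supported)
```

With the real package the equivalent steps would be VEPParam(67) followed by the scriptPath<- setter, as shown in the quoted message.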