Hi all!
I'm trying to read a vcf file with readVcfAsVRanges but I get the following error:(readVcf works, but then I get the error when I want to convert it to an VRanges object)
a <- readVcfAsVRanges("myprovalim.vcf",genome="hg19")
Error in ad[, , 1, drop = FALSE] : incorrect number of dimensions
I am using SomaticSignatures_2.6.1 and VariantAnnotation_1.16.4
This is the original file:
##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
chr1 762273 . G A . PASS DP=223;SS=1;SSC=0;GPV=5.8167E-133;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:105:0:105:100%:0,0,7,98 1/1:.:118:0:117:100%:0,0,10,107
chr1 808922 . G A . PASS DP=21;SS=1;SSC=0;GPV=1.8578E-12;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:12:0:12:100%:0,0,10,2 1/1:.:9:0:9:100%:0,0,8,1
Does anyone know what is happening? Can readVcfAsVRanges read vcf's with more than one sample? Any help would be appreciated!
Many thanks,
Maria
Maria, please note that you will need to update your Bioconductor installation in order to profit from this patch. You are very likely running an outdated and unsupported version of Bioc, judging from the version 1.16.4 of VariantAnnotation.
Could be also a problem with he Ubuntu version? I've just updated R and Bioconductor and installed VariantAnnotation 1.19.5 (should this also handle Varscan2?), and I'm getting the same error:
> a <- readVcfAsVRanges("myprovalim.vcf",genome="hg19")
Error in ad[, , 1, drop = FALSE] : incorrect number of dimensions
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)
locale:
[1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C
[3] LC_TIME=ca_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
[5] LC_MONETARY=ca_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8
[7] LC_PAPER=ca_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] VariantAnnotation_1.19.5 Rsamtools_1.24.0
[3] Biostrings_2.40.2 XVector_0.12.0
[5] SummarizedExperiment_1.2.3 Biobase_2.32.0
[7] GenomicRanges_1.24.2 GenomeInfoDb_1.8.1
[9] IRanges_2.6.1 S4Vectors_0.10.1
[11] BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] XML_3.98-1.4 GenomicAlignments_1.8.3 bitops_1.0-6
[4] DBI_0.4-1 RSQLite_1.0.0 GenomicFeatures_1.24.3
[7] zlibbioc_1.18.0 BiocParallel_1.6.2 tools_3.3.1
[10] BSgenome_1.40.1 biomaRt_2.28.0 RCurl_1.95-4.8
[13] rtracklayer_1.32.1 AnnotationDbi_1.34.3
No you have to wait for 1.19.7 to show up.
Also, your packages are a mixture of 'release' (even-numbered bioc) and 'devel';
unless the fix has been ported to the release branch, you'll have to follow http://bioconductor.org/developers/how-to/useDevel/ to make sure you are using the devel branch. Use BiocInstaller::biocValid() to validate that your packages are consistent.It was ported to 1.18.3 so once things are consistent release should work.
Many thanks!! It's working now with version 1.18.3
But I've another error:
a<-readVcfAsVRanges("myprova2.vcf",genome="hg19")
Error in if ((length(unique(elt)) != 1L) && (dim(AD)[3] != unique(elt))) { :
missing value where TRUE/FALSE needed
It's coming from this record (or similar ones: two alternative alleles)
chr1 808922 . G A,T . PASS DP=21;SS=1;SSC=0;GPV=1.8578E-12;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:12:0:12:100%:0,0,10,2 2/2:.:9:0:9:100%:0,0,8,1
Does the package handle this cases?
There is a bug in the package that causes this error, but the root problem is that the AD field is defined in the header with Number=1, while it should be Number=A, if there is one value per alt allele. I also noticed that the Number for DP4 is wrong. It should be "." not 1.
With the upcoming VariantAnnotation 1.18.5, using this invalid file, only the counts for the first allele will be used, even for the second allele, which is obviously wrong. Ideally, the VCF parser would throw an error or warning if it encounters multiple values for a Number=1 field. Right now, it silently takes the first. Maybe one of the authors of the parser could consider that.