When loading a VCF file with readVcf(), one of the records is silently dropped. The missing record is immediately preceded by a record that is exactly 4094 bytes long, and changing the length of that preceding record in any way makes the load succeed. Changing its content while keeping its length at 4094 bytes does not help: the following record is still dropped.
Reproduction steps:
library(VariantAnnotation)
names(readVcf("repo.vcf"))
Expected output:
[1] "gridss9_272646b" "gridss9_31852o"
Actual output:
[1] "gridss9_272646b"
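To confirm which data record is the 4094-byte one, and that the file really contains more records than readVcf() returns, a small base-R check can help. This is a sketch, not part of VariantAnnotation: find_records_of_length() is a hypothetical helper, and it assumes the 4094-byte figure excludes the line terminator (readLines() strips LF and CRLF endings).

```r
# Hypothetical diagnostic helper (not a VariantAnnotation function):
# list the data records (non-header lines) of a VCF whose length in
# bytes, excluding the line terminator, equals `n`.
find_records_of_length <- function(path, n = 4094L) {
  lines <- readLines(path, warn = FALSE)
  bytes <- nchar(lines, type = "bytes")
  body <- which(!startsWith(lines, "#"))  # skip "##" meta lines and the "#CHROM" header
  hits <- body[bytes[body] == n]
  data.frame(line = hits, bytes = bytes[hits])
}

# Usage against the file from this report:
# find_records_of_length("repo.vcf")
# sum(!startsWith(readLines("repo.vcf"), "#"))  # data records in the file
# length(names(readVcf("repo.vcf")))            # records readVcf() returned
```

If the second count is larger than the third, records are being lost during parsing rather than being absent from the file.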
The offending file can be found at https://pastebin.com/qJ0afBAP
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] VariantAnnotation_1.28.3 Rsamtools_1.34.0 Biostrings_2.50.1 XVector_0.22.0 SummarizedExperiment_1.12.0
[6] DelayedArray_0.8.0 BiocParallel_1.16.2 matrixStats_0.54.0 Biobase_2.42.0 GenomicRanges_1.34.0
[11] GenomeInfoDb_1.18.1 IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 compiler_3.5.1 prettyunits_1.0.2 GenomicFeatures_1.34.1 bitops_1.0-6 tools_3.5.1
[7] zlibbioc_1.28.0 progress_1.2.0 biomaRt_2.38.0 digest_0.6.18 bit_1.1-14 BSgenome_1.50.0
[13] RSQLite_2.1.1 memoise_1.1.0 lattice_0.20-38 pkgconfig_2.0.2 rlang_0.3.0.1 Matrix_1.2-15
[19] DBI_1.0.0 rstudioapi_0.8 yaml_2.2.0 GenomeInfoDbData_1.2.0 rtracklayer_1.42.1 httr_1.3.1
[25] stringr_1.3.1 hms_0.4.2 bit64_0.9-7 grid_3.5.1 R6_2.3.0 AnnotationDbi_1.44.0
[31] XML_3.98-1.16 magrittr_1.5 blob_1.1.1 GenomicAlignments_1.18.0 assertthat_0.2.0 stringi_1.2.4
[37] RCurl_1.95-4.11 crayon_1.3.4
This conversation has moved to the GitHub repository: https://github.com/Bioconductor/VariantAnnotation/issues/19