Cannot use `readVcf()` on a file created with `writeVcf()`
0
0
Entering edit mode
@steve-pederson-23427
Last seen 8 months ago
Australia

Hi,

I've done a fair bit of filtering on a VCF in R then exported it. The basic object looks like the below

library(VariantAnnotation)
> vcf_ranges
GRanges object with 2286874 ranges and 2 metadata columns:
                     seqnames        ranges strand |            REF
                        <Rle>     <IRanges>  <Rle> | <DNAStringSet>
    chr1:82133:CAA:C     chr1   82133-82135      * |            CAA
   chr1:189392:ACC:A     chr1 189392-189394      * |            ACC
     chr1:266093:C:G     chr1        266093      * |              C
     chr1:266225:C:A     chr1        266225      * |              C
     chr1:271618:C:T     chr1        271618      * |              C
                 ...      ...           ...    ... .            ...
  chrX:156022319_T/C     chrX     156022319      * |              T
  chrX:156023286_G/A     chrX     156023286      * |              G
  chrX:156023496_C/T     chrX     156023496      * |              C
  chrX:156024517_C/T     chrX     156024517      * |              C
  chrX:156025910_C/T     chrX     156025910      * |              C
                                    ALT
                     <DNAStringSetList>
    chr1:82133:CAA:C                  C
   chr1:189392:ACC:A                  A
     chr1:266093:C:G                  G
     chr1:266225:C:A                  A
     chr1:271618:C:T                  T
                 ...                ...
  chrX:156022319_T/C                  C
  chrX:156023286_G/A                  A
  chrX:156023496_C/T                  T
  chrX:156024517_C/T                  T
  chrX:156025910_C/T                  T
  -------
  seqinfo: 193 sequences from an unspecified genome

When I export this, again everything looks OK

writeVcf(
    VCF(granges(vcf_ranges), fixed = mcols(vcf_ranges)),
    "new.vcf", index = TRUE
)

However, I can't read the file in again

vcfParam <- ScanVcfParam(fixed = c("REF", "ALT"))
vcf <- readVcf("new.vcf.bgz", param = vcfParam)
[E::COMPAT_bcf_hdr_read] Input is not detected as bcf or vcf format
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'seqinfo': no 'header' line "#CHROM POS ID..."?

Checking the file on disk I can see the apparently missing header line

zcat new.vcf.bgz | head -200 | tail -n10
##contig=<ID=KI270754.1,length=40191>
##contig=<ID=KI270755.1,length=36723>
##contig=<ID=KI270756.1,length=79590>
##contig=<ID=KI270757.1,length=71251>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    82133   chr1:82133:CAA:C        CAA     C       .       .       .
chr1    189392  chr1:189392:ACC:A       ACC     A       .       .       .
chr1    266093  chr1:266093:C:G C       G       .       .       .
chr1    266225  chr1:266225:C:A C       A       .       .       .
chr1    271618  chr1:271618:C:T C       T       .       .       .

Has anyone else come across this & is there a workaround? Or have I missed something obvious? I'm not a VCF regular so may have made a simple error.

I'm also working in a conda env on an HPC if that matters.

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /data/tki_bodl/sw/micromamba/envs/transmogR/lib/libopenblasp-r0.3.25.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

time zone: Australia/Perth
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] VariantAnnotation_1.48.1    Rsamtools_2.18.0           
 [3] Biostrings_2.70.1           XVector_0.42.0             
 [5] SummarizedExperiment_1.32.0 Biobase_2.62.0             
 [7] GenomicRanges_1.54.1        GenomeInfoDb_1.38.1        
 [9] IRanges_2.36.0              S4Vectors_0.40.2           
[11] MatrixGenerics_1.14.0       matrixStats_1.2.0          
[13] BiocGenerics_0.48.1        

loaded via a namespace (and not attached):
 [1] KEGGREST_1.42.0          rjson_0.2.21             lattice_0.22-5          
 [4] vctrs_0.6.5              tools_4.3.2              bitops_1.0-7            
 [7] generics_0.1.3           curl_5.1.0               parallel_4.3.2          
[10] tibble_3.2.1             fansi_1.0.6              AnnotationDbi_1.64.1    
[13] RSQLite_2.3.4            blob_1.2.4               pkgconfig_2.0.3         
[16] Matrix_1.6-5             BSgenome_1.70.1          dbplyr_2.4.0            
[19] lifecycle_1.0.4          GenomeInfoDbData_1.2.11  compiler_4.3.2          
[22] stringr_1.5.1            progress_1.2.3           codetools_0.2-19        
[25] yaml_2.3.8               RCurl_1.98-1.14          pillar_1.9.0            
[28] crayon_1.5.2             BiocParallel_1.36.0      DelayedArray_0.28.0     
[31] cachem_1.0.8             abind_1.4-5              tidyselect_1.2.0        
[34] digest_0.6.34            stringi_1.8.3            restfulr_0.0.15         
[37] dplyr_1.1.4              biomaRt_2.58.0           rprojroot_2.0.4         
[40] fastmap_1.1.1            grid_4.3.2               here_1.0.1              
[43] cli_3.6.2                SparseArray_1.2.2        magrittr_2.0.3          
[46] S4Arrays_1.2.0           GenomicFeatures_1.54.1   XML_3.99-0.16           
[49] utf8_1.2.4               rappdirs_0.3.3           filelock_1.0.3          
[52] prettyunits_1.2.0        bit64_4.0.5              httr_1.4.7              
[55] bit_4.0.5                png_0.1-8                hms_1.1.3               
[58] memoise_2.0.1            BiocIO_1.12.0            BiocFileCache_2.10.1    
[61] rtracklayer_1.62.0       rlang_1.1.3              glue_1.7.0              
[64] DBI_1.2.0                xml2_1.3.6               R6_2.5.1                
[67] GenomicAlignments_1.38.0 zlibbioc_1.48.0
VariantAnnotation • 340 views
ADD COMMENT

Login before adding your answer.

Traffic: 1616 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6