readVcf skipping CS field in INFO
Peter Hickey ▴ 740
@petehaitch
WEHI, Melbourne, Australia

I have a VCF created by Bis-SNP. One of the INFO fields is called CS, but readVcf is ignoring it and I can't figure out why. A tiny example VCF that reproduces the problem can be downloaded from https://www.dropbox.com/sh/bmbyjgts26req8k/AAA4UkTTUU8IzNxkcy4ZP14Ua?dl=0.

This is what I've tried:

> library(VariantAnnotation)
> x <- readVcf('~/tmp/ex.vcf.gz', 'hg19')

# Missing CS field. Should appear before Context.
> info(x)
DataFrame with 2 rows and 9 columns
                       Context        DB        DP        HQ       MQ0        NS        QD             REF        SB
               <CharacterList> <logical> <integer> <numeric> <integer> <integer> <numeric> <CharacterList> <numeric>
rs55998931                  YH      TRUE         8        NA         0         1        NA              CH   -0.1412
chr1:10774_G/C              SG     FALSE         4        NA         0         1        NA              CH   -0.6254

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] VariantAnnotation_1.12.2 Rsamtools_1.18.1         Biostrings_2.34.0        XVector_0.6.0            GenomicRanges_1.18.1    
[6] GenomeInfoDb_1.2.2       IRanges_2.0.0            S4Vectors_0.4.0          BiocGenerics_0.12.0     

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.28.1    base64enc_0.1-2         BatchJobs_1.4           BBmisc_1.7              Biobase_2.26.0         
 [6] BiocParallel_1.0.0      biomaRt_2.22.0          bitops_1.0-6            brew_1.0-6              BSgenome_1.34.0        
[11] checkmate_1.5.0         codetools_0.2-9         DBI_0.3.1               digest_0.6.4            fail_1.2               
[16] foreach_1.4.2           GenomicAlignments_1.2.0 GenomicFeatures_1.18.2  iterators_1.0.7         RCurl_1.95-4.3         
[21] RSQLite_1.0.0           rtracklayer_1.26.1      sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1            
[26] XML_3.98-1.1            zlibbioc_1.12.0   

Thanks for your help,

Pete

 
variantannotation readVcf • 1.3k views

Thanks Pete. I'll have a look.

Valerie

@valerie-obenchain-4275
United States

Now fixed in release (1.12.3) and devel (1.13.5).
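To check whether an installed copy already includes the fix, the version numbers above can be compared directly. This is only a sketch: has_character_fix() is a hypothetical helper, not part of VariantAnnotation, and it assumes the fix is present in 1.12.3 and later on the release branch and in 1.13.5 and later on devel, as stated above.

```r
# Hypothetical helper (not part of VariantAnnotation): TRUE if `version`
# is at or beyond the releases named above as containing the fix.
has_character_fix <- function(version) {
  v <- package_version(as.character(version))
  if (v >= "1.13.0") {
    v >= "1.13.5"   # devel branch (1.13.x) and anything later
  } else {
    v >= "1.12.3"   # release branch (1.12.x)
  }
}

has_character_fix("1.12.2")  # FALSE: the version in the sessionInfo() above
has_character_fix("1.12.3")  # TRUE

# With the package installed, the same check would be:
# has_character_fix(packageVersion("VariantAnnotation"))
```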

 
The problem was that 'Type=Character' in the header was not supported. When we first wrote the reader, the VCF specs were less formalized and the vast majority of files used Type 'String', not 'Character'. The specs are now clearer, and readVcf() supports all Types (Integer, Flag, Float, Character, String).
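For anyone who wants to see which Type each INFO field declares in a header, here is a minimal base-R sketch. info_types() is a hypothetical helper, not part of VariantAnnotation, and the three header lines are made up for illustration (they are not copied from the Bis-SNP file). It assumes each ##INFO line lists ID and Type before the Description.

```r
# Hypothetical helper (not part of VariantAnnotation): return the Type
# declared by each ##INFO header line, named by the field ID.
info_types <- function(header_lines) {
  info_lines <- grep("^##INFO=<", header_lines, value = TRUE)
  # Lazy match so the first ID= / Type= key is used, not text in Description.
  id   <- sub("^##INFO=<.*?\\bID=([^,>]+).*$", "\\1", info_lines, perl = TRUE)
  type <- sub("^##INFO=<.*?\\bType=([^,>]+).*$", "\\1", info_lines, perl = TRUE)
  setNames(type, id)
}

# Illustrative header lines only; for a real file one could read them with
# readLines(gzfile(path)) and keep the lines starting with "##".
header <- c(
  '##fileformat=VCFv4.1',
  '##INFO=<ID=CS,Number=1,Type=Character,Description="Illustrative field">',
  '##INFO=<ID=DP,Number=1,Type=Integer,Description="Illustrative depth">',
  '##INFO=<ID=DB,Number=0,Type=Flag,Description="Illustrative flag">'
)

types <- info_types(header)
names(types)[types == "Character"]  # fields an unfixed readVcf() would skip
```

With VariantAnnotation loaded, info(scanVcfHeader(file)) should report the same Type column, so comparing its row names against colnames(info(x)) is another way to spot a dropped field.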
 
Thanks for reporting this.
 
Valerie

Thanks for finding and fixing this so quickly, Valerie.
