Entering edit mode
From the vcf file, the field description is:
##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests. Includes the reference and alleles separated by |.">
An example of this field is AS_SB_TABLE=1067,529|5,196
. However, readVcf
seems to split by ,
and take only the first number (1067) - this is what I see in R:
> m2crct = readVcf('somatic_mutect2.filtered.vcf.gz')
> head(info(m2crct)$AS_SB_TABLE)
[1] "1067" "10" "56" "49" "37" "192"
I think it's a bug in readVcf
, not in GATK - in the field description it is stated that it should be a String and have one element (Number=1). So it should not be split by ,
. Comma can be part of a string, and number of items should be validated (one should expect one item in this case, which is the whole arbitrary string.)
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.7.2
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] VariantAnnotation_1.44.0 Rsamtools_2.14.0 Biostrings_2.66.0 XVector_0.38.0
[5] SummarizedExperiment_1.28.0 Biobase_2.58.0 GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[9] IRanges_2.32.0 S4Vectors_0.36.1 MatrixGenerics_1.10.0 matrixStats_0.63.0
[13] BiocGenerics_0.44.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lattice_0.20-45 prettyunits_1.1.1 png_0.1-8
[5] assertthat_0.2.1 digest_0.6.31 utf8_1.2.2 BiocFileCache_2.6.0
[9] R6_2.5.1 RSQLite_2.2.20 httr_1.4.4 pillar_1.8.1
[13] zlibbioc_1.44.0 rlang_1.0.6 GenomicFeatures_1.50.4 progress_1.2.2
[17] curl_5.0.0 rstudioapi_0.14 blob_1.2.3 Matrix_1.5-1
[21] BiocParallel_1.32.5 stringr_1.5.0 RCurl_1.98-1.9 bit_4.0.5
[25] biomaRt_2.54.0 DelayedArray_0.24.0 rtracklayer_1.58.0 compiler_4.2.2
[29] pkgconfig_2.0.3 tidyselect_1.2.0 KEGGREST_1.38.0 tibble_3.1.8
[33] GenomeInfoDbData_1.2.9 codetools_0.2-18 XML_3.99-0.13 fansi_1.0.3
[37] crayon_1.5.2 dplyr_1.0.10 dbplyr_2.3.0 GenomicAlignments_1.34.0
[41] bitops_1.0-7 rappdirs_0.3.3 grid_4.2.2 lifecycle_1.0.3
[45] DBI_1.1.3 magrittr_2.0.3 cli_3.6.0 stringi_1.7.12
[49] cachem_1.0.6 xml2_1.3.3 ellipsis_0.3.2 filelock_1.0.2
[53] vctrs_0.5.1 generics_0.1.3 rjson_0.2.21 restfulr_0.0.15
[57] tools_4.2.2 bit64_4.0.5 BSgenome_1.66.2 glue_1.6.2
[61] hms_1.1.2 yaml_2.3.6 parallel_4.2.2 fastmap_1.1.0
[65] AnnotationDbi_1.60.0 memoise_2.0.1 BiocIO_1.8.0