Hi, I am trying to use VariantAnnotation::readVcf to read in a VCF file in chunks. However, the resulting VCF object does not contain any variants:
> fl<-"mac3.recode.vcf.bgz"
> tab <- VcfFile(fl, yieldSize=4000)
> open(tab)
> while (nrow(vcf_yield <- readVcf(tab, "Salvelinus")))
+     cat("vcf dim:", dim(vcf_yield), "\n")
> close(tab)
> dim(vcf_yield)
[1] 0 138
> geno(vcf_yield)
List of length 11
names(11): GT AD DP GQ MIN_DP PGT PID PL PS RGQ SB
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] VariantAnnotation_1.34.0 Rsamtools_2.4.0 Biostrings_2.56.0
[4] XVector_0.28.0 SummarizedExperiment_1.18.2 DelayedArray_0.14.1
[7] matrixStats_0.56.0 GenomicFeatures_1.40.1 AnnotationDbi_1.50.3
[10] Biobase_2.48.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.2
[13] IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0
loaded via a namespace (and not attached):
[1] progress_1.2.2 tidyselect_1.1.0 purrr_0.3.4 lattice_0.20-41
[5] vctrs_0.3.6 generics_0.1.0 BiocFileCache_1.12.1 rtracklayer_1.48.0
[9] yaml_2.2.1 blob_1.2.1 XML_3.99-0.5 rlang_0.4.10
[13] pillar_1.4.6 glue_1.4.1 DBI_1.1.0 BiocParallel_1.22.0
[17] rappdirs_0.3.1 bit64_4.0.5 dbplyr_2.1.0 GenomeInfoDbData_1.2.3
[21] lifecycle_0.2.0 stringr_1.4.0 zlibbioc_1.34.0 memoise_1.1.0
[25] biomaRt_2.44.4 curl_4.3 Rcpp_1.0.5 BSgenome_1.56.0
[29] openssl_1.4.2 bit_4.0.4 hms_0.5.3 askpass_1.1
[33] digest_0.6.27 stringi_1.4.6 dplyr_1.0.4 grid_4.0.2
[37] tools_4.0.2 bitops_1.0-6 magrittr_1.5 RCurl_1.98-1.2
[41] RSQLite_2.2.3 tibble_3.0.3 crayon_1.3.4 pkgconfig_2.0.3
[45] Matrix_1.2-18 ellipsis_0.3.1 xml2_1.3.2 prettyunits_1.1.1
[49] assertthat_0.2.1 httr_1.4.2 rstudioapi_0.11 R6_2.4.1
[53] GenomicAlignments_1.24.0 compiler_4.0.2
My VCF input is the output of GATK's VariantFiltration, which was then run through VCFtools for a series of filtering steps.
Any idea why this would happen? Thank you!
Hi again. It seems to be a `yieldSize` problem. I tried with the example dataset and it shows the same problem. If I don't iterate through the VCF with `yieldSize`, the function works well.
My VCF file is too large to be imported as a whole (it requires too much memory). Any ideas on how I can get around this issue? Thanks!
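For reference, here is a minimal sketch of chunked reading in which each chunk is handled inside the loop body, so nothing depends on the value of `vcf_yield` after the loop ends. In a loop of this shape the last value assigned to `vcf_yield` is the zero-row chunk that makes the `while` condition false, which may be why the object inspected afterwards appears empty. The sketch assumes the `chr22.vcf.gz` example file shipped with VariantAnnotation; the `"hg19"` genome label and the `n_total` counter are illustrative choices, not taken from the original session.

```r
library(VariantAnnotation)

# Example file bundled with VariantAnnotation (assumed stand-in for the real data)
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

tab <- VcfFile(fl, yieldSize = 4000)
open(tab)

n_total <- 0L  # hypothetical accumulator: total number of records seen
while (nrow(vcf_yield <- readVcf(tab, "hg19")) > 0) {
    # Do the per-chunk work here; vcf_yield holds at most 4000 records
    cat("vcf dim:", dim(vcf_yield), "\n")
    n_total <- n_total + nrow(vcf_yield)
}
close(tab)

n_total          # records processed across all chunks
dim(vcf_yield)   # the final, empty chunk that ended the loop
```

Anything that needs to survive the loop (counts, filtered subsets, summary statistics) would have to be accumulated in a separate object such as `n_total`, since `vcf_yield` is overwritten on every iteration.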