Entering edit mode
I would like to export the genotypes (GT field) from a VCF file to allele dosage (0, 1 or 2 copies of a given allele for bi-allelic variants). As the VCF file can be large, I need to do it by batch:
gdose.con <- file("gdose.txt", open="a") amap.con <- file("amap.txt", open="a") cat("snp.names\tallele.1\tallele.2\tignore\n", file=amap.con, append=TRUE) tabix.file <- TabixFile(file="variants.vcf.gz", yieldSize=10^3) open(tabix.file) while(nrow(vcf <- readVcf(file=tabix.file, genome="test"))){ res <- genotypeToSnpMatrix(vcf) write.SnpMatrix(x=res$genotypes, file=gdose.con, append=TRUE, quote=FALSE, sep="\t", row.names=TRUE, col.names=FALSE) write.table(x=res$map, file=amap.con, append=TRUE, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE) } close(tabix.file) close(gdose.con) close(amap.con)
The output "gdose" file is fine. However, the output "amap" file contains <S4 object of class "DNAStringSet">
in its 2nd column, even if I use as.data.frame
. What should I do to have instead the alternate alleles?