Write the "map" DataFrame from genotypeToSnpMatrix() to a text file
1
0
Entering edit mode
@timotheeflutre-6727
Last seen 5.6 years ago
France

I would like to export the genotypes (GT field) from a VCF file to allele dosage (0, 1 or 2 copies of a given allele for bi-allelic variants). As the VCF file can be large, I need to do it by batch:

gdose.con <- file("gdose.txt", open="a")
amap.con <- file("amap.txt", open="a")
cat("snp.names\tallele.1\tallele.2\tignore\n", file=amap.con, append=TRUE)
tabix.file <- TabixFile(file="variants.vcf.gz", yieldSize=10^3)
open(tabix.file)
while(nrow(vcf <- readVcf(file=tabix.file, genome="test"))){
  res <- genotypeToSnpMatrix(vcf)
  write.SnpMatrix(x=res$genotypes, file=gdose.con, append=TRUE,
                  quote=FALSE, sep="\t", row.names=TRUE,
                  col.names=FALSE)
  write.table(x=res$map, file=amap.con, append=TRUE,
              quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
}
close(tabix.file)
close(gdose.con)
close(amap.con)

The output "gdose" file is fine. However, the output "amap" file contains <S4 object of class "DNAStringSet"> in its 2nd column, even if I use as.data.frame. What should I do to have instead the alternate alleles?

variantannotation DataFrame snpstats • 1.8k views
ADD COMMENT
1
Entering edit mode
@timotheeflutre-6727
Last seen 5.6 years ago
France

Instead of using genotypeToSnpMatrix(vcf), I used VariantAnnotation::geno(vcf) and GenomicRanges::rowRanges(vcf). This answer from Martin Morgan was useful.

ADD COMMENT

Login before adding your answer.

Traffic: 561 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6