I'm using the predictCoding function in the VariantAnnotation package on a VCF file with multiple samples. predictCoding reports many variants at the same location, but I see no way to tell which variant occurred in which sample. Is it possible to retrieve this information?
You'll need to look at the GT data to determine which samples have the variant. The genotypeToSnpMatrix() function converts the genotypes to a SnpMatrix object where rows are samples and columns are SNPs. See ?genotypeToSnpMatrix for details and information about the warnings.
> library(VariantAnnotation)
> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
> mat <- genotypeToSnpMatrix(vcf)
Warning messages:
1: In .local(x, ...) : variants with >1 ALT allele are set to NA
2: In .local(x, ...) : non-single nucleotide variations are set to NA
> as(mat$genotype, "character")
        rs6054257 20:17330_T/A rs6040355 20:1230237_T/. microsat1
NA00001 "A/A"     "A/A"        "NA"      "NA"           "NA"
NA00002 "A/B"     "A/B"        "NA"      "NA"           "NA"
NA00003 "B/B"     "A/A"        "NA"      "NA"           "NA"
> as(mat$genotype, "matrix")
        rs6054257 20:17330_T/A rs6040355 20:1230237_T/. microsat1
NA00001        01           01        00             00        00
NA00002        02           02        00             00        00
NA00003        03           01        00             00        00
You probably know that predictCoding() returns results for coding variants only, and that if a variant falls in multiple transcripts there will be one row for each variant-transcript match. The QUERYID column in the output maps back to the row of the original query. Because the columns of the SnpMatrix follow the row order of the VCF, you can combine QUERYID with the SnpMatrix data to identify which samples carry a particular variant reported by predictCoding().
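As a rough sketch of that lookup, the code below works on the raw GT matrix from geno(vcf) rather than the SnpMatrix, so multi-allelic and non-SNV rows are not set to NA. The helpers hasAlt() and samplesForQuery() are hypothetical names made up for this illustration, not part of VariantAnnotation, and `coding` is assumed to be the result of running predictCoding() on this same vcf object. Note that it only flags samples carrying any non-reference allele; for sites with more than one ALT allele you would still need to check which ALT each sample has.

```r
library(VariantAnnotation)

fl  <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")

## GT matrix: rows are variants (same order as the VCF), columns are samples
gt <- geno(vcf)$GT

## Hypothetical helper: TRUE where a genotype string such as "0|1" or "1/1"
## contains at least one non-reference allele. Missing calls (".", NA)
## are not counted as carriers.
hasAlt <- function(g) {
    alleles <- strsplit(g, "[/|]")
    vapply(alleles,
           function(a) any(!is.na(a) & !a %in% c("0", ".")),
           logical(1))
}
carrier <- matrix(hasAlt(gt), nrow=nrow(gt), dimnames=dimnames(gt))

## Hypothetical helper: for each queried row index, return the names of
## the samples that carry a non-reference allele at that variant.
samplesForQuery <- function(queryid, carrier) {
    lapply(setNames(queryid, rownames(carrier)[queryid]),
           function(i) colnames(carrier)[carrier[i, ]])
}

## With a predictCoding() result named 'coding' (an assumption here):
## samplesForQuery(unique(coding$QUERYID), carrier)

## Stand-alone illustration using the first two rows of the VCF
samplesForQuery(1:2, carrier)
```

Looking up by unique(QUERYID) avoids repeating the same variant once per transcript match.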