predictCoding with multiple samples
luong.jeff • 0
@luongjeff-8689

Hi BioC,

I'm using the predictCoding function in the VariantAnnotation package on a VCF file with multiple samples. predictCoding reports many variants at the same location, but I see no way to tell which variant occurred in which sample. Is it possible to retrieve this information?

predict coding variantannotation • 1.1k views
@valerie-obenchain-4275

Hi,

You'll need to look at the GT data to determine which samples have the variant. The genotypeToSnpMatrix() function converts the genotypes to a SnpMatrix object where rows are samples and columns are SNPs. See ?genotypeToSnpMatrix for details and information about the warnings.

> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") 
> vcf <- readVcf(fl, "hg19")
> mat <- genotypeToSnpMatrix(vcf)
Warning messages:
1: In .local(x, ...) : variants with >1 ALT allele are set to NA
2: In .local(x, ...) : non-single nucleotide variations are set to NA

> as(mat$genotype, "character")
        rs6054257 20:17330_T/A rs6040355 20:1230237_T/. microsat1
NA00001 "A/A"     "A/A"        "NA"      "NA"           "NA"     
NA00002 "A/B"     "A/B"        "NA"      "NA"           "NA"     
NA00003 "B/B"     "A/A"        "NA"      "NA"           "NA"     
> as(mat$genotype, "matrix")
        rs6054257 20:17330_T/A rs6040355 20:1230237_T/. microsat1
NA00001        01           01        00             00        00
NA00002        02           02        00             00        00
NA00003        03           01        00             00        00
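Because the SnpMatrix conversion sets multi-allelic and non-SNV variants to NA, it can also help to read the raw GT strings directly. The following is a minimal sketch, not part of the original answer: it assumes the same ex2.vcf example file, and the names `gt`, `has_alt` and `carriers` are made up for illustration. It treats a sample as a carrier when its genotype string contains any non-reference allele index.

```r
library(VariantAnnotation)

fl  <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# GT is a character matrix: one row per variant, one column per sample,
# with entries such as "0|0", "1|0" or "1/1".
gt <- geno(vcf)$GT

# TRUE where the genotype contains at least one non-reference allele
# (any allele index from 1 upwards); missing genotypes (".") stay FALSE.
has_alt <- matrix(grepl("[1-9]", gt), nrow = nrow(gt), dimnames = dimnames(gt))

# For every variant, the names of the samples that carry it.
carriers <- lapply(seq_len(nrow(has_alt)),
                   function(i) colnames(has_alt)[has_alt[i, ]])
names(carriers) <- rownames(has_alt)

carriers
```

Unlike the SnpMatrix route, this keeps multi-allelic sites, but it does not distinguish which ALT allele a sample carries; that would need the allele indices to be parsed separately.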

 

You probably know that predictCoding() returns results for coding variants only, and if the variant falls in multiple transcripts there will be a row for each variant-transcript match. The QUERYID column in the output maps back to the row of the original query. Using this and the data from the SnpMatrix, you can identify which samples had a particular variant output by predictCoding().
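One way to put the QUERYID mapping into practice is sketched below. This is an illustration under stated assumptions rather than the approach from the thread: it uses the chr22.vcf.gz example file shipped with VariantAnnotation, the hg19 TxDb and BSgenome packages are my choice of annotation, and it reads the raw GT matrix instead of the SnpMatrix. The names `qid`, `rows_per_variant` and `carriers` are hypothetical.

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # assumed annotation, not from the thread
library(BSgenome.Hsapiens.UCSC.hg19)        # assumed genome, not from the thread

fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")
seqlevels(vcf) <- "chr22"   # match the "chr" naming used by the TxDb

txdb   <- TxDb.Hsapiens.UCSC.hg19.knownGene
coding <- predictCoding(vcf, txdb, seqSource = Hsapiens)

# Number of output rows per original variant: one row per variant-transcript
# match, so this counts transcripts, not samples.
rows_per_variant <- table(coding$QUERYID)

# QUERYID is the row index into the original VCF, so it also indexes GT.
gt  <- geno(vcf)$GT
qid <- unique(coding$QUERYID)

# For each coding variant, the samples with at least one non-reference allele.
carriers <- lapply(qid, function(i) {
  g <- gt[i, ]
  names(g)[grepl("[1-9]", g)]
})
names(carriers) <- rownames(vcf)[qid]

head(rows_per_variant)
head(carriers)
```

Several output rows sharing one QUERYID reflect the transcripts the variant overlaps, so the number of rows for a variant is unrelated to the number of samples carrying it.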

Valerie

Hi Valerie,

I’m still having difficulties interpreting my GT data using the genotypeToSnpMatrix function. I have some sample data below to illustrate my situation. The predictCoding function output suggests that there are 4 coding variants at row 4116 of the original query. But the SnpMatrix data suggests that there are many more than 4 samples with at least one risk allele (“B”). How would I interpret this data? How would I find out which 4 out of my 209 samples are being represented in the predictCoding output?

query_id  gene   cancer_type   sample_id        position  wt_residue  mut_residue
4116      CECR1  breastcancer  22:17669306_T/C  94        H           R
4116      CECR1  breastcancer  22:17669306_T/C  335       H           R
4116      CECR1  breastcancer  22:17669306_T/C  335       H           R
4116      CECR1  breastcancer  22:17669306_T/C  293       H           R

SnpMatrix

"D66001" "A/B"  "D66002" "A/A"  "D66003" "A/B"  "D66004" "A/A"  "D66005" "A/A"
"D66006" "A/B"  "D66007" "A/A"  "D66008" "A/B"  "D66009" "B/B"  "D66010" "A/B"
"D66011" "A/B"  "D66012" "A/B"  "D66013" "A/A"  "D66014" "A/A"  "D66015" "A/A"
"D66016" "A/B"  "D66017" "A/B"  "D66018" "A/B"  "D66019" "A/A"  "D66020" "B/B"
"D66021" "A/A"  "D66022" "A/A"  "D66023" "A/B"  "D66024" "A/A"  "D66025" "A/A"
"D66026" "A/A"  "D66027" "A/B"  "D66028" "A/B"  "D66029" "A/A"  "D66030" "B/B"
"D66031" "A/B"  "D66032" "A/A"  "D66033" "A/B"  "D66034" "A/A"  "D66035" "A/B"
"D66036" "B/B"  "D66037" "B/B"  "D66038" "A/A"  "D66039" "A/A"  "D66040" "A/B"
"D66041" "A/B"  "D66042" "A/B"  "D66043" "A/A"  "D66044" "A/A"  "D66045" "A/A"
"D66046" "B/B"  "D66047" "A/B"  "D66048" "A/A"  "D66049" "A/B"  "D66050" "A/B"
"D66051" "A/B"  "D66052" "A/B"  "D66053" "A/B"  "D66054" "A/B"  "D66055" "A/A"
"D66056" "A/A"  "D66057" "A/B"  "D66058" "A/A"  "D66059" "A/B"  "D66060" "A/B"
"D66061" "A/A"  "D66062" "A/B"  "D66063" "A/A"  "D66064" "A/A"  "D66065" "A/A"
"D66066" "A/A"  "D66067" "A/B"  "D66068" "B/B"  "D66069" "A/A"  "D66070" "A/A"
"D66071" "A/A"  "D66072" "A/A"  "D66073" "A/B"  "D66074" "B/B"  "D66075" "A/B"
"D66076" "A/A"  "D66077" "A/A"  "D66078" "A/A"  "D66079" "A/A"  "D66080" "A/A"
"D66081" "A/A"  "D66082" "A/A"  "D66083" "A/A"  "D66084" "B/B"  "D66085" "A/A"
"D66086" "A/A"  "D66087" "A/B"  "D66088" "A/A"  "D66089" "A/B"  "D66090" "A/B"
"D66091" "A/A"  "D66092" "A/A"  "D66093" "A/B"  "D66094" "A/A"  "D66095" "B/B"
"D66096" "A/A"  "D66097" "A/A"  "D66098" "A/B"  "D66099" "A/B"  "D66100" "A/B"
"D66101" "A/B"  "D66102" "A/B"  "D66103" "A/A"  "D66104" "A/A"
"D66200" "A/A"  "D66201" "A/B"  "D66202" "A/B"  "D66203" "A/A"  "D66204" "A/B"
"D66205" "A/B"  "D66206" "A/B"  "D66207" "A/A"  "D66208" "A/B"  "D66209" "A/B"
"D66210" "A/B"  "D66211" "A/B"  "D66212" "B/B"  "D66213" "A/A"  "D66214" "A/A"
"D66215" "A/A"  "D66216" "A/A"  "D66217" "A/A"  "D66218" "A/A"  "D66219" "A/A"
"D66220" "A/A"  "D66221" "B/B"  "D66222" "A/B"  "D66223" "A/B"  "D66224" "A/B"
"D66225" "A/B"  "D66226" "A/B"  "D66227" "A/A"  "D66228" "A/B"  "D66229" "A/B"
"D66230" "A/B"  "D66231" "A/A"  "D66232" "A/A"  "D66233" "A/A"  "D66234" "A/B"
"D66235" "A/A"  "D66236" "A/B"  "D66237" "A/A"  "D66238" "A/B"  "D66239" "A/B"
"D66240" "A/B"  "D66241" "A/A"  "D66242" "A/B"  "D66243" "A/A"  "D66244" "A/B"
"D66245" "A/B"  "D66246" "A/B"  "D66247" "A/A"  "D66248" "B/B"  "D66249" "A/A"
"D66250" "A/A"  "D66251" "A/A"  "D66252" "B/B"  "D66253" "A/A"  "D66254" "A/B"
"D66255" "B/B"  "D66256" "A/B"  "D66257" "A/B"  "D66258" "A/A"  "D66259" "A/A"
"D66260" "A/A"  "D66261" "A/B"  "D66262" "A/B"  "D66263" "A/A"  "D66264" "A/A"
"D66265" "A/A"  "D66266" "A/A"  "D66267" "A/A"  "D66268" "A/B"  "D66269" "A/A"
"D66270" "A/B"  "D66271" "A/A"  "D66272" "A/B"  "D66273" "A/A"  "D66274" "A/A"
"D66275" "B/B"  "D66276" "A/A"  "D66277" "A/A"  "D66278" "A/A"  "D66279" "A/B"
"D66280" "A/B"  "D66281" "A/B"  "D66282" "A/B"  "D66283" "A/A"  "D66284" "A/B"
"D66285" "B/B"  "D66286" "A/B"  "D66287" "A/A"  "D66288" "A/B"  "D66289" "A/A"
"D66290" "A/A"  "D66291" "A/B"  "D66292" "A/A"  "D66293" "A/A"  "D66294" "A/A"
"D66295" "A/A"  "D66296" "A/B"  "D66297" "A/B"  "D66298" "A/B"  "D66299" "A/A"
"D66300" "B/B"  "D66301" "A/B"  "D66302" "A/A"  "D66303" "A/A"  "D66304" "A/B"

Thanks,
Jeff