Subsetting the geno fields in a VCF using VariantAnnotation
rubi ▴ 110
@rubi-6462
Last seen 6.3 years ago

Hi,

I'm trying to write a VCF file using the VariantAnnotation package.

Some of my sites are physically phased and therefore have the GATK PGT and PID FORMAT fields (see the VCF header records below):

##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">

For downstream analyses, I need only the physically phased VCF records to carry these FORMAT fields; all other VCF records should not have them.

I can't seem to set this through geno(out.vcf)$PGT and geno(out.vcf)$PID: they only seem able to assign these fields to either all records in the VCF or none.

Any attempt to subset these gives the error:

Error in geno(out.vcf)$PGT[idx, 1] = NULL :
  number of items to replace is not a multiple of replacement length

Help would be appreciated.

VariantAnnotation • 1.5k views
@michael-lawrence-3846
Last seen 3.0 years ago
United States

You need a value for every cell in the PGT and PID matrices, so individual cells cannot be dropped by assigning NULL. Instead, assign NA to the cells for which no value should be output in the VCF.
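
As a rough illustration of this advice, the sketch below uses a small base R character matrix as a stand-in for geno(out.vcf)$PGT (one row per record, one column per sample). The toy values, the sample name, and the index idx of non-phased records are all made up for the example; the commented lines at the end show how the same pattern would presumably apply to the real VCF object, with a placeholder output file name.

```r
# Toy stand-in for geno(out.vcf)$PGT: a character matrix with one row per
# record and one column per sample (values are invented for illustration).
pgt <- matrix(c("0|1", "0|1", "1|0", "0|1"), ncol = 1,
              dimnames = list(paste0("rec", 1:4), "sample1"))

# Hypothetical index of the records that are NOT physically phased.
idx <- c(2, 4)

# Assigning NULL cannot remove cells from a matrix, so it raises an error.
res <- tryCatch({ pgt[idx, 1] <- NULL; "ok" }, error = function(e) "error")

# Assigning NA keeps the matrix shape and marks those cells as missing.
pgt[idx, 1] <- NA_character_

# With VariantAnnotation loaded, the same pattern would presumably be:
# geno(out.vcf)$PGT[idx, ] <- NA_character_
# geno(out.vcf)$PID[idx, ] <- NA_character_
# writeVcf(out.vcf, "out.vcf")  # output file name is a placeholder
```

The point of the sketch is that the geno matrices keep a cell for every record and sample, so "absent" has to be expressed as a missing value rather than by deleting cells.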
