georgewwp
Hi there,
I have a set of variants (in VCF format) called in two samples, and I want to select a subset of them using multiple criteria.
First, I want to select only "G" to "A" and "C" to "T" changes, since I'm only interested in these two specific types of SNPs.
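The substitution criterion amounts to testing each (REF, ALT) pair against a small set of allowed changes. Below is a minimal, language-agnostic sketch in Python. It assumes each record already carries a single REF and a single ALT base (i.e. multi-allelic sites have been split). The `records` list is hypothetical example data, not taken from the VCF above.

```python
# Allowed single-nucleotide changes: G>A and C>T.
WANTED = {("G", "A"), ("C", "T")}

def is_wanted_substitution(ref: str, alt: str) -> bool:
    """Return True if the variant is a G>A or C>T single-nucleotide change."""
    return (ref.upper(), alt.upper()) in WANTED

# Hypothetical (REF, ALT) pairs; the indel ("GA", "G") is rejected
# because it is not in the allowed set.
records = [("G", "A"), ("C", "T"), ("A", "G"), ("GA", "G"), ("C", "A")]
mask = [is_wanted_substitution(r, a) for r, a in records]
print(mask)  # [True, True, False, False, False]
```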
Second, I want these SNPs to have certain genotype (GT) call combinations in the two samples: either "0/1" in sample 1 and "1/1" in sample 2, or not "1/1" in both samples at the same time.
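The genotype criterion can be written as two boolean tests joined with "or". The following is a minimal sketch that assumes unphased GT strings exactly as they appear in the VCF (e.g. "0/1"). The function name `gt_ok` is hypothetical.

```python
def gt_ok(gt1: str, gt2: str) -> bool:
    """Genotype filter for a pair of samples, as stated in the question."""
    # Criterion A: sample 1 heterozygous, sample 2 homozygous ALT.
    a = gt1 == "0/1" and gt2 == "1/1"
    # Criterion B: not homozygous ALT in both samples at once.
    b = not (gt1 == "1/1" and gt2 == "1/1")
    return a or b

print(gt_ok("0/1", "1/1"), gt_ok("1/1", "1/1"), gt_ok("0/0", "0/1"))
```

As written, criterion A is a special case of B: if sample 1 is "0/1", the pair cannot be "1/1" in both samples. The combined test therefore reduces to B alone. If the intent was something stricter (e.g. A *and* B), the two masks need to be combined differently.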
What's the best way to achieve this? I had difficulty combining these criteria.
Thanks!
George
class: CollapsedVCF
dim: 309482 2
rowRanges(vcf):
  GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
info(vcf):
  DataFrame with 17 columns: INDEL, IDV, IMF, DP, VDB, RPB, MQB, BQB, MQSB, SGB, MQ0F, ICB, HOB, AC, AN, DP4, MQ
info(header(vcf)):
         Number Type    Description
   INDEL 0      Flag    Indicates that the variant is an INDEL.
   IDV   1      Integer Maximum number of reads supporting an indel
   IMF   1      Float   Maximum fraction of reads supporting an indel
   DP    1      Integer Raw read depth
   VDB   1      Float   Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better),Version
   RPB   1      Float   Mann-Whitney U test of Read Position Bias (bigger is better)
   MQB   1      Float   Mann-Whitney U test of Mapping Quality Bias (bigger is better)
   BQB   1      Float   Mann-Whitney U test of Base Quality Bias (bigger is better)
   MQSB  1      Float   Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)
   SGB   1      Float   Segregation based metric.
   MQ0F  1      Float   Fraction of MQ0 reads (smaller is better)
   ICB   1      Float   Inbreeding Coefficient Binomial test (bigger is better)
   HOB   1      Float   Bias in the number of HOMs number (smaller is better)
   AC    A      Integer Allele count in genotypes for each ALT allele, in the same order as listed
   AN    1      Integer Total number of alleles in called genotypes
   DP4   4      Integer Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases
   MQ    1      Integer Average mapping quality
geno(vcf):
  SimpleList of length 2: GT, PL
geno(header(vcf)):
      Number Type    Description
   GT 1      String  Genotype
   PL G      Integer List of Phred-scaled genotype likelihoods
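It would be helpful to know what you tried and where it failed. The first step would be to expand() the VCF, which turns the CollapsedVCF (one row per site, possibly several ALT alleles) into an ExpandedVCF with one row per ALT allele, so that variants can be selected on a per-alt basis.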
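The expand-then-filter workflow can be illustrated outside R with plain data structures. This minimal Python sketch splits each hypothetical collapsed site into one row per ALT allele and then applies both criteria to each row.

The sketch makes several assumptions:

- The `Site`/`Row` types and the example sites are invented for illustration.
- GT strings are copied verbatim to every expanded row. This sketch does not claim to reproduce how VariantAnnotation's expand() handles genotype fields.

In R, the analogous steps would be roughly: call expand(), build logical vectors from ref(), alt() and geno(vcf)$GT, then subset with vcf[keep, ].

```python
from typing import NamedTuple, Tuple, List

class Site(NamedTuple):
    # Hypothetical collapsed record: one row per site, possibly several ALTs.
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]
    gts: Tuple[str, str]  # GT for sample 1 and sample 2

class Row(NamedTuple):
    # Expanded record: exactly one ALT allele per row.
    chrom: str
    pos: int
    ref: str
    alt: str
    gts: Tuple[str, str]

def expand_sites(sites: List[Site]) -> List[Row]:
    """One output row per (site, ALT allele); GT strings are copied unchanged."""
    return [Row(s.chrom, s.pos, s.ref, alt, s.gts) for s in sites for alt in s.alts]

def keep(row: Row) -> bool:
    # Substitution criterion: G>A or C>T only.
    snp_ok = (row.ref, row.alt) in {("G", "A"), ("C", "T")}
    # Genotype criterion, as stated in the question.
    g1, g2 = row.gts
    gt_ok = (g1 == "0/1" and g2 == "1/1") or not (g1 == "1/1" and g2 == "1/1")
    return snp_ok and gt_ok

sites = [
    Site("chr1", 100, "G", ("A",), ("0/1", "1/1")),
    Site("chr1", 200, "C", ("T", "G"), ("0/1", "0/1")),
    Site("chr1", 300, "G", ("A",), ("1/1", "1/1")),
    Site("chr1", 400, "A", ("G",), ("0/1", "1/1")),
]
selected = [r for r in expand_sites(sites) if keep(r)]
for r in selected:
    print(r.chrom, r.pos, r.ref, r.alt)
```

The multi-allelic site at position 200 becomes two rows. Only its C>T row survives, which is why splitting per ALT allele comes before the substitution test.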