Is there a simple way to get comments from a VCF header of a VCF object? For example, consider the following excerpt of a VCF file:
##contig=<ID=HLA-DRB1*15:03:01:02,length=11569>
##contig=<ID=HLA-DRB1*16:02:01,length=11005>
##filtering_status=These calls have been filtered by FilterMutectCalls to label false positives with a list of failed filters and true positives with PASS.
##normal_sample=OSCC_12-N
##source=FilterMutectCalls
##source=Mutect2
##tumor_sample=OSCC_12-CC_P
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT OSCC_12-CC_P OSCC_12-N
I can get a part of the header such as the contigs list by doing:
> class(variants)
[1] "CollapsedVCF"
attr(,"package")
[1] "VariantAnnotation"
header(header(variants))[["contig"]]
But, I'd like to get the comments to easily find out what's the ID corresponding to tumor_sample
. It turns out that MuTect2 outputs the genotype information in alphabetical order, so, sometimes (e.g OSCC12-CCP, cell culture of primary vs. OSCC12-N, the normal), the first column contains the tumour and in other cases (e.g. OSCC1-N, the normal vs. OSCC-1-P, the primary), the second genotype column does. Positional matching will lead to mistakes and the only robust way to identify which column has the cancer sample is to extract the tumour ID from the VCF header comments and do geno(variants)$AD[, cancerID]
.
That solution addresses my question.