Dear maintainer,
I notice a slightly strange behaviour of readVcf when ScanVcfParam(info=)
.
It seems that the imported VCF object somehow retains memories of the original VCF header (in the file), while the data was effectively dropped.
After import, the show
method seems fine (header and INFO state that only the desired field was imported).
However, looking directly at info(header(vcf))
shows that the original fields are still there, while the info(vcf)
show that only the desired field was imported. The show
method must hide the extra fields based on the content of info(vcf), I guess. I have dug as far as the .showVCFSubclass
method, but that's where it goes a little too decipher to read for me.
Many thanks,
Kevin
Let us take the example (from the vignette, plus a ScanVcfParam statement):
> svp <- ScanVcfParam(info = "NS") > fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") > vcf2 <- readVcf(fl, "hg19", param = svp)
# Object seems fine (see info(vcf) and info(header(vcf)) > vcf2 class: CollapsedVCF dim: 5 3 rowRanges(vcf): GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER info(vcf): DataFrame with 1 column: NS info(header(vcf)): Number Type Description NS 1 Integer Number of Samples With Data geno(vcf): SimpleList of length 4: GT, GQ, DP, HQ geno(header(vcf)): Number Type Description GT 1 String Genotype GQ 1 Integer Genotype Quality DP 1 Integer Read Depth HQ 2 Integer Haplotype Quality # Headers of non-imported fields are still present: needs fix (?) > info(header(vcf2)) DataFrame with 6 rows and 3 columns Number Type Description <character> <character> <character> NS 1 Integer Number of Samples With Data DP 1 Integer Total Depth AF A Float Allele Frequency AA 1 String Ancestral Allele DB 0 Flag dbSNP membership, build 129 H2 0 Flag HapMap2 membership # Only the desired data was imported: good > info(vcf2) DataFrame with 5 rows and 1 column NS <integer> rs6054257 3 20:17330_T/A 3 rs6040355 2 20:1230237_T/. 3 microsat1 3
Thank you for noting this discrepancy. The current implementation of displaying the original header information rather than the subset is deliberate in design. We felt that knowing all the possible information available from the object could be beneficial. Also some fields are closely related and it might be useful to know one or the other existed even if it was dropped.
Thank you. I suspected so, and it makes sense. In my use case however, I prefer to keep header and data synchronised, for a few reasons:
^_^
)Should you post this reply as an answer, if there is no plan to change
VariantAnnotation
? (I really don't mean this in a bad way, as I wrote above, it makes sense).For my use case, I simply wrote an extra line of code to remove from info(header) the rows without data in info(). Not a big deal.
Thanks again for the clarification!
Kevin