Hello all,
I would like to add a track to my Gviz plot that displays the frequency of variants in a specified region. E.g., I have the structure of Gene A, including exons and introns, and I want to quickly visualize how many variants are present in (arbitrarily defined) 100 bp bins along this gene. Hopefully this will highlight variable regions of gene A, or, conversely, illustrate that variation is pretty similar across the gene.
Progress so far: using VariantAnnotation, I created a DataTrack out of a VCF file as follows:
vcf <- readVcf("my_variants.vcf", genome = genome)
vcf_track <- DataTrack(rowRanges(vcf), name = "Variants", type = "histogram", window = 100)
I then plot the tracks:
plotTracks(c(itrack, bmt, dtrack, vcf_track, strack), from = start_loc, to = end_loc)
Everything looks great except the vcf_track. Instead of plotting the number of entries in the 100 bp window, it instead plots the values from the "QUAL" metadata column - or, more specifically, the mean of the QUAL values for SNPs in the 100 bp window (i.e. the default value of aggregation
). What I really want is the number of QUAL values in that window - I don't care what they are - but I am having trouble finding the appropriate function to pass to aggregation
. Any suggestions? Or have I totally misinterpreted how Gviz is constructing this histogram?
Thanks,
Russ
Although I couldn't find the answer to my question, I was able to perform a workaround.
In short, I imported the VCF as a dataframe into R, keeping only chromosome and position info (cols 1 and 2). I then used the
hist
function to create breaks and counts of my data. This info can be accessed as follows:With a bit of massaging, I was able to construct a
GRange
object using the breaks as start/end positions, and assigning thehist$count
as an mcol. Perhaps not the most elegant solution, but got the job done.The end product: