Troubleshooting a histogram DataTrack in Gviz
Russ Fraser ▴ 40
@russ-fraser-13646
Last seen 6.3 years ago
Ontario Veterinary College, University …

Hello all,

I would like to add a track to my Gviz plot that displays the frequency of variants in a specified region. E.g., I have the structure of Gene A, including exons and introns, and I want to quickly visualize how many variants are present in (arbitrarily defined) 100 bp bins along this gene. Hopefully this will highlight variable regions of gene A, or, conversely, illustrate that variation is pretty similar across the gene.

Progress so far: using VariantAnnotation, I created a DataTrack out of a VCF file as follows:

vcf <- readVcf("my_variants.vcf", genome = genome)
vcf_track <- DataTrack(rowRanges(vcf), name = "Variants", type = "histogram", window = 100)

I then plot the tracks:

plotTracks(c(itrack, bmt, dtrack, vcf_track, strack), from = start_loc, to = end_loc)

Everything looks great except the vcf_track. Instead of plotting the number of entries in each 100 bp window, it plots values from the "QUAL" metadata column; more specifically, the mean of the QUAL values for the SNPs in each window (i.e. the default value of aggregation). What I really want is the number of QUAL values in each window, regardless of what they are, but I am having trouble finding the appropriate function to pass to aggregation. Any suggestions? Or have I totally misinterpreted how Gviz is constructing this histogram?

Thanks,

Russ

VariantAnnotation gviz • 1.6k views

Although I couldn't find the answer to my question, I was able to perform a workaround.

In short, I imported the VCF into R as a data frame, keeping only the chromosome and position columns (cols 1 and 2). I then used the hist function to compute breaks and counts for the positions. This info can be accessed as follows:

hist_info <- hist(data)
hist_info
hist_info$breaks
hist_info$counts
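
For anyone who wants fixed 100 bp bins rather than the breaks hist picks by default, here is a minimal sketch in base R. The positions in pos are made up for illustration, and bin_width and the way the breaks are aligned are my own assumptions, not part of the original workaround.

```r
# Hypothetical variant positions (bp); in practice this would be the
# position column of the imported VCF data frame.
pos <- c(1005, 1010, 1099, 1150, 1320, 1321, 1322, 1480)

bin_width <- 100

# Breaks aligned to multiples of bin_width that cover every position.
# The "+ 1" keeps a position sitting exactly on a break out of the final edge.
breaks <- seq(floor(min(pos) / bin_width) * bin_width,
              ceiling((max(pos) + 1) / bin_width) * bin_width,
              by = bin_width)

# right = FALSE gives left-closed bins [start, end); plot = FALSE returns
# the breaks and counts without drawing anything.
hist_info <- hist(pos, breaks = breaks, right = FALSE, plot = FALSE)

hist_info$breaks
hist_info$counts
```

Bins that contain no variants are kept with a count of 0, so the gaps show up in the final track instead of being skipped.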

With a bit of massaging, I was able to construct a GRanges object using the breaks as start/end positions and assigning hist_info$counts as a metadata column (mcol). Perhaps not the most elegant solution, but it got the job done.
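
The "massaging" step could look roughly like the sketch below. The chromosome name, breaks and counts are placeholder values, and the names bins and count_track are hypothetical; this is one way to do it, not necessarily exactly what I ran.

```r
library(GenomicRanges)  # also attaches IRanges
library(Gviz)

# Placeholder inputs: chromosome name, bin breaks and per-bin counts,
# e.g. taken from hist_info$breaks and hist_info$counts.
chrom  <- "chr1"
breaks <- c(1000, 1100, 1200, 1300, 1400, 1500)
counts <- c(3, 1, 0, 3, 1)

# Each pair of consecutive breaks becomes one range. GRanges ends are
# inclusive, so subtract 1 to keep neighbouring bins from overlapping.
bins <- GRanges(seqnames = chrom,
                ranges = IRanges(start = head(breaks, -1),
                                 end   = tail(breaks, -1) - 1),
                count = counts)

# The numeric metadata column is used as the track's data, so no
# aggregation is needed at plot time.
count_track <- DataTrack(range = bins, name = "Variants per 100 bp",
                         type = "histogram")
```

count_track can then take the place of vcf_track in the plotTracks call. Because the counts are computed up front, the plot no longer depends on how window and aggregation summarise the QUAL column.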

The end product:

