This question comes from a quiz exercise of the Coursera Bioconductor course. Since over there the class has been totally abandoned from the start by the tutors/mentors (supposing there are any) and apparently Mr Kasper Hansen has other priorities than checking the discussion board and helping his students, I have no other option than to post here.
I have a GRanges of H3K27me3 histone modification for the chr22 of H1 cell line that looks like this
> H1.chr22
GRanges object with 4068 ranges and 6 metadata columns: seqnames ranges strand | name score signalValue <Rle> <IRanges> <Rle> | <character> <numeric> <numeric> [1] chr22 [16554493, 16554606] * | Rank_121761 23 2.36555 [2] chr22 [16867445, 16867558] * | Rank_98479 35 3.13762 [3] chr22 [17270830, 17270985] * | Rank_98480 35 3.13762 [4] chr22 [17275375, 17275501] * | Rank_78285 43 3.55442 [5] chr22 [17325983, 17326150] * | Rank_84671 39 3.22958
I have retrieved a vector of fold-change enrichment of ChIP signal for the same chr and cell line, that looks like this:
> fc_signal GRanges object with 341236 ranges and 1 metadata column: seqnames ranges strand | score <Rle> <IRanges> <Rle> | <numeric> [1] chr22 [16554493, 16554502] * | 3.24963998794556 [2] chr22 [16554503, 16554530] * | 4.06205987930298 [3] chr22 [16554531, 16554542] * | 3.24963998794556 [4] chr22 [16554543, 16554544] * | 2.04167008399963 [5] chr22 [16554545, 16554565] * | 1.53125
As you can see the fc_signal ranges are much shorter (in width) but altogether they cover the same intervals as H1.chr22. Indeed their difference is zero
> setdiff(fc_signal, H1.chr22) GRanges object with 0 ranges and 0 metadata columns:
I have to compute the average score in fc.signal for each region in the H1.chr22 GRanges. It was suggest to compute this using “Views”.
In other words, I need to find a way to "group" o "map" the fc_signal short ranges on the bigger ones, and getting the corresponding average value of their score. I have tried to do both subsetbyOverlaps or intersect, but other than the fact that I just get back the same query GRanges, I also loose the metadata values, so no chance to average them. I also don't see how could I create a Views with these two GRanges. Should I map them on the chr22 sequence (DNAString)?
I hope the question is clear
THANK YOU Sir, that's exactly what I wanted to do. Should have thought of converting into an Rle... too many new different data classes/container in Bioconductor, I still have to get comfortable using them properly.
To be precise, to construct a Views on a GRanges this needs to be coerced as RangesList otherwise it will throw an error. This is the code that works:
vi <- Views(rle$chr22, as(H1.chr22, "RangesList")$chr22)
Thanks for catching that mistake, yes, you need to convert to RangesList, but you can do that across the entire genome:
vi <- Views(rle, as(H1.chr22, "RangesList"))
That returns a ViewsList, and stuff like
viewMeans()
works directly on those.Wasn't there talk at some point about modifying Views to support GRanges? That would allow convenient things like this:
Currently this is a bit of a pain because coercing a GRanges into a RangesList modifies the order.
If order is an issue, the List approach in the answer is probably the way to go for now.
Absolutely. Note that you can already do this (i.e. specify views with a GRanges) on a BSgenome object. But we should also be able to do this on a RleList object. I'm raising the priority for this in my TODO list. Thanks for the reminder.
H.