Group GRanges and keep metadata
1
1
Entering edit mode
@likeeatingpizza-10790
Last seen 8.5 years ago

This question comes from a quiz exercise of the Coursera Bioconductor course. Since over there the class has been totally abandoned from the start by the tutors/mentors (supposing there are any) and apparently Mr Kasper Hansen has other priorities than checking the discussion board and helping his students, I have no other option than to post here.

I have a GRanges of H3K27me3 histone modification for the chr22 of H1 cell line that looks like this

> H1.chr22

GRanges object with 4068 ranges and 6 metadata columns:
         seqnames               ranges strand   |        name     score signalValue
            <Rle>            <IRanges>  <Rle>   | <character> <numeric>   <numeric>
     [1]    chr22 [16554493, 16554606]      *   | Rank_121761        23     2.36555   
     [2]    chr22 [16867445, 16867558]      *   |  Rank_98479        35     3.13762   
     [3]    chr22 [17270830, 17270985]      *   |  Rank_98480        35     3.13762   
     [4]    chr22 [17275375, 17275501]      *   |  Rank_78285        43     3.55442   
     [5]    chr22 [17325983, 17326150]      *   |  Rank_84671        39     3.22958

I have retrieved a vector of fold-change enrichment of ChIP signal for the same chr and cell line, that looks like this:

> fc_signal
GRanges object with 341236 ranges and 1 metadata column:
           seqnames               ranges strand   |            score
              <Rle>            <IRanges>  <Rle>   |        <numeric>
       [1]    chr22 [16554493, 16554502]      *   | 3.24963998794556
       [2]    chr22 [16554503, 16554530]      *   | 4.06205987930298
       [3]    chr22 [16554531, 16554542]      *   | 3.24963998794556
       [4]    chr22 [16554543, 16554544]      *   | 2.04167008399963
       [5]    chr22 [16554545, 16554565]      *   |          1.53125

As you can see the fc_signal ranges are much shorter (in width) but altogether they cover the same intervals as H1.chr22. Indeed their difference is zero

> setdiff(fc_signal, H1.chr22)
GRanges object with 0 ranges and 0 metadata columns:

I have to compute the average score in fc.signal for each region in the H1.chr22 GRanges. It was suggest to compute this using “Views”.

In other words, I need to find a way to "group" o "map" the fc_signal short ranges on the bigger ones, and getting the corresponding average value of their score. I have tried to do both subsetbyOverlaps or intersect, but other than the fact that I just get back the same query GRanges, I also loose the metadata values, so no chance to average them. I also don't see how could I create a Views with these two GRanges. Should I map them on the chr22 sequence (DNAString)?

I hope the question is clear

granges metadata coursera • 2.1k views
ADD COMMENT
2
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

The fc_signal object is a run-length encoding of a vector using a GRanges. You can convert that to an Rle and then form views on that to aggregate:

rle <- coverage(fc_signal, weight="score")
viewMeans(Views(rle, H1.chr22))

As an alternative to using Views, you could use Lists:

mean(rle[H1.chr22])

 

ADD COMMENT
0
Entering edit mode

THANK YOU Sir, that's exactly what I wanted to do. Should have thought of converting into an Rle... too many new different data classes/container in Bioconductor, I still have to get comfortable using them properly.

ADD REPLY
0
Entering edit mode

To be precise, to construct a Views on a GRanges this needs to be coerced as RangesList otherwise it will throw an error. This is the code that works:

vi <- Views(rle$chr22, as(H1.chr22, "RangesList")$chr22)

ADD REPLY
0
Entering edit mode

Thanks for catching that mistake, yes, you need to convert to RangesList, but you can do that across the entire genome:

vi <- Views(rle, as(H1.chr22, "RangesList"))

That returns a ViewsList, and stuff like viewMeans() works directly on those.

ADD REPLY
0
Entering edit mode

Wasn't there talk at some point about modifying Views to support GRanges? That would allow convenient things like this:

mcols(H1.chr22)$mean_score <- viewMeans(Views(rle, H1.chr22))

Currently this is a bit of a pain because coercing a GRanges into a RangesList modifies the order.

ADD REPLY
0
Entering edit mode

If order is an issue, the List approach in the answer is probably the way to go for now.

ADD REPLY
0
Entering edit mode

Absolutely. Note that you can already do this (i.e. specify views with a GRanges) on a BSgenome object. But we should also be able to do this on a RleList object. I'm raising the priority for this in my TODO list. Thanks for the reminder.

H.

ADD REPLY

Login before adding your answer.

Traffic: 657 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6