Find ranges that are shared by 80% or more of 10 GRanges objects
@2a6aaea2
Netherlands

Introduction and problem

I have multiple (>2) GRanges objects and want to find the ranges that are shared by x% or more of them.

Example data

I provide some example data as data frames below. Let's say we want to find the ranges that are shared by 66.7% (2/3) or more of them.

gr1 <- data.frame(seqnames = rep('chr1', 3),
                  start = c(1, 10, 20), 
                  end = c(3, 17, 30))


gr2 <- data.frame(seqnames = rep('chr1', 3),
                  start = c(2, 11, 31), 
                  end = c(3, 19, 35))


gr3 <- data.frame(seqnames = rep('chr1', 3),
                  start = c(2, 16, 37), 
                  end = c(3, 22, 40))

Output wanted

A GRanges object. In the example, the algorithm should find:

chr1 2-3. Reason: 2-3 is found in gr1, gr2 and gr3; 1 is only found in gr1.
chr1 11-22. Reason: 11-17 is found in gr1 and gr2 (16-17 also in gr3), 10 only in gr1, 18-19 in gr2 and gr3, and 20-22 in gr1 and gr3.
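To make the definition concrete, here is a minimal base-R sketch (no Bioconductor packages) that reproduces the expected output: it counts how many sets cover each base and keeps the runs covered by at least ceiling(min_frac * n) sets. The function name `shared_ranges` and its arguments are hypothetical, and because it builds dense per-base vectors it is only meant for small toy data like the example.

```r
# Minimal base-R sketch (hypothetical helper, no Bioconductor): count, for every
# base, how many range sets cover it, then keep the runs where that count
# reaches the threshold. Each set is a data frame with seqnames/start/end
# columns (1-based, closed intervals). Dense per-base vectors: toy data only.
shared_ranges <- function(sets, min_frac) {
  # Smallest number of sets satisfying the fraction; the tiny epsilon guards
  # against floating-point noise in min_frac * length(sets).
  min_sets <- ceiling(min_frac * length(sets) - 1e-9)
  all_df <- do.call(rbind, sets)
  out <- list()
  for (chr in unique(as.character(all_df$seqnames))) {
    len <- max(all_df$end[all_df$seqnames == chr])
    count <- integer(len)
    for (df in sets) {
      # Mark the bases this set covers; a base counts once per set even if
      # several ranges of the same set overlap it.
      hit <- logical(len)
      sub <- df[df$seqnames == chr, , drop = FALSE]
      for (i in seq_len(nrow(sub))) hit[sub$start[i]:sub$end[i]] <- TRUE
      count <- count + hit
    }
    # Run-length encode "shared by enough sets?" and turn TRUE runs into ranges.
    r <- rle(count >= min_sets)
    ends <- cumsum(r$lengths)
    starts <- ends - r$lengths + 1L
    keep <- r$values
    if (any(keep)) {
      out[[chr]] <- data.frame(seqnames = chr,
                               start = starts[keep], end = ends[keep])
    }
  }
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}
```

With the example data frames, `shared_ranges(list(gr1, gr2, gr3), 2/3)` returns chr1 2-3 and chr1 11-22. For the case in the title, pass a list of the 10 data frames with `min_frac = 0.8`.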

What I have done

I know how to find query hits that are present in all (100%) of the GRanges objects; see https://stackoverflow.com/questions/23331475/r-overlap-multiple-granges-with-findoverlaps
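For comparison, the 100% case can also be sketched by folding GenomicRanges' `intersect()` over all sets with `Reduce()`. This is a swapped-in alternative to the `findOverlaps()` approach in the linked answer, not the method it describes, and the helper name `shared_by_all` is hypothetical. It also shows why this does not generalize directly: `intersect()` only answers "in every set", not "in x% of sets".

```r
library(GenomicRanges)

# Hypothetical helper for the 100% case. Each element of `sets` is a data
# frame with seqnames/start/end columns. intersect() on two GRanges returns
# the (reduced) bases covered by both, so folding it over all sets with
# Reduce() keeps only the bases present in every set.
shared_by_all <- function(sets) {
  grs <- lapply(sets, makeGRangesFromDataFrame)
  Reduce(intersect, grs)
}
```

On the example data, `shared_by_all(list(gr1, gr2, gr3))` gives chr1 2-3 and chr1 16-17, which is narrower than the 2/3 answer above.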

Tags: GRanges, GenomicRanges
@mikelove
United States

Is this a coverage question?

Here's an approach with plyranges, after converting these data frames to ranges with as_granges():

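library(plyranges)
gr1 <- as_granges(gr1); gr2 <- as_granges(gr2); gr3 <- as_granges(gr3)
n <- 3 # number of range sets
bind_ranges(gr1, gr2, gr3, .id = "origin") %>%
  compute_coverage() %>%               # score = number of ranges covering each base
  mutate(fraction_cov = score / n) %>%
  filter(fraction_cov > .66) %>%       # keep bases covered by at least 2 of the 3 sets
  reduce_ranges()                      # merge adjacent bases back into ranges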

There are some details to work out, but you can start from here. For example, this assumes that the ranges within each incoming range set don't overlap each other; otherwise a single set could be counted twice at the same position. To handle that, call group_by(origin) and reduce_ranges() right after binding the sets together.
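The suggestion above (group_by(origin) plus reduce_ranges() before computing coverage) can be sketched as a small helper that also takes the threshold as a fraction, so the same code covers "80% of 10 sets". This is a sketch under assumptions: the name `shared_ranges_pr` is hypothetical, each input is a data frame with seqnames/start/end columns, and it relies on `bind_ranges()` accepting a list of ranges.

```r
library(plyranges)

# Hypothetical helper generalizing the pipeline above to any number of sets
# and any fraction (e.g. min_frac = 0.8 for 8 of 10 sets).
shared_ranges_pr <- function(sets, min_frac) {
  # Smallest number of sets satisfying the fraction (epsilon for float noise).
  min_sets <- ceiling(min_frac * length(sets) - 1e-9)
  grs <- lapply(sets, as_granges)
  bind_ranges(grs, .id = "origin") %>%
    group_by(origin) %>%
    reduce_ranges() %>%            # merge overlaps within each set first
    compute_coverage() %>%         # score = number of sets covering each base
    filter(score >= min_sets) %>%  # keep bases shared by enough sets
    reduce_ranges()                # join adjacent bases into ranges
}
```

Because each set is reduced before coverage is computed, `score` counts sets rather than ranges, so comparing it with an integer `min_sets` avoids the `> .66` style cutoff.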
