Question

Compare ranges from two GRanges and check if the ranges from the first GRanges is contained in another.

0

Entering edit mode

omarrafiqued ▴ 50

@omarrafiqued-21833

Last seen 13 months ago

India

I have the following ranges as GRanges data {Ranges_1}: 1-2, 2-4, 3-6, 6-8, 2-8, 8-11.

{Ranges_2} 1-5, 6-10, 11-15,

I need a mapping from Ranges1 to Ranges2. Something like following (1-2) -> (1-5), (2-4) -> (1-5), (3-6) -> (1-5) & (6-10), (5-7) -> (1-5) & (6-10), (6-8) -> (6-10), (8-11) -> (6-10) & (11-15).

Could someone please help me with the R code for this?

GRanges IRanges • 2.8k views

ADD COMMENT • link updated 4.9 years ago by James W. MacDonald 68k • written 4.9 years ago by omarrafiqued ▴ 50

score 3 · Accepted Answer · 2020-06-05

3

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 6 hours ago

United States

> gr1 <- GRanges(rep("chr1", 6), IRanges(c(1,2,3,6,2,8), c(2,4,6,8,8,11)))
> gr2 <- GRanges(rep("chr1", 3), IRanges(c(1,6,11), c(5,10,15)))

> fo <- findOverlaps(gr1, gr2)
> fo
Hits object with 9 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         2           1
  [3]         3           1
  [4]         3           2
  [5]         4           2
  [6]         5           1
  [7]         5           2
  [8]         6           2
  [9]         6           3
  -------
  queryLength: 6 / subjectLength: 3
> DataFrame(gr1 = gr1[queryHits(fo)], gr2 = gr2[subjectHits(fo)])
DataFrame with 9 rows and 2 columns
        gr1        gr2
  <GRanges>  <GRanges>
1  chr1:1-2   chr1:1-5
2  chr1:2-4   chr1:1-5
3  chr1:3-6   chr1:1-5
4  chr1:3-6  chr1:6-10
5  chr1:6-8  chr1:6-10
6  chr1:2-8   chr1:1-5
7  chr1:2-8  chr1:6-10
8 chr1:8-11  chr1:6-10
9 chr1:8-11 chr1:11-15

ADD COMMENT • link 4.9 years ago James W. MacDonald 68k

1

Entering edit mode

findOverlapPairs() is also relevant.

ADD REPLY • link 4.9 years ago Michael Lawrence ★ 11k

0

Entering edit mode

What if I dont want one-to-many mapping. e.g., Above there is:

3  chr1:3-6   chr1:1-5
4  chr1:3-6  chr1:6-10

Since, chr1:3-6 lies more in chr1:1-5, so , I would not like 4 chr1:3-6 chr1:6-10 in the result. Is there any elegant way of doing it?

ADD REPLY • link 4.9 years ago omarrafiqued ▴ 50

1

Entering edit mode

You'll have to wait for Michael to respond if you want elegant (see above). If you want super hackification, I'm your man.

> fo <- findOverlaps(gr1, gr2)
> pol <- findOverlapPairs(gr1, gr2)
> lst <- split(data.frame(width = width(pintersect(pol)), ind = 1:9), queryHits(fo), drop = FALSE)
> pol[sapply(lst, function(x) x[which.max(x$width),"ind"]),]
Pairs object with 6 pairs and 0 metadata columns:
          first    second
      <GRanges> <GRanges>
  [1]  chr1:1-2  chr1:1-5
  [2]  chr1:2-4  chr1:1-5
  [3]  chr1:3-6  chr1:1-5
  [4]  chr1:6-8 chr1:6-10
  [5]  chr1:2-8  chr1:1-5
  [6] chr1:8-11 chr1:6-10

ADD REPLY • link 4.9 years ago James W. MacDonald 68k

1

Entering edit mode

It's possible to only find the overlaps once:

fo <- findOverlaps(gr1, gr2)
pol <- Pairs(gr1, gr2, hits=fo)

Also you could find the pairs with the most overlap doing this

pol[which.max(relist(width(pintersect(pol)), as(fo, "List")), global=TRUE)]

It may be more intuitive to use plyranges, but not sure how to do it.

ADD REPLY • link 4.9 years ago Michael Lawrence ★ 11k

0

Entering edit mode

One more modification:

How to drop a range which belongs to the first GRanges and does not map to any range in the second GRanges.

For such case I want the mapping to be something like:

[x]  chr1:a-b  NA

ADD REPLY • link 4.9 years ago omarrafiqued ▴ 50

1

Entering edit mode

I think that would have to be separate and Michael's more elegant method won't work?

> gr1 <- GRanges(rep("chr1", 7), IRanges(c(1,2,3,6,2,8,20), c(2,4,6,8,8,11,42)))
> gr2 <- GRanges(rep("chr1", 3), IRanges(c(1,6,11), c(5,10,15)))
> ind <- ! gr1 %over% gr2
> fo <- findOverlaps(gr1, gr2)
> pol <- Pairs(gr1, gr2, hits = fo)
> pol[which.max(relist(width(pintersect(pol)), as(fo, "List")), global=TRUE)]
Error: subscript contains NAs
## Urgh
> lst <- split(data.frame(width = width(pintersect(pol)), ind = 1:9), queryHits(fo), drop = FALSE)
> pol[sapply(lst, function(x) x[which.max(x$width),"ind"]),]
Pairs object with 6 pairs and 0 metadata columns:
          first    second
      <GRanges> <GRanges>
  [1]  chr1:1-2  chr1:1-5
  [2]  chr1:2-4  chr1:1-5
  [3]  chr1:3-6  chr1:1-5
  [4]  chr1:6-8 chr1:6-10
  [5]  chr1:2-8  chr1:1-5
  [6] chr1:8-11 chr1:6-10
> DataFrame(first = gr1[ind], second = rep("NA", sum(ind)))
DataFrame with 1 row and 2 columns
       first      second
   <GRanges> <character>
1 chr1:20-42          NA

ADD REPLY • link 4.9 years ago James W. MacDonald 68k

0

Entering edit mode

Getting an error:

Warning message: In .Seqinfo.mergexy(x, y) : The 2 combined objects have no sequence levels in common. (Use suppressWarnings() to suppress this warning.)

I know this is a warning, but the output is empty.

ADD REPLY • link 4.9 years ago omarrafiqued ▴ 50

1

Entering edit mode

Modifying my original solution, you can exclude NA cases and subset the Hits object. Since Hits tracks the "universe" of the original input, like a factor, you can pass it to a HelloRanges utility to perform the left outer join and generate the closest thing to a NA value.

fo <- findOverlaps(gr1, gr2)
fo <- fo[na.omit(which.max(relist(width(pintersect(pol)), as(fo, "List")), global=TRUE))]
HelloRanges::pair(gr1, gr2, fo, all.x=TRUE)

ADD REPLY • link 4.9 years ago Michael Lawrence ★ 11k

0

Entering edit mode

By elegant, I mean something that works and is easy to grasp ans you did that exactly. Thanks again.

ADD REPLY • link 4.9 years ago omarrafiqued ▴ 50