Intersecting a Granges and a GRangesList using within
2
0
Entering edit mode
d r ▴ 150
@d-r-5459
Last seen 6.9 years ago
Israel

Hello, I have a GRanges (gr) object and a GRangesList (grlist).

I want to find elements in grlist that fully overlap elements of gr.

That is, I want to know, for a given i and j if there is k such that

grlist[j][[k]] is fully contained within gr[i]. It is important

that the values of both i and j can be extracted from the function (k is not important for me).

For example, if I have

gr<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(50,100))
gr1<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(500,1000))
gr2<-GRanges(seqnames='chr1','chr4' ranges=IRanges(start=c(10,1000),end=c(10000,4000))
grlist<-GRangesList(gr1,gr2)

gr[1][1] is within gr[1]. However no element of gr2 is fully contained in gr[1].

Further, no element of either gr1 or gr2 is fully contained in gr[2].

So I would the function call f(gr,grlist) to return (TRUE,FALSE,FALSE,FALSE)

I tried using:

flat<-unlist(grlist,use.names=FALSE)
g<-relist(flat%within%gr, grlist)

But this only tells me for an element of grlist if it is contained in any element of gr.

Is there any way to extract which which element of gr contains it?

Note: I did ask a similar question (https://support.bioconductor.org/p/71619), but on an afterthought decided that mapply() style comparison does not suit my needs (I need to compare every i to every j, not just j=i). Sorry for the confusion.

 

Thanks in advance

Dolev Rahat

 

granges grangeslist findoverlaps • 2.5k views
ADD COMMENT
2
Entering edit mode
@valerie-obenchain-4275
Last seen 3.0 years ago
United States

Hi,

In your example, both ranges of 'gr' are contained within ranges of 'grlist'.
None of the ranges in 'grlist' are within the ranges in 'gr'.

gr <- GRanges(c('chr1','chr2'), IRanges(c(10,20), end=c(50,100)))
gr1 <- GRanges(c('chr1','chr2'), IRanges(c(10,20), end=c(500,1000)))
gr2 <- GRanges(c('chr1','chr4'), IRanges(c(10,1000), end=c(10000,4000)))
grlist <- GRangesList(gr1,gr2)

Use findOverlaps() to identify 'gr' ranges that fall "within" 'grlist'. The return value is a 'Hits' object with information for all range combinations. This way you don't need to call findOverlaps() for each range in 'gr'.

flat <- unlist(grlist, use.names=FALSE)
fo <- findOverlaps(gr, flat, type = "within")
> fo
Hits object with 3 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         1           3
  [3]         2           2
  -------
  queryLength: 2
  subjectLength: 4

Next, separate the information by range in 'gr'. Note that the output will be the same length as the number of ranges in 'gr' that had hits - this is not necessarily the same length as 'gr'.

> lapply(lst, function(i) {
+     vec <- logical(subjectLength(fo))
+     vec[i]  <- TRUE
+     vec
+ })
$`1`
[1]  TRUE FALSE  TRUE FALSE

$`2`
[1] FALSE  TRUE FALSE FALSE

If you want 'vec' as a list of the same geometry as 'grlist' change the last line in the lapply to relist(vec, grlist).

Extracting the relevant ranges of 'grlist' can be done with the Hits object, you don't need to create your own logical vector.

> flat[subjectHits(fo)]
GRanges object with 3 ranges and 0 metadata columns:
      seqnames      ranges strand
         <Rle>   <IRanges>  <Rle>
  [1]     chr1 [10,   500]      *
  [2]     chr1 [10, 10000]      *
  [3]     chr2 [20,  1000]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

 

I think this is what you wanted. Let me know if I misunderstood and we can try
again.

Valerie

ADD COMMENT
0
Entering edit mode

@Valerie Obenchain. Thanks for your reply. The output in your example seems what I want. However, In your call to lapply(), you iterate on an object called lst? Can you please tell me what it this object. I did not see it defined in your reply. Thanks again.

ADD REPLY
0
Entering edit mode

Oh my, I completely missed that line when I pasted. Sorry about that. The missing 'lst' is a split of the subject hits (ranges in 'flat') by the query hits (ranges in 'gr').

> lst <- split(subjectHits(fo), queryHits(fo))
> lst
$`1`
[1] 1 3

$`2`
[1] 2 

Valerie

ADD REPLY
0
Entering edit mode

Thanks! just what I needed.

 

ADD REPLY
0
Entering edit mode
@herve-pages-1542
Last seen 9 days ago
Seattle, WA, United States

Hi,

FWIW I'd like to mention that type="within" is supported between a GRangesList query and a GRangesList subject, and in that case a hit between query[i] and subject[j] is reported only if all the ranges in the former are within a range in the latter. So in your case, to find the elements in grlist that fully contain gr (if I understand correctly what you are trying to do) you can put gr inside a GRangesList of length 1:

subjectHits(findOverlaps(GRangesList(gr), grlist, type="within"))

H.

ADD COMMENT

Login before adding your answer.

Traffic: 517 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6