Question

overlapsAny for elements in GRangesList

0

Entering edit mode

maltethodberg ▴ 180

@maltethodberg-9690

Last seen 7 weeks ago

Denmark

I have a GRanges and a GRangesList object, i.e.:

gr # GRanges

grList # GRangesList

I wish to know what ranges in the GRanges are in each of the ranges in each element in the list, so I thought of calling overlapsAny on each element of the GRangesList like so:

overlapsList <- lapply(grList, function(x) overlapsAny(query=gr, subject=x))

This gives me the desired output, however this is very slow!

Is there a faster of doing the same thing?

granges grangeslist overlapsAny • 2.7k views

ADD COMMENT • link updated 9 months ago by Samuel Ahuno • 0 • written 8.7 years ago by maltethodberg ▴ 180

0

Entering edit mode

Perhaps you could describe your high-level use case / problem?

ADD REPLY • link 8.7 years ago Michael Lawrence ★ 11k

0

Entering edit mode

I'm doing some enrichment analysis, so I need to annotate a "universe" of ranges with a wide range of other annotations.

ADD REPLY • link 8.7 years ago maltethodberg ▴ 180

0

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 3.1 years ago

United States

Since your GRangesList is short (relative to something like a transcriptome), it is probably much faster to use a SimpleGenomicRangesList instead of GRangesList (a compressed list). See `?GenomicRangesList`.

ADD COMMENT • link 8.7 years ago Michael Lawrence ★ 11k

score 4 · Accepted Answer · 2016-05-09

4

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 9 days ago

Seattle, WA, United States

Hi,

Edit: This solution is incorrect.

This is a situation where you can avoid the lapply loop by using the unlist/relist idiom:

unlisted_grList <- unlist(grList, use.names=FALSE)
relist(overlapsAny(query=unlisted_grList, subject=gr), grList)

Will be 100x faster or more than using lapply().

Correct solution:

hits <- findOverlaps(gr, grList)
m <- matrix(FALSE, nrow=queryLength(hits), ncol=subjectLength(hits))
m[as.matrix(hits)] <- TRUE

See comments below for the details.

Cheers,

H.

ADD COMMENT • link 8.7 years ago Hervé Pagès 16k

0

Entering edit mode

Is that equivalent to the code in the question? He wants the query to be gr, and the subject the grList element.

ADD REPLY • link 8.7 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Correct, this is not equivalent to the original example, here you overlap each element of the GRlist with GR, rather than the other way around.

ADD REPLY • link 8.7 years ago maltethodberg ▴ 180

2

Entering edit mode

Oops, you guys are right, my bad. Here is my 2nd attempt. The unlist/relist trick is actually not needed:

library(GenomicRanges)
gr <- GRanges(c("chr1:9-14", "chr3:11-16", "chr1:4-5"))
grList <- GRangesList(GRanges("chr1:5-10"), 
                      GRanges(c("chr2:1-4", "chr1:4-5")),
                      GRanges(),
                      GRanges("chr3:15-17"))

lapply solution (slow):

overlapsList <- lapply(grList, function(x) overlapsAny(query=gr, subject=x))
overlapsList
# [[1]]
# [1]  TRUE FALSE  TRUE
# [[2]]
# [1] FALSE FALSE  TRUE
# [[3]]
# [1] FALSE FALSE FALSE
# [[4]]
# [1] FALSE  TRUE FALSE

Fast solution (no-loop):

hits <- findOverlaps(gr, grList)
m <- matrix(FALSE, nrow=queryLength(hits), ncol=subjectLength(hits))
m[as.matrix(hits)] <- TRUE
m
#       [,1]  [,2]  [,3]  [,4]
# [1,]  TRUE FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE  TRUE
# [3,]  TRUE  TRUE FALSE FALSE

The above matrix has 1 row per range in the GRanges object and 1 column per list element in the GRangesList object.

To get the result as a list of indices into the query:

as(t(hits), "List")
# IntegerList of length 4
# [[1]] 1 3
# [[2]] 3
# [[3]] integer(0)
# [[4]] 2

Hopefully this time I got it right...

Cheers,

H.

PS: I edited my initial answer above accordingly.

ADD REPLY • link 8.7 years ago Hervé Pagès 16k

0

Entering edit mode

Nice solution. This could be implemented as a method on an adjacencyMatrix() generic, which could be promoted from graph, or graph could enhance S4Vectors.

ADD REPLY • link 8.7 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Nice! I was already playing around with findOverlaps, but this is so much faster!

ADD REPLY • link 8.7 years ago maltethodberg ▴ 180

0

Entering edit mode

Hervé Pagès how can i fill the overlaps matrix with scores/values ? ie replace TRUE in overlaps matrix with methylation_level in example granges below.

hits_unlist <- findOverlaps(gr_methylation, grl.2_unlist)
scoreMatrix_TrueFalse <- matrix(FALSE, nrow=queryLength(hits_unlist), ncol=subjectLength(hits_unlist))
scoreMatrix_TrueFalse[as.matrix(hits_unlist)] <- TRUE

#read data
length_gr <- 21
gr_methylation = suppressWarnings(GRanges(
  seqnames = Rle(rep(1, length_gr)),
  ranges = IRanges(start=100:120, end=100:120),
  score = 1:length_gr,
  methylation_level = runif(length_gr),
  GC = seq(1, 0, length=length_gr)))


grl.2_unlist <- structure(new("GRanges", seqnames = new("Rle", values = structure(1L, levels = "1", class = "factor"), 
                                        lengths = 24L, elementMetadata = NULL, metadata = list()), 
              ranges = new("IRanges", start = c(99L, 100L, 101L, 102L, 
                                                102L, 103L, 104L, 105L, 103L, 104L, 105L, 106L, 108L, 109L, 
                                                110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L, 118L, 119L
              ), width = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                           1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), NAMES = NULL, 
              elementType = "ANY", elementMetadata = NULL, metadata = list()), 
              strand = new("Rle", values = structure(3L, levels = c("+", 
                                                                    "-", "*"), class = "factor"), lengths = 24L, elementMetadata = NULL, 
                           metadata = list()), seqinfo = new("Seqinfo", seqnames = "1", 
                                                             seqlengths = NA_integer_, is_circular = NA, genome = NA_character_), 
              elementMetadata = new("DFrame", rownames = NULL, nrows = 24L, 
                                    elementType = "ANY", elementMetadata = NULL, metadata = list(), 
                                    listData = list(GeneName = c("GeneA", "GeneA", "GeneA", 
                                                                 "GeneA", "GeneB", "GeneB", "GeneB", "GeneB", "GeneC", 
                                                                 "GeneC", "GeneC", "GeneC", "GeneD", "GeneD", "GeneD", 
                                                                 "GeneD", "GeneE", "GeneE", "GeneE", "GeneE", "GeneF", 
                                                                 "GeneF", "GeneF", "GeneF"), grl.ix = c(1L, 1L, 1L, 1L, 
                                                                                                        2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 
                                                                                                        5L, 5L, 6L, 6L, 6L, 6L), grl.iix = c(1L, 2L, 3L, 4L, 
                                                                                                                                             1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
                                                                                                                                             3L, 4L, 1L, 2L, 3L, 4L))), elementType = "ANY", metadata = list()))

thanks in advance

ADD REPLY • link 9 months ago Samuel Ahuno • 0