Question

Best way to subset one IRanges by list of IRanges

Robert M. Flight ▴ 280

@robert-m-flight-4158

United States

I have a case where I really want to generate a list of subsetted IRanges objects, where each one is the result of querying from a list of other IRanges objects.

library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
spartition <- factor(c("a","b","c"))
slist <- split(subject, spartition)

sublist <- IRanges::subsetByOverlaps(query, slist)

# Error in (function (classes, fdef, mtable)  : 
#  unable to find an inherited method for function ‘findOverlaps’ for signature ‘"IRanges", "CompressedIRangesList"’

What I would like to get is a list of length 3, where each entry contains whatever in query overlaps that entry of slist. Right now, when I try this code, each entry of sublist is empty (IRanges v 2.14.2), and I don't see anything in the NEWS that leads me to believe a newer version would behave differently.

So right now I'm just mapping over the entries in the list with purrr::map(), but figured that if this were available in IRanges itself, it would be far more efficient.

iranges • 2.2k views

ADD COMMENT • link 5.5 years ago Robert M. Flight ▴ 280

I will also add that if I do:

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
spartition <- factor(c("a","b","c"))
slist <- split(subject, spartition)
qlist <- split(query, rep(1, 3))

sublist2 <- IRanges::subsetByOverlaps(qlist, slist)

The result will be of length 1, and its single entry will be empty. If I reverse it and do:

sublist3 <- IRanges::subsetByOverlaps(slist, qlist)

Then it will be of length 3, but each entry is a zero-length IRanges.

ADD REPLY • link 5.5 years ago Robert M. Flight ▴ 280

The reason this does not exist is that a RangesList defines ranges within separate spaces (typically chromosomes), named by the names of the list. An IRanges has no defined space. We could add a method that simply repeats the search across every space, but so far there has been no motivating use case. More details on yours would help.

ADD REPLY • link 5.5 years ago Michael Lawrence ★ 11k

I'm using raw IRanges functions to work on raw data points within a mass spectrum. The splitting into lists is to make it easier to run bplapply and furrr::future_map operations on the subsets of raw data points. So there are definitely no separate spaces like chromosomes in this type of application. There are separate scans of data, but they share the same range space, so they naturally get lumped together.

ADD REPLY • link 5.5 years ago Robert M. Flight ▴ 280

Would you be interested in contributing a subsetByOverlaps() method? Btw, I think you do need to reverse the arguments from your initial example, i.e., subsetByOverlaps(slist, query). The simplest thing to do is unlist(slist), perform the subset, then reform the list, which would be the slightly tricky part to do efficiently.

ADD REPLY • link 5.5 years ago Michael Lawrence ★ 11k
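
The unlist, subset, and reform steps suggested above could look roughly like the sketch below. It follows the reversed argument order, subsetByOverlaps(slist, query), so it keeps the ranges of each slist entry that overlap anything in query. The variable names flat, group, keep and s_by_query are made up for illustration, and this is one possible way to reform the list, not an existing IRanges method.

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
slist <- split(subject, factor(c("a", "b", "c")))

# Flatten the list, remembering which list element each range came from.
flat <- unlist(slist, use.names = FALSE)
group <- factor(rep(seq_along(slist), elementNROWS(slist)),
                levels = seq_along(slist))

# A single overlap search against the flattened ranges.
keep <- overlapsAny(flat, query)

# Reform the list. Unused factor levels are kept by split(), so an element
# that loses all of its ranges comes back as a zero-length IRanges.
s_by_query <- split(flat[keep], group[keep])
names(s_by_query) <- names(slist)
```

With the data from this thread every range in slist overlaps something in query, so all three entries keep their single range.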

I think the arguments are correct, at least based on how I'm currently doing it. If you think of it as a list-wise operation, it becomes:

q_by_list <- lapply(slist, function(s_sub) {
    subsetByOverlaps(query, s_sub)
})

That is what I'm trying to achieve, where each entry in q_by_list holds the bits of the query that overlap the corresponding entry of slist.

ADD REPLY • link 5.5 years ago Robert M. Flight ▴ 280

Is https://github.com/Bioconductor/IRanges the right place to submit pull requests for the above-mentioned method? It's probably best to continue this conversation there in an issue.

ADD REPLY • link 5.5 years ago Robert M. Flight ▴ 280

I see, in that case I'm not sure if that is really a subset operation. It's more like extractList() except by overlap.

ADD REPLY • link 5.5 years ago Michael Lawrence ★ 11k
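
To make the "extractList() except by overlap" idea concrete, here is a minimal sketch. The helper name extractListByOverlaps and its arguments are hypothetical and not part of IRanges. It assumes one findOverlaps() call on the unlisted ranges, followed by regrouping the query indices and handing them to extractList().

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
slist <- split(subject, factor(c("a", "b", "c")))

# Hypothetical helper (not an IRanges function): for each element of
# ranges_list, extract the ranges of x that overlap that element.
extractListByOverlaps <- function(x, ranges_list) {
  flat <- unlist(ranges_list, use.names = FALSE)
  group <- factor(rep(seq_along(ranges_list), elementNROWS(ranges_list)),
                  levels = seq_along(ranges_list))

  # One overlap search for the whole list.
  hits <- findOverlaps(x, flat)

  # Group the indices into x by the list element that was hit, dropping
  # duplicates that arise when several ranges of one element hit the same x.
  idx <- split(queryHits(hits), group[subjectHits(hits)])
  idx <- lapply(idx, function(i) sort(unique(i)))
  names(idx) <- names(ranges_list)

  extractList(x, IntegerList(idx))
}

q_by_list <- extractListByOverlaps(query, slist)
```

For the example data, entries a and b each hold the first query range (1 to 5) and entry c holds the third (9 to 10). That should match the lapply() version, but as an IRangesList and with a single overlap search instead of one per list element.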