I have a case where I really want to generate a list of subsetted IRanges objects, where each one is the result of querying from a list of other IRanges objects.
library(IRanges)
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
spartition <- factor(c("a", "b", "c"))
slist <- split(subject, spartition)
sublist <- IRanges::subsetByOverlaps(query, slist)
# Error in (function (classes, fdef, mtable) :
# unable to find an inherited method for function ‘findOverlaps’ for signature ‘"IRanges", "CompressedIRangesList"’
Ideally, I would get a list of length 3, where each entry contains whatever was in query that overlapped with what was in that entry of slist. Right now, when I try this code, each entry of sublist is empty (IRanges v 2.14.2), and I don't see anything in News that leads me to believe it would be any different.
So right now I'm just purrr::map-ing over the entries in the list, but I figured that if this were available in IRanges itself, it would be far more efficient.
I will also add that if I do:
The entries will be empty, and of length 1. If I reverse it and do
Then it will be of length 3, but each entry is a zero-length IRanges.

The reason this does not exist is that a RangesList defines ranges within separate spaces (typically chromosomes), named by the names of the list. An IRanges has no defined space. We could add a method that simply repeats the search across every space, but so far there has been no motivating use case. More details on yours would help.

I'm using raw IRanges functions to work on raw data points within a mass spectrum. The splitting into lists is to make it easier to do bplapply and furrr::future_map operations on the subsets of raw data points. So there are definitely no separate spaces like chromosomes in this type of application. There are separate scans of data, but they share the same range space, so they naturally get lumped together.

Would you be interested in contributing a subsetByOverlaps() method? Btw, I think you do need to reverse the arguments from your initial example, i.e., subsetByOverlaps(slist, query). The simplest thing to do is unlist(slist), perform the subset, then reform the list, which would be the slightly tricky part to do efficiently.

I think the arguments are correct, at least based on how I'm currently doing it. If you think of the list-wise operation, it becomes:
is what I'm trying to achieve, where each entry in q_by_list holds the bits of query that were in the corresponding entry of slist.
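
For readers following along, the list-wise operation described here can be sketched roughly as below. This is an assumption about the shape of the code, not the code from the original post: it uses base lapply() rather than purrr::map, and reuses the query and slist objects from the first example.

```r
library(IRanges)

# Same toy data as in the first example of this thread.
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
slist <- split(subject, factor(c("a", "b", "c")))

# For each entry of slist, keep the ranges of query that overlap it.
# The result is a plain named list with one IRanges per entry of slist.
q_by_list <- lapply(slist, function(s) subsetByOverlaps(query, s))
```

With this toy data, entries a and b should each hold the range 1-5 and entry c the range 9-10, since those are the only query ranges touching each subject.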
Is https://github.com/Bioconductor/IRanges the right place to submit any pull requests with the above mentioned method? And it's probably best to continue this conversation there on an issue ....

I see, in that case I'm not sure if that is really a subset operation. It's more like extractList() except by overlap.
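
As a rough illustration of the unlist-then-regroup idea suggested earlier in the thread, here is a minimal sketch that runs a single findOverlaps() call against the flattened list and then regroups the hits by list entry. It is not an existing IRanges method; the variable names (flat, group, pairs, idx, q_by_overlap) are made up for the example, and the final regrouping uses plain lapply() to stay on calls that are known to exist.

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
slist <- split(subject, factor(c("a", "b", "c")))

# Flatten the list and remember which entry each range came from.
n <- length(slist)
flat <- unlist(slist, use.names = FALSE)
group <- rep(seq_len(n), elementNROWS(slist))

# One overlap search against all subject ranges at once.
hits <- findOverlaps(query, flat)

# Keep each (query index, list entry) pair once, even if a query range
# overlaps several ranges within the same entry.
pairs <- unique(data.frame(q = queryHits(hits),
                           g = group[subjectHits(hits)]))

# Regroup the query indices by list entry; entries without hits stay empty.
idx <- split(pairs$q, factor(pairs$g, levels = seq_len(n)))
q_by_overlap <- lapply(idx, function(i) query[i])
names(q_by_overlap) <- names(slist)
```

Whether this is actually faster than mapping subsetByOverlaps() over the list would need to be measured on real data; the sketch only shows that the regrouping step is feasible with one overlap query.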