I have 2 GRanges objects of equal length. For each individual pair of rows from the 2 objects, I want to get:
a) the intersection of the ranges b) the ranges unique to the first range c) the ranges unique the second range
See my code with comments for specifics, here I will present an overview of my problem. I have no problem with intersect() - the seems to work. My trouble is getting the asymmetric differences between paired ranges. I can do this one row at a time using setdiff() on same row from both elements, but the runtime explodes. If I pass both GRanges objects, ranges that happen to be overlapping are collapsed into 1 range, which is not what I want. Because some ranges contain their counterpart from the other GRanges object, psetdiff() crashes. What is the fastest way to get what I want?
# test data
> test.a <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2000, 4000, 6000, 7000), end=c(2000, 3000, 5000, 7000, 8000))) > test.b <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2100, 4500, 5900, 7700), end=c(1500, 2900, 5000, 6300, 8200)))
# the output I am looking for
# note that the results for 1 and 2 share a base in common
# I still count these as 2 separate results
setdiff(test.a[1], test.b[1]) GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chr1 [1501, 2000] * ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths
setdiff(test.a[2], test.b[2]) GRanges object with 2 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chr1 [2000, 2099] * [2] chr1 [2901, 3000] * ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths
# when i pass the full GRanges objects
# note that 1 and 2 are now combined
# this is not what i want
setdiff(test.a, test.b, resolve.empty="none") GRanges object with 4 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chr1 [1501, 2099] * [2] chr1 [2901, 3000] * [3] chr1 [4000, 4499] * [4] chr1 [6301, 7699] * ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths
# calling psetdiff(), which i understand is supposed to be the element-wise setdiff()
# it crashes
# this is not what i want
psetdiff(test.a, test.b, resolve.empty="none") Error in start(value) : error in evaluating the argument 'x' in selecting a method for function 'start': Error in psetdiff(extractROWS(ranges(x), ok), extractROWS(ranges(y), ok)) : some ranges in 'y' have their end points strictly inside the range in 'x' that they need to be subtracted from. Cannot subtract them.
Thank you for the explanation. Perhaps I am misunderstanding you. When I enter your code I get the error below.
Works for me:
EDIT: This did ultimately work
That's weird. I looked at your session info and saw that I did not have S4Vector loaded. I loaded S4Vector and XVector and now I get the following:
Hi Michael.
I find that this does NOT work in R3.5.0 but have updated your approach
After noticing:
```
> t.a <- as(test.a, "List")
> class(t.a)
[1] "GRanges"
attr(,"package")
[1] "GenomicRanges"
```
Converting instead to GRangesList gets me further, but not much:
```
t.a <- as(test.a, "GRangesList")
t.b <- as(test.b, "GRangesList")
psetdiff(t.a, t.b)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'psetdiff' for signature '"CompressedGRangesList", "CompressedGRangesList"
``
Thankfully, I noticed https://github.com/Bioconductor/GenomicRanges/blob/master/NEWS the notes
Rename "pintersect" and "psetdiff" methods for GRangesList objects ->
"intersect" and "setdiff" without changing their behavior (they still
do mendoapply(intersect, x, y) and mendoapply(setdiff, x, y),
respectively). The old names were misnomers (see svn commit message for
commit 113793 for more information).
So the updated solution becomes