Element-wise setdiff when psetdiff fails for GenomicRanges
1
1
Entering edit mode
@kipper-fletez-brant-6421
Last seen 6.8 years ago
United States

I have 2 GRanges objects of equal length.  For each individual pair of rows from the 2 objects, I want to get:

    a) the intersection of the ranges

    b) the ranges unique to the first range

    c) the ranges unique the second range

See my code with comments for specifics, here I will present an overview of my problem.  I have no problem with intersect() - the seems to work.  My trouble is getting the asymmetric differences between paired ranges.  I can do this one row at a time using setdiff() on same row from both elements, but the runtime explodes.  If I pass both GRanges objects, ranges that happen to be overlapping are collapsed into 1 range, which is not what I want.  Because some ranges contain their counterpart from the other GRanges object, psetdiff() crashes.  What is the fastest way to get what I want?

 

# test data

 > test.a <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2000, 4000, 6000, 7000), end=c(2000, 3000, 5000, 7000, 8000)))
 > test.b <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2100, 4500, 5900, 7700), end=c(1500, 2900, 5000, 6300, 8200)))

#  the output I am looking for

# note that the results for 1 and 2 share a base in common

# I still count these as 2 separate results

 setdiff(test.a[1], test.b[1])
 GRanges object with 1 range and 0 metadata columns:
       seqnames       ranges strand
          <Rle>    <IRanges>  <Rle>
   [1]     chr1 [1501, 2000]      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

setdiff(test.a[2], test.b[2])
 GRanges object with 2 ranges and 0 metadata columns:
       seqnames       ranges strand
          <Rle>    <IRanges>  <Rle>
   [1]     chr1 [2000, 2099]      *
   [2]     chr1 [2901, 3000]      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths



# when i pass the full GRanges objects

# note that 1 and 2 are now combined

# this is not what i want

setdiff(test.a, test.b, resolve.empty="none")
 GRanges object with 4 ranges and 0 metadata columns:
       seqnames       ranges strand
          <Rle>    <IRanges>  <Rle>
   [1]     chr1 [1501, 2099]      *
   [2]     chr1 [2901, 3000]      *
   [3]     chr1 [4000, 4499]      *
   [4]     chr1 [6301, 7699]      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths



# calling psetdiff(), which i understand is supposed to be the element-wise setdiff()

# it crashes

# this is not what i want

psetdiff(test.a, test.b, resolve.empty="none")
 Error in start(value) :
   error in evaluating the argument 'x' in selecting a method for function 'start': Error in psetdiff(extractROWS(ranges(x), ok), extractROWS(ranges(y), ok)) :
   some ranges in 'y' have their end points strictly inside
   the range in 'x' that they need to be subtracted from.
   Cannot subtract them.



 

genomicranges iranges genomics • 3.5k views
ADD COMMENT
2
Entering edit mode
@michael-lawrence-6705
Last seen 7.7 years ago
United States

The psetdif() operation is indeed what you want; however, it is what we call an "endomorphism", i.e., it returns an object of the same type as its inputs. A GRanges is obviously incapable of representing the result of a setdiff when "x" contains "y".  Thus, we need a data structure that is capable of representing that complexity. That data structure is GRangesList.

The solution is to convert your GRanges objects to GRangesList objects and then call psetdiff().

grl.a <- as(test.a, "List")
grl.b <- as(test.b, "List")
psetdiff(grl.a, grl.b)

Hope that helps.

ADD COMMENT
0
Entering edit mode

Thank you for the explanation.  Perhaps I am misunderstanding you.  When I enter your code I get the error below.

grl.a <- as(test.a, "List")
Error in as(from, selectListClassName(element.type)) : 
  no method or default for coercing “GRanges” to “GRangesList”
ADD REPLY
0
Entering edit mode

Works for me:

> library(GenomicRanges)
> test.a <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2000, 4000, 6000, 7000), end=c(2000, 3000, 5000, 7000, 8000)))
> test.b <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2100, 4500, 5900, 7700), end=c(1500, 2900, 5000, 6300, 8200)))
> t.a <- as(test.a, "List")
> t.b <- as(test.b, "List")
> psetdiff(t.a, t.b)
GRangesList object of length 5:
[[1]]
GRanges object with 1 range and 0 metadata columns:
      seqnames       ranges strand
         <Rle>    <IRanges>  <Rle>
  [1]     chr1 [1501, 2000]      *

[[2]]
GRanges object with 2 ranges and 0 metadata columns:
      seqnames       ranges strand
  [1]     chr1 [2000, 2099]      *
  [2]     chr1 [2901, 3000]      *

[[3]]
GRanges object with 1 range and 0 metadata columns:
      seqnames       ranges strand
  [1]     chr1 [4000, 4499]      *

...
<2 more elements>
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

> session_info()
Session info -------------------------------------------------------------------
 setting  value                       
 version  R version 3.1.1 (2014-07-10)
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       <NA>                        

Packages -----------------------------------------------------------------------
 package       * version date       source        
 BiocGenerics    0.12.1  2014-11-17 Bioconductor  
 devtools        1.7.0   2015-01-17 CRAN (R 3.1.1)
 GenomeInfoDb    1.2.4   2015-01-05 Bioconductor  
 GenomicRanges   1.18.4  2015-01-20 Bioconductor  
 IRanges         2.0.1   2014-12-15 Bioconductor  
 rstudioapi    * 0.2     2014-12-31 CRAN (R 3.1.1)
 S4Vectors       0.4.0   2014-10-15 Bioconductor  
 XVector       * 0.6.0   2014-10-15 Bioconductor  

 

ADD REPLY
0
Entering edit mode

EDIT: This did ultimately work

 

That's weird.  I looked at your session info and saw that I did not have S4Vector loaded.  I loaded S4Vector and XVector and now I get the following:

> psetdiff(t.a, t.b)
GRangesList of length 5:
[[1]] 
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘getListElement’ for signature ‘"GRangesList"’
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] XVector_0.6.0        S4Vectors_0.4.0      BiocInstaller_1.16.1
[4] GenomicRanges_1.18.4 GenomeInfoDb_1.2.4   IRanges_2.0.1       
[7] BiocGenerics_0.12.1 

loaded via a namespace (and not attached):
[1] compiler_3.1.2 tools_3.1.2   

 

ADD REPLY
0
Entering edit mode

Hi Michael.

I find that this does NOT work in R3.5.0 but have updated your approach 

After noticing:

```
> t.a <- as(test.a, "List")
> class(t.a)

[1] "GRanges"
attr(,"package")
[1] "GenomicRanges"
```
Converting instead to GRangesList gets me further, but not much:

```
t.a <- as(test.a, "GRangesList")
t.b <- as(test.b, "GRangesList")
psetdiff(t.a, t.b)

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'psetdiff' for signature '"CompressedGRangesList", "CompressedGRangesList"
 ``

Thankfully, I noticed https://github.com/Bioconductor/GenomicRanges/blob/master/NEWS the notes 

Rename "pintersect" and "psetdiff" methods for GRangesList objects ->
      "intersect" and "setdiff" without changing their behavior (they still
      do mendoapply(intersect, x, y) and mendoapply(setdiff, x, y),
      respectively). The old names were misnomers (see svn commit message for
      commit 113793 for more information).

 

 

So the updated solution becomes

grl.a <- as(test.a, "List")
grl.b <- as(test.b, "List")
psetdiff(grl.a, grl.b)

 

ADD REPLY

Login before adding your answer.

Traffic: 694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6