Extracting distance from Hits object
@lizingsimmons-6935

distanceToNearest returns a Hits object. Is there a simpler way to extract the distance column from this rather than turning it into a data.frame and then selecting the column?

Example:


library(GenomicRanges)

gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
               ranges = IRanges(start = c(50, 150, 200), end = c(100, 200, 300)),
               strand = c("+", "-", "-"))
gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
               ranges = IRanges(start = c(175, 250, 400), end = c(225, 375, 500)),
               strand = c("+", "-", "-"))

hits <- distanceToNearest(gr1,gr2)

hits$distance
## Error in hits$distance : $ operator not defined for this S4 class

as.data.frame(hits)$distance
## [1] 74 49  0

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.18.4 GenomeInfoDb_1.2.4   IRanges_2.0.1        S4Vectors_0.4.0      BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] tools_3.1.2   XVector_0.6.0

queryHits and subjectHits can each be accessed with a single function. Is there a similar single operation I can do to extract the distance? I imagine most people want to access the distance when using distanceToNearest, or they'd just use nearest, so this would be a useful thing to have if it doesn't exist.

 

 

@herve-pages-1542

Hi Liz,

The display of Hits objects doesn't make it obvious, but distance here is a metadata column of the object:

> hits
Hits of length 3
queryLength: 3
subjectLength: 3
  queryHits subjectHits  distance 
   <integer>   <integer> <integer> 
 1         1           1        74 
 2         2           2        49 
 3         3           2         0 

So like with any other object that can hold metadata columns, a specific column can be accessed with:

mcols(hits)$distance
[1] 74 49  0
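
As a rough sketch of how the extracted vector might then be used, the snippet below rebuilds the example objects from the question, pulls the distance out with mcols(), filters the hits by it, and copies it back onto the query ranges. The 50 bp cutoff and the column name dist_to_nearest are made up for illustration; they do not come from the thread or from the package.

```r
library(GenomicRanges)

# Same example ranges as in the question
gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
               ranges = IRanges(start = c(50, 150, 200), end = c(100, 200, 300)),
               strand = c("+", "-", "-"))
gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
               ranges = IRanges(start = c(175, 250, 400), end = c(225, 375, 500)),
               strand = c("+", "-", "-"))

hits <- distanceToNearest(gr1, gr2)

# distance is a metadata column, so it is reached through mcols()
d <- mcols(hits)$distance

# Keep only hits at most 50 bp away (cutoff is an arbitrary illustration)
close_hits <- hits[d <= 50]

# Copy the distances back onto the query ranges; queries without a
# nearest subject (none in this example) would stay NA
dist_vec <- rep(NA_integer_, length(gr1))
dist_vec[queryHits(hits)] <- d
gr1$dist_to_nearest <- dist_vec   # hypothetical column name
```

Indexing by queryHits(hits) rather than assigning d directly guards against the case where some query ranges have no hit and the Hits object is shorter than gr1.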

In BioC devel the display of Hits objects has been improved so that metadata columns can be distinguished from the slots:

> hits
Hits object with 3 hits and 1 metadata column:
      queryHits subjectHits |  distance
      <integer>   <integer> | <integer>
  [1]         1           1 |        74
  [2]         2           2 |        49
  [3]         3           2 |         0
  -------
  queryLength: 3
  subjectLength: 3

This is the style of display that we have adopted for most objects with metadata columns (some objects like Ranges are not adhering to that convention yet but will soon).

For the record, there are 3 fundamental differences between slots and metadata columns:

  1. The latter are optional (i.e. you can add or remove metadata columns any time) while the former are guaranteed to be present (you can NOT add or remove them).
  2. The content of the slots is guaranteed to meet some requirements. These requirements depend on the class of the object. For example, some of the requirements for the queryHits and subjectHits slots of a Hits object are that they are integer vectors with no NAs and no negative values (there are other requirements). In contrast, metadata columns can be anything.
  3. Methods for comparing and ordering the elements in a vector-like object (e.g. ==, >=, !=, match, order, sort, etc...) only look at the slots. The metadata columns are ignored.
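
Points 1 and 3 above can be checked with a small experiment. This is only a sketch: it assumes a reasonably recent S4Vectors in which == is available for Hits objects, and the note column is a made-up name used purely for illustration.

```r
library(GenomicRanges)

gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
               ranges = IRanges(start = c(50, 150, 200), end = c(100, 200, 300)),
               strand = c("+", "-", "-"))
gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
               ranges = IRanges(start = c(175, 250, 400), end = c(225, 375, 500)),
               strand = c("+", "-", "-"))

hits <- distanceToNearest(gr1, gr2)
hits2 <- hits

# Point 1: metadata columns are optional, so one can be added ...
mcols(hits2)$note <- c("far", "near", "overlap")   # hypothetical column
has_note_after_add <- "note" %in% colnames(mcols(hits2))

# ... and removed again at any time
mcols(hits2)$note <- NULL
has_note_after_drop <- "note" %in% colnames(mcols(hits2))

# Point 3: comparison only looks at queryHits/subjectHits, so overwriting
# the distance metadata column should not change the result of ==
mcols(hits2)$distance <- rep(0L, length(hits2))
same_hits <- all(hits == hits2)
```

If the description above holds, same_hits comes out TRUE even though the two objects now carry different distance values.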

Hope this helps,

H.

 

 

Thank you! Knowing that it's a metadata column, it makes more sense now, and the new display will definitely make this clearer.
