Question about the nearest function in the R package IRanges
@45a1818a

Hi all,

I am using the nearest function in the IRanges R package, but I am confused by the following example and by how the 'nearest' function is described in the documentation, quoted below:

nearest: The conventional nearest neighbor finder. Returns an integer vector containing the index of the nearest neighbor range in subject for each range in x. If there is no nearest neighbor (if subject is empty), NA's are returned.

Here is roughly how it proceeds, for a range xi in x:

Find the ranges in subject that overlap xi. If a single range si in subject overlaps xi, si is returned as the nearest neighbor of xi. If there are multiple overlaps, one of the overlapping ranges is chosen arbitrarily.

If no ranges in subject overlap with xi, then the range in subject with the shortest distance from its end to the start xi or its start to the end of xi is returned.

## Here is an example in nearest
library("IRanges")
query <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(1, 3, 11), c(2, 7, 11))
# output query
query
# output subject
subject
nearest(query, subject)

According to the description of 'nearest', query[2] is overlapped by subject[2], so the result of nearest should be

## 1 2 3

Instead of

## 1 1 3

I would really appreciate it if you could provide some clarification. Many thanks in advance.

@james-w-macdonald-5106

Not sure I can help with rationale, but this comes from how findOverlaps is used, under the hood, and the fact that your first two ranges are adjacent.
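To make the adjacency concrete, here is a small check on the ranges from the question. It assumes a recent IRanges in which distance() reports 0 for adjacent as well as overlapping ranges; the variable names are only for illustration. query[2] starts at 3 and subject[1] ends at 2, so no position separates them.

```r
library(IRanges)

query   <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(1, 3, 11), c(2, 7, 11))

# Number of positions separating query[2] from each of the first two subjects.
gap_to_s1 <- distance(query[2], subject[1])  # adjacent pair: [3,7] vs [1,2]
gap_to_s2 <- distance(query[2], subject[2])  # identical ranges: [3,7] vs [3,7]

# The same quantity by hand for the adjacent pair:
# start of the query minus end of the subject, minus one.
manual_gap <- start(query[2]) - end(subject[1]) - 1L

gap_to_s1
gap_to_s2
manual_gap
```

If all three values come out as 0, then subject[1] and subject[2] are equally close to query[2] once a gap of 0 is tolerated, and either one can be picked.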

## how it's used

> findOverlaps(query, subject, maxgap = 0L, select = "arbitrary")
[1] 1 1 3

## How I would normally use it

> findOverlaps(query, subject)
Hits object with 2 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         2           2
  -------
  queryLength: 3 / subjectLength: 3

## what you get if you want all overlaps with a maxgap of 0L
> findOverlaps(query, subject, maxgap = 0L, select = "all")
Hits object with 5 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         1           2
  [3]         2           1
  [4]         2           2
  [5]         3           3
  -------
  queryLength: 3 / subjectLength: 3

So it's because the default is 'arbitrary'. You could also use 'all'.

> nearest(query, subject, select = "all")
Hits object with 5 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         1           2
  [3]         2           1
  [4]         2           2
  [5]         3           3
  -------
  queryLength: 3 / subjectLength: 3

Thanks for the great explanation, James!

It would be great if maxgap could be set to -1 in the nearest function.
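In the meantime, one possible workaround is to look for true overlaps first and call nearest only for the queries that have none. The sketch below is not part of IRanges; nearest_strict is a hypothetical helper name, and it assumes that findOverlaps with its default maxgap does not treat adjacent ranges as overlapping, as in the findOverlaps(query, subject) output above.

```r
library(IRanges)

query   <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(1, 3, 11), c(2, 7, 11))

# Hypothetical helper (not an IRanges function): prefer true overlaps and
# fall back to nearest() only for queries without any overlap.
nearest_strict <- function(x, subject) {
  # select = "first" returns one subject index per query, NA when none overlaps.
  idx <- findOverlaps(x, subject, select = "first")
  no_hit <- is.na(idx)
  if (any(no_hit)) {
    # These queries have no true overlap, so adjacency or distance decides.
    idx[no_hit] <- nearest(x[no_hit], subject)
  }
  idx
}

res <- nearest_strict(query, subject)
res
```

For the example ranges this should give 1 2 3. Ties among equally distant, non-overlapping subjects are still resolved by nearest, so the fallback step can remain arbitrary in those cases.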

Best regards,

Julie
