distanceToNearest does not respect strand when x is * and subject is stranded. According to the documentation, matches are done in the following manner:
      x | subject | orientation
   -----+---------+-------------
a)    + |    +    | --->
b)    + |    -    | NA
c)    + |    *    | --->
d)    - |    +    | NA
e)    - |    -    | <---
f)    - |    *    | <---
g)    * |    +    | --->
h)    * |    -    | <---
i)    * |    *    | ---> (the only situation where * arbitrarily means +)
When I actually test this with the following two calls:
distanceToNearest(
    GRanges(seqnames=c("chr1", "chr2"), ranges=IRanges(start=55000, width=1), strand="*"),
    GRanges(seqnames="chr1", ranges=IRanges(start=c(1000, 10000, 1000000, 1000000), width=1), strand=c("+", "-", "+", "-")),
    select="all")
distanceToNearest(
    GRanges(seqnames=c("chr1", "chr2"), ranges=IRanges(start=55000, width=1), strand="*"),
    GRanges(seqnames="chr1", ranges=IRanges(start=c(1000, 10000, 1000000, 1000000), width=1), strand=c("-", "+", "+", "-")),
    select="all")
Both return 44999, which is the distance to the subject range at 10000. No matter how the orientation column in the linked documentation is interpreted (is it x w.r.t. subject or subject w.r.t. x?), at least one of these examples should return the distance to the subject that starts at 1000000, because both the 1000 and 10000 subject ranges point "away" from position 55000.
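To make the numbers concrete, here is a small base-R sketch of the distances involved. It assumes the distance between two width-1 ranges is the number of positions strictly between them, which is consistent with the 44999 reported above. The variable names are just for illustration.

```r
# Base-R sketch; no Bioconductor packages needed.
query_pos   <- 55000
subject_pos <- c(1000, 10000, 1000000, 1000000)

# Assumed definition: gap = number of positions strictly between two
# width-1 ranges (0 if they are adjacent or identical).
gap <- pmax(abs(subject_pos - query_pos) - 1, 0)

gap       # 53999  44999 944999 944999
min(gap)  # 44999, i.e. the subject at 10000 regardless of its strand
```

If strand excluded the ranges at 1000 and 10000, the expected answer would be 944999 rather than 44999.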
They're all on the same man page (nearest-methods {GenomicRanges}). My issue isn't with follow/precede; it's with distanceToNearest.
Given both precede and follow give different answers for gr2 and gr3, why do they give the same answer for distanceToNearest? Does distanceToNearest completely ignore strand?
Is this a case of me misunderstanding what distanceToNearest actually does? It appears it completely ignores strand. Is this supposed to be the case? My use case is finding the distance to the nearest TSS for a given genomic position. Is this an inappropriate usage for distanceToNearest?
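For what it's worth, one way to see how much strand matters for the TSS use case is to split the subject by strand and call distanceToNearest on each subset separately. This is only a sketch of a possible workaround, not a statement of how the function is meant to be used. The object names (query, tss, tss_plus, tss_minus) are made up for illustration, and the query is restricted to chr1 to keep the example small.

```r
library(GenomicRanges)

# Hypothetical query position and TSS set, mirroring the first example above.
query <- GRanges(seqnames = "chr1",
                 ranges = IRanges(start = 55000, width = 1),
                 strand = "*")
tss <- GRanges(seqnames = "chr1",
               ranges = IRanges(start = c(1000, 10000, 1000000, 1000000), width = 1),
               strand = c("+", "-", "+", "-"))

# Split the subject by strand so each call only sees one orientation.
is_plus   <- as.logical(strand(tss) == "+")
tss_plus  <- tss[is_plus]
tss_minus <- tss[!is_plus]

d_plus  <- distanceToNearest(query, tss_plus)
d_minus <- distanceToNearest(query, tss_minus)

# Distances per strand; subjectHits() index into the subsets, so map them
# back to positions in the original tss object.
mcols(d_plus)$distance
which(is_plus)[subjectHits(d_plus)]
mcols(d_minus)$distance
which(!is_plus)[subjectHits(d_minus)]
```

Whether the + or the - result is the "right" nearest TSS then depends on which side of the query counts as upstream, which is the part the orientation table leaves ambiguous for me.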