Apologies in advance if I just don't understand the ignore.strand
switch, or perhaps GRanges objects. Also, I have not even tried to
understand GenomicRanges:::.GenomicRanges_findPrecedeFollow.

I have the impression that a query whose strand is entirely "*"
essentially implies ignore.strand, but the case below is handled
correctly only if ignore.strand is TRUE. The default result of nearest
is consistent with distanceToNearest but not with distance, and
distance seems correct:
> x = GRanges(ranges=IRanges(start=5, end=5), seqnames="chr1", strand="*")
> Y = GRanges(ranges=IRanges(start=c(6,7), end=c(6,7)), seqnames="chr1", strand=c("+","-"))
> distance(x, Y[1])
[1] 1
> distance(x, Y[2])
[1] 2
> nearest(x, Y, ignore.strand=TRUE) # correct
[1] 1
> nearest(ranges(x), ranges(Y)) # also correct
[1] 1
However,
> nearest(x, Y)
[1] 2
> distanceToNearest(x, Y)
DataFrame with 1 row and 3 columns
  queryHits subjectHits  distance
  <integer>   <integer> <integer>
1         1           2         2
> distanceToNearest(x, Y, ignore.strand=TRUE)
DataFrame with 1 row and 3 columns
  queryHits subjectHits  distance
  <integer>   <integer> <integer>
1         1           1         1
Finally,
> follow(ranges(x), ranges(Y))
[1] NA
The issue (along with GenomicRanges:::.nearest) must come down to this:
> follow(x, Y) # how can this be?
[1] 2
whereas
> follow(x, Y, ignore.strand=TRUE) # correct, I think
[1] NA
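
For reference, a minimal base-R sketch of the strand-agnostic results expected in the transcript above. It does not use or reproduce GenomicRanges internals: the helper names (point_distance, nearest_unstranded, follow_unstranded) are hypothetical, the ranges are reduced to their integer positions because they all have width 1, and the distance convention (positions 5 and 6 are 1 apart) is the one that distance() reports in this session.

```r
# Hypothetical helpers, not part of GenomicRanges: strand-agnostic
# nearest/follow for width-1 ranges given as integer positions.
query_pos   <- 5L          # position of x
subject_pos <- c(6L, 7L)   # positions of Y[1] and Y[2]

# Distance convention taken from the transcript: 5 and 6 are 1 apart.
point_distance <- function(q, s) abs(s - q)

# Index of the closest subject, ignoring strand.
nearest_unstranded <- function(q, s) which.min(point_distance(q, s))

# Index of the closest subject lying strictly before q, or NA if none.
follow_unstranded <- function(q, s) {
  before <- which(s < q)
  if (length(before) == 0L) return(NA_integer_)
  before[which.max(s[before])]
}

point_distance(query_pos, subject_pos)      # 1 2
nearest_unstranded(query_pos, subject_pos)  # 1
follow_unstranded(query_pos, subject_pos)   # NA
```

These values match nearest(x, Y, ignore.strand=TRUE) and follow(x, Y, ignore.strand=TRUE); the sketch says nothing about how the strand-aware methods are meant to treat a "*" query.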
> sessionInfo()
R Under development (unstable) (2012-10-10 r60908)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
[3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
[5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] GenomicRanges_1.11.0 IRanges_1.17.0 BiocGenerics_0.5.0
loaded via a namespace (and not attached):
[1] parallel_2.16.0 stats4_2.16.0 tools_2.16.0