nearest and nearestKNeighbors with k=1 produce different results
0
0
Entering edit mode
@simon-coetzee-5051
Last seen 2.4 years ago
USA Cedars-Sinai Medical Center

I am receiving a warning message for nearestKNeighbors that I don't receive for nearest that also seems to alter the output of the function.

using the data p_ran and e_ran below

p_ran <- GRanges(c("chr1:248836765-248839421",
"chr1:248857227-248860010",
"chr1:248905669-248907808",
"chr2:263751-265942",
"chr2:283627-286204",
"chr2:287549-288546",
"chr2:317349-318734",
"chr2:674901-679175",
"chr2:692014-694241",
"chr2:739716-741208",
"chr2:1579709-1580907"))
e_ran <- GRanges(c("chr1:248842064-248842915",
"chr1:248862738-248863858",
"chr1:248924237-248927727",
"chr2:246452-247319",
"chr2:280784-281089",
"chr2:673304-674615",
"chr2:690001-690466",
"chr2:694530-694842",
"chr2:1572461-1573381",
"chr2:349126-349492",
"chr2:1586877-1589526"))

when running nearestKNeighbors I receive the warning:

Warning message:
In ans[] <- x :
  number of items to replace is not a multiple of replacement length

This, or something else causes the result of nearest and nearestKNeighbors to differ at 4 positions which you can see in the code below:

p_ran
e_ran
which(nearest(p_ran, e_ran) != unlist(nearestKNeighbors(p_ran, e_ran, k = 1)))
e_ran[nearest(p_ran, e_ran)]
e_ran[unlist(nearestKNeighbors(p_ran, e_ran, k = 1))]

here are the results:

> p_ran
GRanges object with 11 ranges and 0 metadata columns:
       seqnames              ranges strand
          <Rle>           <IRanges>  <Rle>
   [1]     chr1 248836765-248839421      *
   [2]     chr1 248857227-248860010      *
   [3]     chr1 248905669-248907808      *
   [4]     chr2       263751-265942      *
   [5]     chr2       283627-286204      *
   [6]     chr2       287549-288546      *
   [7]     chr2       317349-318734      *
   [8]     chr2       674901-679175      *
   [9]     chr2       692014-694241      *
  [10]     chr2       739716-741208      *
  [11]     chr2     1579709-1580907      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> e_ran
GRanges object with 11 ranges and 0 metadata columns:
       seqnames              ranges strand
          <Rle>           <IRanges>  <Rle>
   [1]     chr1 248842064-248842915      *
   [2]     chr1 248862738-248863858      *
   [3]     chr1 248924237-248927727      *
   [4]     chr2       246452-247319      *
   [5]     chr2       280784-281089      *
   [6]     chr2       673304-674615      *
   [7]     chr2       690001-690466      *
   [8]     chr2       694530-694842      *
   [9]     chr2     1572461-1573381      *
  [10]     chr2       349126-349492      *
  [11]     chr2     1586877-1589526      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> which(nearest(p_ran, e_ran) != unlist(nearestKNeighbors(p_ran, e_ran, k = 1)))
[1]  4  7  9 11
Warning message:
In ans[] <- x :
  number of items to replace is not a multiple of replacement length
> e_ran[nearest(p_ran, e_ran)]
GRanges object with 11 ranges and 0 metadata columns:
       seqnames              ranges strand
          <Rle>           <IRanges>  <Rle>
   [1]     chr1 248842064-248842915      *
   [2]     chr1 248862738-248863858      *
   [3]     chr1 248924237-248927727      *
   [4]     chr2       280784-281089      *
   [5]     chr2       280784-281089      *
   [6]     chr2       280784-281089      *
   [7]     chr2       349126-349492      *
   [8]     chr2       673304-674615      *
   [9]     chr2       694530-694842      *
  [10]     chr2       694530-694842      *
  [11]     chr2     1586877-1589526      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> e_ran[unlist(nearestKNeighbors(p_ran, e_ran, k = 1))]
GRanges object with 11 ranges and 0 metadata columns:
       seqnames              ranges strand
          <Rle>           <IRanges>  <Rle>
   [1]     chr1 248842064-248842915      *
   [2]     chr1 248862738-248863858      *
   [3]     chr1 248924237-248927727      *
   [4]     chr2       246452-247319      *
   [5]     chr2       280784-281089      *
   [6]     chr2       280784-281089      *
   [7]     chr2       280784-281089      *
   [8]     chr2       673304-674615      *
   [9]     chr2       690001-690466      *
  [10]     chr2       694530-694842      *
  [11]     chr2     1572461-1573381      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
Warning message:
In ans[] <- x :
  number of items to replace is not a multiple of replacement length

This is a small reproducible version of a larger data-set where this happens many more times. I don't understand why this is.

sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicRanges_1.48.0   GenomeInfoDb_1.32.2    IRanges_2.30.0
[4] S4Vectors_0.34.0       BiocGenerics_0.42.0    RhpcBLASctl_0.21-247.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.42.0        compiler_4.2.1         tools_4.2.1
[4] XVector_0.36.0         GenomeInfoDbData_1.2.8 RCurl_1.98-1.7
[7] bitops_1.0-7
GenomicRanges • 725 views
ADD COMMENT

Login before adding your answer.

Traffic: 923 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6