Hi,
I know that it is possible to parallelize a single kNN query.
I would like to run multiple kNN queries in parallel instead, although I am unsure whether this offers any benefit over parallelizing a single query.
I tried the following, without success. I am on Windows, so I am using the "snow" backend. It is probably not possible to pass an object containing an external pointer to the workers, but I would appreciate it if someone could confirm.
Best.
data(iris)
# Convert to a numeric matrix, dropping the Species column
X <- as.matrix(iris[, -5])
# Build a search index
library(BiocNeighbors)
prebuilt <- buildIndex(X, BNPARAM = AnnoyParam(
    ntrees = 50,
    distance = "Euclidean"
))
out2 <- queryKNN(prebuilt, X, k=5)
# Set up parallelization
library(BiocParallel)
FUN <- function(x, prebuilt) {
    # Load the package on the worker, since snow workers start as fresh R sessions
    suppressPackageStartupMessages({
        library(BiocNeighbors)
    })
    queryKNN(prebuilt, x, k = 5)
}
# check FUN; this works
FUN(as.data.frame(X[1:10,]), prebuilt)
# Define a 2-worker SOCK Snow cluster.
snow <- SnowParam(workers = 2, type = "SOCK")
# RUN: creates the cluster and distributes the work; this fails
bplapply(split(as.data.frame(X), 1:5), FUN, prebuilt, BPPARAM = snow)
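If the external pointer is indeed the problem, one possible workaround is to send each worker the plain numeric matrix and let it build its own index. The following is a minimal sketch under that assumption, not a confirmed fix. The helper name `query_chunk` and the contiguous 30-row chunking are my own choices for illustration, and rebuilding the index in every task is only reasonable when the reference data is small.

```r
library(BiocNeighbors)
library(BiocParallel)

data(iris)
X <- as.matrix(iris[, -5])

# Hypothetical helper: takes the reference data as a plain matrix (which
# serializes safely) and lets queryKNN build the Annoy index on the worker,
# instead of receiving the prebuilt index object.
query_chunk <- function(x, ref) {
    BiocNeighbors::queryKNN(
        ref, x, k = 5,
        BNPARAM = BiocNeighbors::AnnoyParam(ntrees = 50, distance = "Euclidean")
    )
}

# Five contiguous chunks of 30 rows each, kept as matrices so that the
# results can be stacked back in the original row order.
groups <- rep(1:5, each = 30)
chunks <- lapply(split(seq_len(nrow(X)), groups),
                 function(i) X[i, , drop = FALSE])

snow <- SnowParam(workers = 2, type = "SOCK")
res <- bplapply(chunks, query_chunk, ref = X, BPPARAM = snow)

# Combine the per-chunk neighbor indices into one 150 x 5 matrix.
index <- do.call(rbind, lapply(res, `[[`, "index"))
```

Whether this beats parallelizing a single query depends on how the cost of rebuilding the index in each task compares with the cost of the queries themselves.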
Thank you for your detailed answer. I appreciate both it and your work.