Here are several examples that illustrate the cost of parallel evaluation, starting with a serial baseline:
> library(BiocParallel)
> v = integer(1e8)
> system.time(lapply(1:8, function(i, v) i, v))
user system elapsed
0.004 0.000 0.001
Cost of starting up the nodes
> system.time(bplapply(1:8, function(i, v) i))
user system elapsed
0.148 0.012 0.481
Cost of transferring data to the workers
> system.time(bplapply(1:8, function(i, v) i, v))
user system elapsed
0.092 0.476 1.727
Cost of retrieving data from the workers
> system.time(bplapply(1:8, function(i, v) v, v))
user system elapsed
0.600 1.704 3.378
And, of course, the dominant cost: iteration instead of vectorization (the vectorized equivalent here is simply 1:8)
> system.time(1:8)
user system elapsed
0 0 0
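As a rough illustration of that last point in base R, here is a minimal sketch. It assumes a simple element-wise function such as sqrt() stands in for the real computation, which is not shown in this thread. Both forms give the same answer, but the iterated form pays function-call overhead once per element:

```r
x <- as.numeric(1:1e6)

# Vectorized: a single call to sqrt() handles every element
vectorized <- sqrt(x)

# Iterated: sqrt() is called once per element, then the results are combined
iterated <- vapply(x, sqrt, numeric(1))

# Same values either way; only the amount of work differs
stopifnot(identical(vectorized, iterated))
```

Wrapping each of the two assignments in system.time() on your own machine shows how large the gap is before any parallel overhead is added.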
It seems likely that you've replaced a vectorized calculation with an iteration, and are moving large amounts of data to and from the workers.
bpvec() might be a better fit for your needs. More generally, iterating over n assays implies potentially polynomial (here quadratic) scaling: the first assay is copied in the first iteration, the first and second assays in the second, the first, second, and third in the third, and so on, for n(n+1)/2 assay copies in total. It would be better to develop a more efficient algorithm.
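The bpvec() suggestion can be sketched as follows. This is only an illustration under assumptions: sqrt() is a placeholder for whatever vectorized function your calculation uses, x is made-up data, and the default back-end returned by bpparam() is used. The idea is that bpvec() splits the input into chunks, calls the vectorized function once per chunk rather than once per element, and concatenates the pieces:

```r
library(BiocParallel)

# Placeholder data; substitute the vector you actually iterate over
x <- as.numeric(1:1e6)

# bplapply(x, sqrt) would call sqrt() once per element and return a list.
# bpvec() instead hands each worker a chunk of x, calls the vectorized
# function once per chunk, and combines the results with c().
res <- bpvec(x, sqrt)

# The result matches the plain vectorized call
stopifnot(identical(res, sqrt(x)))
```

This keeps the work inside each worker vectorized, though the data still has to travel to and from the workers, so it only pays off when the per-chunk computation is expensive relative to that transfer.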