bplapply with progressbar
wt215 • 0
@wt215-12405
Last seen 21 months ago
United Kingdom

Hello,

I am replacing foreach with BiocParallel in my package, and I wonder whether I can keep the same progress bar behaviour with bplapply that I had with foreach. (This is the same problem as described in https://github.com/Bioconductor/BiocParallel/issues/54 and https://stat.ethz.ch/pipermail/bioc-devel/2017-December/012572.html.)

Firstly I created a simple example in R:

nrow=10000
ncol=500
matrixx=matrix(runif(nrow*ncol),nrow=nrow,ncol=ncol)

 

Using foreach with progressbar:

library(parallel)
library(doSNOW)
library(foreach)
cluster=makeCluster(5,type='SOCK')
registerDoSNOW(cluster)
getDoParWorkers()
iterations<-nrow
pb<-txtProgressBar(max=iterations,style =3)
progress<-function(n)setTxtProgressBar(pb,n)
opts<-list(progress=progress)
BB_parmat<-foreach(geneind=1:dim(matrixx)[1],.combine=c,.options.snow=opts)%dopar%{
  return(mean(matrixx[geneind,]))
}

close(pb)
stopCluster(cluster)

Using bplapply with a progress bar (a potential problem is that the progress bar shows 0% for a long time and then suddenly jumps):

library(BiocParallel)
BPPARAM=SnowParam(workers=5,progressbar = TRUE,type='SOCK')
funnn<-function(geneind,matrixx){
  return(mean(matrixx[geneind,]))
}

suppressWarnings(temp_result<-bplapply(seq(1,dim(matrixx)[1]),funnn,matrixx,BPPARAM=BPPARAM))

 

I prefer the progress bar in the foreach case: it advances 1% at a time, so I can get a rough idea of the running time of the whole code. In the bplapply case, the progress bar jumps suddenly.

My question is: how can I get the same progress bar behaviour as in the foreach case using bplapply?

 

Thank you very much!

Best wishes,

Wenhao

 

@martin-morgan-1513
Last seen 4 days ago
United States

The effect can be achieved by setting the number of tasks, e.g.,

BPPARAM=SnowParam(workers=5, tasks = 20, progressbar = TRUE,type='SOCK')

updates the progress bar 20 times.

The way bplapply works is that, by default, it splits the initial task list (in your case the sequence of row indexes) into equal components for each worker -- each worker gets 10000 / 5 = 2000 rows. These are sent to the workers, who report back when done. When each worker finishes, the progress bar advances. The progress bar advances in 5 steps, but since the workers all finish at about the same time it seems like the progress bar jumps to complete.

The effect of setting tasks = 20 is to divide the 10000 rows into 20 tasks of 10000 / 20 = 500 rows each. One task of 500 rows is sent to each of the five workers, and as each worker finishes, the progress bar is updated and the next 500 rows are sent to that worker. The progress bar moves across the screen more smoothly, but the computation is actually less efficient (because there is more communication between the manager and workers) and takes longer. If most of the time is spent in computation anyway, then the extra cost of communication is small and the trade-off may be worth it.
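As a minimal sketch of the tasks setting described above (the smaller matrix, the two workers and the name row_mean are assumptions made so the example runs quickly, not part of the original question):

```r
library(BiocParallel)

# Smaller matrix than in the question so the sketch finishes quickly.
set.seed(1)
matrixx <- matrix(runif(1000 * 50), nrow = 1000, ncol = 50)

# Hypothetical helper name; same body as funnn in the question.
row_mean <- function(geneind, matrixx) mean(matrixx[geneind, ])

# tasks = 20 asks for the row indexes to be sent in 20 chunks instead of
# one chunk per worker, so the progress bar has 20 opportunities to advance.
param <- SnowParam(workers = 2, tasks = 20, progressbar = TRUE, type = "SOCK")

res <- bplapply(seq_len(nrow(matrixx)), row_mean, matrixx, BPPARAM = param)
res <- unlist(res)
```

Raising tasks gives a smoother bar at the price of more manager/worker communication, so a value well below the number of rows is probably a reasonable starting point.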

Usually, of course, it is better to vectorize than to parallelize, so in the above trivial example simply use rowMeans(matrixx).
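To illustrate the vectorization point, here is a small base R sketch (the reduced matrix size and the variable names by_row and vectorized are assumptions) that checks the row-by-row computation against rowMeans:

```r
set.seed(1)
matrixx <- matrix(runif(1000 * 50), nrow = 1000, ncol = 50)

# Row-by-row version, as in the question (run serially here for simplicity).
by_row <- vapply(seq_len(nrow(matrixx)),
                 function(geneind) mean(matrixx[geneind, ]),
                 numeric(1))

# Vectorized version: a single call, no workers and no progress bar needed.
vectorized <- rowMeans(matrixx)

# Compare the two results up to numerical tolerance.
all.equal(by_row, vectorized)
```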

(the comment on your question was from a spammer, and was deleted).

 


Thank you Martin! Is it possible for bplapply to pass arguments to the function txtProgressBar? If so, I could specify 'max=10000', so that the progress bar would be element based.

For this toy example, rowMeans definitely works better; I just used it for illustration.

By the way, BiocParallel is very good, thank you for your work!

 


Under the current scheme, it will not help to make the progress bar element based, because it would be reporting progress on the workers, where no one is looking!

The current implementation does not allow progress bar options to be set; you could open an issue (no promises for an update, though) at https://github.com/Bioconductor/BiocParallel.


Picking up on this answer, is it possible to have bplapply show a progress bar similar to pbapply when using SerialCoreParam?


I'm not sure that I understand the question; this

> param = SerialParam(progressbar=TRUE)
> res = bplapply(1:10, function(i) Sys.sleep(1), BPPARAM=param)
  |======================================================================| 100%

works?

 
