I am starting to look into BiocParallel, probably later than I should.
I am on a 64-core node. As far as I know, I have done nothing except load GenomicFiles. If I run
> registered()
$MulticoreParam
class: MulticoreParam; bpisup: TRUE; bpworkers: 64; catch.errors: TRUE
setSeed: TRUE; recursive: TRUE; cleanup: TRUE; cleanupSignal: 15; verbose: FALSE

$SnowParam
class: SnowParam; bpisup: FALSE; bpworkers: 64; catch.errors: TRUE
cluster spec: 64; type: PSOCK

$BatchJobsParam
class: BatchJobsParam; bpisup: TRUE; bpworkers: NA; catch.errors: TRUE
cleanup: TRUE; stop.on.error: FALSE; progressbar: TRUE

$SerialParam
class: SerialParam; bpisup: TRUE; bpworkers: 1; catch.errors: TRUE
If I understand it correctly, I now have four registered parallel backends (without doing anything) and the default is multicore, using all 64 cores. I think it is highly problematic for multi-user systems that the default is selected in this way. Specifically, in this case I have not requested 64 cores from my scheduler. Instead, I believe the default parallel backend should always be serial, and that anything more should require explicit user intervention.
In line with this - and wearing my admin cap for this paragraph - I think it would be pretty convenient if it were possible to modify the default choices and settings using environment variables. This way, suitable choices can be made for some users in a multi-user environment, based on scheduling requests. For example, I would like to write something in Rprofile.site which sets the default number of cores in a MulticoreParam, not based on cores-in-machine, but on cores-in-scheduling-request.
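To make the request above concrete, here is a minimal sketch of what such a site-wide setting could look like, assuming a Unix-like node. The helper name scheduler_workers is hypothetical, and the choice of environment variables is an assumption: NSLOTS is what Grid Engine sets and SLURM_CPUS_PER_TASK is what Slurm sets, so other schedulers would need other names. It uses only register(), MulticoreParam(workers=) and SerialParam() from BiocParallel, and falls back to serial when no scheduler request is visible.

```r
# Hypothetical helper (not part of BiocParallel): read the number of cores
# granted by the scheduler from environment variables, defaulting to 1.
scheduler_workers <- function(vars = c("NSLOTS", "SLURM_CPUS_PER_TASK")) {
    for (v in vars) {
        # Unset, empty or non-numeric values all become NA and are skipped.
        n <- suppressWarnings(as.integer(Sys.getenv(v, unset = NA)))
        if (!is.na(n) && n >= 1L)
            return(n)
    }
    1L
}

if (requireNamespace("BiocParallel", quietly = TRUE)) {
    library(BiocParallel)
    n <- scheduler_workers()
    # register() places the given param first in the registry,
    # which makes it the default returned by bpparam().
    if (n > 1L) {
        register(MulticoreParam(workers = n))
    } else {
        register(SerialParam())
    }
}
```

Loading BiocParallel from a site profile costs startup time for every session, so in practice one might prefer to keep only the helper there and call register() from the user's own script.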
Also, I don't understand why the registered SnowParam is different from what I see with
> SnowParam()
class: SnowParam; bpisup: FALSE; bpworkers: 0; catch.errors: TRUE
cluster spec: 0; type: PSOCK
Thanks for the links to options. Regarding SnowParam(), it seems to be the only XXParam() where calling the constructor does not give me the same as what is already registered. Note that I did not do anything to register anything; these were all defaults that appeared.
The issue with the default choice of parallel routine is the following. Now (and in the future) we want to move increasingly to using bplapply and friends. Hopefully the long-term impact of BiocParallel will be for developers to use bplapply anytime they (now) use lapply and it involves more than a basically instantaneous computation. This means that a larger set of operations in Bioconductor packages will be automatically parallelized, even if the user is unaware. While I love multicore and friends, I note that I have had several instances, both on a private machine and on a cluster node, where aggressive use of multicore has crashed the machine. Here I am particularly concerned about unsophisticated new users.
In line with this, it looks to me that we do not have a way of enforcing feedback to the user regarding parallelization. I think it might be nice to have something like verbose=TRUE/FALSE in some system settings, which would entail user feedback whenever these parallel routines are used. I could not see this when looking briefly (we could also have verbose levels, so setting verbose to an integer >1 means even more details - this has been very useful to me in my work). But perhaps I should get some more experience with the package first.
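As an illustration of the kind of feedback I mean, here is a rough sketch of a wrapper around bplapply. Everything new in it is hypothetical: the function name verbose_bplapply and the option name parallel.verbose are assumptions made for this example and are not part of BiocParallel. It only relies on bpparam(), bpworkers() and bplapply() from the package.

```r
# Hypothetical wrapper (not part of BiocParallel): report which backend is
# about to be used before handing the work to bplapply().
# The option name "parallel.verbose" is an assumption for this sketch.
verbose_bplapply <- function(X, FUN, ..., BPPARAM = BiocParallel::bpparam()) {
    level <- getOption("parallel.verbose", 0L)
    if (level >= 1L)
        message("bplapply: using ", class(BPPARAM)[1], " with ",
                BiocParallel::bpworkers(BPPARAM), " worker(s)")
    if (level >= 2L)
        message("bplapply: dispatching ", length(X), " task(s)")
    BiocParallel::bplapply(X, FUN, ..., BPPARAM = BPPARAM)
}

if (requireNamespace("BiocParallel", quietly = TRUE)) {
    options(parallel.verbose = 2L)
    res <- verbose_bplapply(1:3, function(i) i * 2,
                            BPPARAM = BiocParallel::SerialParam())
}
```

Setting the option to TRUE behaves like level 1, and an integer above 1 adds more detail, which is the verbose-levels idea from the paragraph above.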
I'll update my answer with the following -- SnowParam() and the registered default are the same; there's now a better (?) mechanism to read defaults from Rprofile; the default registrations use at most 8 cores.