How to run runNMF in parallel?
p.joshi ▴ 40

Hi,

I am trying to use the new runNMF() from scater. I have two questions.

First, I am trying to reduce dimensions with NMF rather than PCA, just to test whether it represents the data better; I would then plot the result using UMAP. So, I am trying to get 50 NMF components rather than just. Do you think I am doing something conceptually wrong?

Second, because I am requesting 50 NMF components, it runs for a very long time, so I tried to use multiple cores. runNMF() uses the nmf function, where parallel computation can be specified using .opt = 'p6' or .pbackend = 8 and which relies on the doParallel package, but I am still not able to use multiple cores. How can I fix that? I did call registerDoParallel(cl=10), but there was still no speed-up.

Thanks!

Piyush

Aaron Lun ★ 28k

I suspect the parallel mode only works when you are requesting multiple runs - each run seems to be assigned to a single core. If you only have one run, it doesn't appear to parallelize _within_ that run.

To illustrate:

library(NMF)      # provides nmf()
library(Biobase)  # esGolub is an ExpressionSet
data(esGolub)     # example dataset shipped with NMF
out <- nmf(esGolub, 3, nrun=5, .opt='vp8')
## NMF algorithm: 'brunet'
## Multiple runs: 5
## # NOTE - CRAN check detected: limiting maximum number of cores [2/12]
## Mode: parallel (2/12 core(s))
## Runs: |==================================================| 100%
## System time:
##    user  system elapsed 
##   6.014   0.561   7.831

Running top at the same time indicates that, indeed, multiple R processes have been started up. But if we remove the nrun= argument:

out <- nmf(esGolub, 3, .opt='vp8')
## NMF algorithm: 'brunet'
## NMF seeding method: random
## Iterations:  600/2000 
## DONE (converged at 610/2000 iterations)

Note the lack of any commentary about parallelization in the latter.
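
For anyone who wants to try this without the Golub data, here is a minimal sketch of the same comparison. It assumes the NMF package is installed; the random matrix, its dimensions, the rank and the seed are made up for illustration and are not from the original post.

```r
library(NMF)

# Hypothetical stand-in for an expression matrix:
# 200 features in rows, 30 samples in columns, all values non-negative.
set.seed(42)
mat <- matrix(runif(200 * 30), nrow = 200, ncol = 30)

# Multiple runs: 'p2' asks for up to 2 cores, and the independent runs
# are the units of work that get spread across them.
multi <- nmf(mat, rank = 3, nrun = 4, .options = "p2", seed = 123)

# Single run: the same option is accepted, but there is only one run
# to schedule, so there is nothing to distribute.
single <- nmf(mat, rank = 3, .options = "p2", seed = 123)

# Both results are rank-3 factorizations of the same matrix.
dim(basis(multi))   # features x rank
dim(coef(single))   # rank x samples
```

Adding 'v' to the options string (as in 'vp8' above) prints the run mode, which is the easiest way to check whether parallel mode was actually used.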

Personally, I found NMF (or at least, the implementation in the NMF package) too slow to serve as a routine dimensionality reduction technique. The main appeal of NMF lies in the interpretability of the factors, but that wasn't something I really cared about for my analyses, so I just stuck to PCA. I should mention that scater originally used the NNLM package, which was much faster; unfortunately, that got kicked off CRAN.

Thanks once again, Aaron. The reason I am trying to test NMF is that various papers suggest using NMF for integrating multimodal data.

Check out LIGER. It’s a great NMF-based method for integrating multimodal data. It’s also reasonably scalable. Expect a comparable and faster implementation of integrative NMF in the RcppML package in the next month or so.

Thanks a lot for providing an NMF implementation in scater, Aaron! If you are still interested in a fast (and potentially more reproducible) NMF implementation, perhaps the RcppML R package is worth a look? The authors have a nice preprint illustrating the improvements they made. (Code to reproduce their figures is in the supplements.)
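
A rough sketch of what trying RcppML directly might look like, assuming the RcppML and Matrix packages are installed. The simulated sparse matrix and the choice of k = 10 are purely illustrative, and the structure of the returned object may differ between RcppML versions, so the sketch only inspects it with str() rather than relying on particular element names.

```r
library(Matrix)

# Hypothetical sparse, non-negative matrix standing in for count data:
# 500 features x 100 cells, about 10% non-zero entries.
set.seed(1)
A <- rsparsematrix(500, 100, density = 0.1,
                   rand.x = function(n) rpois(n, 5) + 1)

# Call with the namespace prefix so this does not clash with NMF::nmf
# if the NMF package is attached in the same session.
model <- RcppML::nmf(A, k = 10, seed = 1)

# Inspect what came back before relying on any particular element.
str(model, max.level = 1)
```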

Seems like it would be a good request/PR for the scater repo.

Author/maintainer of RcppML here. Please share any feature requests/suggestions that would make RcppML NMF more useful for Bioconductor packages. I’m committed to making this the fastest, most painless, robust, and flexible NMF implementation out there.
