Hi all,
I am currently using emptyDrops() to call cells, after applying swappedDrops() to remove barcode-swapped molecules, on several 10X scRNA-seq datasets. After experiencing long running times for emptyDrops() on these data, I was wondering how the function's performance scales.
As mentioned in [1], emptyDrops() requires approximately 1-2 minutes to run on each of the tested datasets. I confirmed this with the example dataset "placenta1" (33,694 features x 737,280 barcodes), which took around 75 seconds to complete. However, on my own datasets emptyDrops() takes much longer than expected: e.g. for a dataset of 33,538 features x 737,280 barcodes (i.e. 156 fewer genes), the running time is around 245 seconds.
When trying to explain this difference, I also looked at the sparsity of each dataset: "placenta1" has 18,763,564 non-zero elements, whereas my example dataset has 10,793,183. Do any of these factors (dimensions, sparsity, etc.), or others, influence the running time of emptyDrops()? Is it possible to reduce it somehow? I apply this function several times over several datasets, hence my interest in the matter.
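For anyone who wants to compare their own data along the same lines, here is a minimal sketch of how the dimensions and number of non-zero elements of a sparse count matrix could be collected. It assumes only the Matrix package; the helper name describe_counts() and the simulated matrix are hypothetical stand-ins for a real 10X count matrix.

```r
library(Matrix)

# Hypothetical helper: summarise the size and sparsity of a count matrix.
describe_counts <- function(m) {
    list(
        features = nrow(m),
        barcodes = ncol(m),
        nonzero  = nnzero(m),                      # number of non-zero entries
        density  = nnzero(m) / (nrow(m) * ncol(m)) # fraction of non-zero entries
    )
}

set.seed(1)
# Simulated stand-in for a real features x barcodes matrix (an assumption,
# not real data): about 1% of the entries are positive counts.
m <- rsparsematrix(nrow = 1000, ncol = 5000, density = 0.01,
                   rand.x = function(n) rpois(n, 2) + 1)

stats <- describe_counts(m)
str(stats)
```

Wrapping the emptyDrops() call in system.time() would then give the running time to set against these figures.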
Thank you in advance!
Thank you very much for your detailed response, and apologies for the very late reply. I was delaying this answer until I had some feedback on the usage of the BPPARAM parameter; however, I have not yet had the time to test and play around with it. I plan to do so in the near future, so you can expect to hear back from me again. Thank you very much once more for all your valuable help!
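A minimal sketch of what such a test might look like, assuming the DropletUtils and BiocParallel packages are installed. The simulated matrix, the choice of two workers and the lower = 100 threshold are illustrative assumptions, not settings taken from the actual datasets. MulticoreParam() relies on forking, so it is not expected to give a speed-up on Windows.

```r
library(DropletUtils)
library(BiocParallel)
library(Matrix)

set.seed(100)
n.genes <- 200

# Simulated stand-in for a real count matrix (an assumption, not real data):
# 2000 low-count "empty" barcodes and 100 higher-count "cell" barcodes.
empty <- matrix(rpois(n.genes * 2000, lambda = 0.1), nrow = n.genes)
cell.means <- rep(runif(100, min = 2, max = 10), each = n.genes)
cells <- matrix(rpois(n.genes * 100, lambda = cell.means), nrow = n.genes)
m <- Matrix(cbind(empty, cells), sparse = TRUE)

# Serial run as the baseline.
t.serial <- system.time(
    res.serial <- emptyDrops(m, lower = 100, BPPARAM = SerialParam())
)

# Same call with two workers passed through BPPARAM.
t.par <- system.time(
    res.par <- emptyDrops(m, lower = 100, BPPARAM = MulticoreParam(workers = 2))
)

# Compare elapsed wall-clock time in seconds.
c(serial = t.serial[["elapsed"]], parallel = t.par[["elapsed"]])
```

On a matrix this small the overhead of starting workers may outweigh any gain, so the comparison is only meaningful when repeated on a full-sized dataset.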