Hi everyone,
My apologies if the question is rather broad, but I am looking for a general solution. I am analysing several datasets of around 500k cells each, and I would possibly like to integrate them.
However, analysing even one of these datasets takes days to finish. My attempts to speed up the analyses have included BiocParallel, approximate methods (e.g. irlba for PCA) in scran, and Seurat v5 with future parallelisation (I know, not Bioconductor packages). Still, the analyses take several workdays and some of them crash due to memory limits (working on an HPC cluster with 128 GB memory, 300 GB for R).
In Python, the preprocessing and analyses do not consume too much memory, and time-wise most of the analyses take only 2-3 hours from start to finish, which allows some interactivity if I need to figure things out.
With all due respect to Python, when it comes to bioinformatics analyses it just does not have the required ecosystem of packages and comes second to R for scientific analyses.
I would therefore like to ask for some pointers on how to speed up preprocessing and analyses in R (I am rather new to the topic).
Thank you in advance for any help.