Speeding up large-scale scRNA-seq analyses in R and improving memory usage
Maximilian • 0
@82f828a6
Last seen 14 months ago
Germany

Hi everyone,

Apologies if this question is rather broad, but I am looking for a general solution. I am analysing several datasets in the ballpark of 500k cells and would possibly like to integrate them.

However, analysing even one of these datasets takes days to finish. To speed things up I have tried BiocParallel, approximate methods (e.g. irlba for PCA) in scran, and Seurat v5 with future parallelisation (I know, not Bioconductor packages). Still, the analyses take several workdays and some of them crash due to memory limits (working on an HPC cluster with 128GB memory, 300 GB for R).

In Python, the preprocessing and analyses do not consume much memory, and most analyses take only 2-3 hours from start to finish, which allows some interactivity if I need to figure things out.

With all due respect to Python, when it comes to bioinformatics analyses it just does not have the required ecosystem of packages, and it is second to R for scientific analyses.

I would therefore like to ask for some pointers on how to speed up preprocessing and analyses in R (I am rather new to the topic).

Thank you in advance for any help.

scran • 720 views
ATpoint ★ 4.6k
@atpoint-13662
Last seen 3 hours ago
Germany

Since you have not posted any specific code or functions but say that Python is fine for some steps, why not use reticulate to run the preprocessing in Python and then seamlessly import the result into R for downstream analysis? See also the chapter on dealing with big data in the OSCA book from Bioconductor: http://bioconductor.org/books/release/OSCA.advanced/dealing-with-big-data.html
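
Regarding the memory crashes mentioned in the question, a rough back-of-the-envelope estimate may help show where the limit comes from. The sketch below is only an illustration in plain Python: the gene count (30,000) and the fraction of non-zero entries (5%) are assumptions, not values from the question, and the helper names are made up for this example. It compares a dense matrix of 8-byte doubles with compressed sparse column storage (one 8-byte value and one 4-byte row index per non-zero entry, plus one 4-byte pointer per column).

```python
# Back-of-the-envelope memory estimate for a genes x cells count matrix.
# Only the cell count comes from the question; the gene count and the
# density are illustrative assumptions.

def dense_bytes(n_cells, n_genes, bytes_per_value=8):
    """Memory for a dense matrix of 8-byte doubles."""
    return n_cells * n_genes * bytes_per_value


def sparse_csc_bytes(n_cells, n_genes, density,
                     bytes_per_value=8, bytes_per_index=4):
    """Memory for compressed sparse column storage (cells as columns)."""
    nnz = round(n_cells * n_genes * density)  # number of non-zero entries
    per_entry = bytes_per_value + bytes_per_index  # value + row index
    col_pointers = (n_cells + 1) * bytes_per_index  # one pointer per column, plus one
    return nnz * per_entry + col_pointers


N_CELLS = 500_000   # from the question ("ballpark of 500k cells")
N_GENES = 30_000    # assumption
DENSITY = 0.05      # assumption: 5% of entries are non-zero

dense_gb = dense_bytes(N_CELLS, N_GENES) / 1e9
sparse_gb = sparse_csc_bytes(N_CELLS, N_GENES, DENSITY) / 1e9

print(f"dense:  {dense_gb:.1f} GB")
print(f"sparse: {sparse_gb:.1f} GB")
```

Under these assumptions the dense form needs about 120 GB and the sparse form about 9 GB, so any step that turns the sparse counts into a dense matrix (or copies one) could plausibly exhaust a 128 GB node on its own. Which steps do that depends on the functions used, so it is worth checking them individually.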
