DESeq2 Parallel Computing Stall on Server
coyoung ▴ 10
@coyoung-17963

I have a question about using parallel computing for DESeq2 on a server. I have been working with this code for the last few days and have not been able to find a remedy. I have been following the DESeq2 online manual for RNA-seq analysis, but the processing is taking longer on the server than it does on my personal computer. Please let me know if there is anything else I need to send.

$ screen -S DESeq2


### DESeq2 Analysis in Server ###

setwd('/projects/home/cyoung304/data')

library('DESeq2')

cts <- read.csv(file='GeneName.csv')

nrow(cts)

ncol(cts)

colData <- read.csv(file='colData.csv')

ncol(colData)

nrow(colData)

rownames(cts) <- cts$Geneid

cts$Geneid <- NULL

library("BiocParallel")

register(MulticoreParam(12))

dds <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design= ~ patient + condition)

keep <- rowSums(counts(dds) >= 10) >= 5

dds <- dds[keep,]

dds$condition <- relevel(dds$condition, ref='NT')

# Michael said it doesn't matter what the reference is because the comparison between the samples remains the same

ddsColl <- collapseReplicates(dds, dds$id)

ddsColl <- DESeq(ddsColl, fitType='local', parallel = TRUE)

resultsNames(ddsColl)

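# Extract results from the fitted object (ddsColl), not the unfitted dds.
# With NT as the reference level, coefficient names take the form condition_<level>_vs_NT; confirm against resultsNames(ddsColl).

res <- results(ddsColl, name="condition_MPT_vs_NT", lfcThreshold = 0.585, alpha=0.05)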

res

resOrder <- res[order(res$padj),]

write.csv(as.data.frame(resOrder), file='DESeqResults.csv')

Make sure that the basics are working first, e.g.,

register(MulticoreParam(2))
bplapply(1:2, sqrt)

If that does not work, check the manager.hostname and manager.port arguments to MulticoreParam. This could be tricky, since it means finding out which ports (if any) are open on the cluster.
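
A slightly fuller version of this check can also confirm that the work really runs in separate worker processes. This is only a sketch: it assumes a Unix-like server (MulticoreParam relies on forking), and the variable name `pids` is purely illustrative.

```r
library(BiocParallel)

# Register a small backend first; two workers are enough for a sanity check.
register(MulticoreParam(2))

# Show which backend is currently registered and how many workers it has.
print(bpparam())
print(bpnworkers(bpparam()))

# Ask each task to report the process ID it ran in.
# If the backend is working, the call returns promptly with one PID per task.
pids <- unlist(bplapply(1:2, function(i) Sys.getpid()))
print(pids)
```

If this call hangs rather than returning, the problem is likely in the backend setup rather than in DESeq2 itself.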

library("BiocParallel")

register(MulticoreParam(2))

bplapply(1:2, sqrt)

[[1]]

[1] 1

[[2]]

[1] 1.414214
@mikelove

How many samples do you have? How long does it take with one core?

Due to the way that the parallel backends work, it's sometimes faster to run with fewer cores, as there is overhead in sending large datasets to 12 cores, and R will often eventually end up duplicating the memory (long backstory on this, which can be found on support site threads).
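
One way to see this trade-off on a given machine is to time DESeq() with a few different worker counts on a small dataset before committing to a full run. The sketch below uses simulated data from makeExampleDESeqDataSet() rather than the real counts, assumes a Unix-like system for MulticoreParam, and the helper name `time_deseq` is hypothetical. Timings on simulated data with a simple design will not transfer directly to a 125-sample paired design.

```r
library(DESeq2)
library(BiocParallel)

set.seed(1)
# Small simulated dataset (2000 genes, 12 samples), used only for timing.
dds_sim <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Hypothetical helper: elapsed seconds for one DESeq() run with `workers` cores.
time_deseq <- function(workers) {
  if (workers == 1) {
    t <- system.time(DESeq(dds_sim, quiet = TRUE))
  } else {
    t <- system.time(
      DESeq(dds_sim, parallel = TRUE,
            BPPARAM = MulticoreParam(workers), quiet = TRUE)
    )
  }
  t[["elapsed"]]
}

worker_counts <- c(1, 2, 4)
timings <- sapply(worker_counts, time_deseq)
names(timings) <- paste0("workers_", worker_counts)
print(timings)
```

On a small dataset the single-core run may well be the fastest, since the cost of starting workers and copying data can outweigh the parallel speed-up.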


I have 125 samples total (68 normal/57 condition). Without registering the cores at the beginning of the analysis I was able to complete DESeq2 in about 3.5 - 4 hrs.


It is slow because the design matrix is 125 x ~60, with every patient getting their own coefficient. That is pretty large, and the GLM needs to be solved iteratively for each gene. For these large datasets, I tend to use limma-voom, which is much faster because it avoids the need to iteratively solve for the coefficients with these large design matrices.
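
To make the size of the design concrete, here is a minimal sketch with simulated data. It assumes a hypothetical layout of 60 patients, each with one NT and one MPT sample (not the poster's exact 125 samples), and the limma-voom part runs only if limma and edgeR are installed. All object names are illustrative.

```r
set.seed(1)

# Hypothetical paired layout: 60 patients, one NT and one MPT sample each.
n_patients <- 60
patient <- factor(rep(sprintf("P%02d", seq_len(n_patients)), each = 2))
condition <- factor(rep(c("NT", "MPT"), times = n_patients),
                    levels = c("NT", "MPT"))

# One intercept + 59 patient coefficients + 1 condition coefficient.
design <- model.matrix(~ patient + condition)
print(dim(design))  # 120 61

# limma-voom fits a weighted linear model per gene, with no iterative GLM fitting.
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("edgeR", quietly = TRUE)) {
  n_genes <- 1000
  counts <- matrix(rnbinom(n_genes * length(patient), mu = 100, size = 5),
                   nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
  dge <- edgeR::DGEList(counts = counts)
  dge <- edgeR::calcNormFactors(dge)
  v <- limma::voom(dge, design)
  fit <- limma::eBayes(limma::lmFit(v, design))
  # Top genes for the MPT vs NT effect, adjusted for patient.
  print(limma::topTable(fit, coef = "conditionMPT", number = 5))
}
```

With real data, the simulated `counts`, `patient`, and `condition` would be replaced by the filtered count matrix and the corresponding colData columns.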
