Question

DESeq2 with many samples

1

Entering edit mode

igor ▴ 50

@igor

Last seen 20 months ago

United States

I am trying to run DESeq2 with many samples (single-cell experiment). However, it seems to be choking on the amount of data. When I run DESeq() with parallel=TRUE, I quickly get a memory error (this does not happen for me with smaller experiments). If I run it with parallel=F, I think it just stalls, since it does not finish for over a day. Is there any way around that? Are there some other parameters I can adjust?

deseq2 • 6.3k views

ADD COMMENT • link updated 9.3 years ago by Michael Love 43k • written 9.3 years ago by igor ▴ 50

score 3 · Answer 1 · 2015-10-15

3

Entering edit mode

Michael Love 43k

@mikelove

Last seen 14 hours ago

United States

My typical recommendation when users have 100s of biological replicates within each condition is to use limma + voom, which runs very quickly. In our testing, the differences between our method and linear methods like limma or non-parametric methods like SAMseq disappeared as the number of biological replicates within each condition grew large (as you would expect).

In regards to the memory problem: I did learn in an earlier support thread that, if you run BiocParallel (parallel=TRUE), it can benefit to first clear the R environment of any large data objects which are not needed. R's internal memory garbage collector, gc(), may be making copies of the large objects in the environment in all of the workers.

C: Memory usage for bplapply -- is the entire environment copied over to each worke

ADD COMMENT • link 9.3 years ago Michael Love 43k

0

Entering edit mode

I will look into limma + voom. Is there a way to do the normalization with limma + voom and then bring the results back into DESeq2? I still like the rest of the DESeq functions and how they behave.

ADD REPLY • link 9.3 years ago igor ▴ 50

0

Entering edit mode

No, you can't split an analysis to be done by a mix of voom and DESeq2. Which functions from DESeq2 are you referring to when you say "I still like the rest of the DESeq functions and how they behave"?

We could show you what the equivalent moves would be on the voom/limma side given the DESeq2 mojo you want to replicate.

ADD REPLY • link 9.3 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

I have not used voom, so I am not sure how many issues I will run into.

One example would be varianceStabilizingTransformation, which I believe is DESeq2-specific.

ADD REPLY • link 9.3 years ago igor ▴ 50

0

Entering edit mode

Well you could try my suggestion regarding cleaning up the environment before running DESeq().

Note that single cell RNA-seq is expected to have peculiar distributional properties unlike the bulk RNA-seq experiments which DESeq2 was designed for. I'm not an expert on which packages are best for single cell analysis. At the least, you should be looking at the data a lot by eye, e.g. with plotCounts().

"Is there a way to do the normalization with limma + voom and then bring the results back into DESeq2"

Not really. The data objects are structured differently. Some functions, like plotMA() in DESeq2, are just directly calling a function plotMA() defined in another Bioconductor package, geneplotter. This function takes a data.frame as input, see the help page for ?plotMA in geneplotter. Other of our plotting functions are quite simple and you can tweak the code yourself if you are familiar with R.

https://github.com/Bioconductor-mirror/DESeq2/blob/master/R/plots.R

ADD REPLY • link 9.3 years ago Michael Love 43k

0

Entering edit mode

I am also using a single-cell specific package, but it requires normalized values as input, which is why I still need to use a more traditional package first.

ADD REPLY • link 9.3 years ago igor ▴ 50