I am trying to run DESeq2 with many samples (single-cell experiment). However, it seems to be choking on the amount of data. When I run DESeq() with parallel=TRUE, I quickly get a memory error (this does not happen for me with smaller experiments). If I run it with parallel=F, I think it just stalls, since it does not finish for over a day. Is there any way around that? Are there some other parameters I can adjust?
My typical recommendation when users have 100s of biological replicates within each condition is to use limma + voom, which runs very quickly. In our testing, the differences between our method and linear methods like limma or non-parametric methods like SAMseq disappeared as the number of biological replicates within each condition grew large (as you would expect).
In regards to the memory problem: I did learn in an earlier support thread that, if you run BiocParallel (parallel=TRUE), it can benefit to first clear the R environment of any large data objects which are not needed. R's internal memory garbage collector, gc(), may be making copies of the large objects in the environment in all of the workers.
I will look into limma + voom. Is there a way to do the normalization with limma + voom and then bring the results back into DESeq2? I still like the rest of the DESeq functions and how they behave.
No, you can't split an analysis to be done by a mix of voom and DESeq2. Which functions from DESeq2 are you referring to when you say "I still like the rest of the DESeq functions and how they behave"?
We could show you what the equivalent moves would be on the voom/limma side given the DESeq2 mojo you want to replicate.
Well you could try my suggestion regarding cleaning up the environment before running DESeq().
Note that single cell RNA-seq is expected to have peculiar distributional properties unlike the bulk RNA-seq experiments which DESeq2 was designed for. I'm not an expert on which packages are best for single cell analysis. At the least, you should be looking at the data a lot by eye, e.g. with plotCounts().
"Is there a way to do the normalization with limma + voom and then bring the results back into DESeq2"
Not really. The data objects are structured differently. Some functions, like plotMA() in DESeq2, are just directly calling a function plotMA() defined in another Bioconductor package, geneplotter. This function takes a data.frame as input, see the help page for ?plotMA in geneplotter. Other of our plotting functions are quite simple and you can tweak the code yourself if you are familiar with R.
I am also using a single-cell specific package, but it requires normalized values as input, which is why I still need to use a more traditional package first.
I will look into limma + voom. Is there a way to do the normalization with limma + voom and then bring the results back into DESeq2? I still like the rest of the DESeq functions and how they behave.
No, you can't split an analysis to be done by a mix of voom and DESeq2. Which functions from DESeq2 are you referring to when you say "I still like the rest of the DESeq functions and how they behave"?
We could show you what the equivalent moves would be on the voom/limma side given the DESeq2 mojo you want to replicate.
I have not used voom, so I am not sure how many issues I will run into.
One example would be
varianceStabilizingTransformation
, which I believe is DESeq2-specific.Well you could try my suggestion regarding cleaning up the environment before running DESeq().
Note that single cell RNA-seq is expected to have peculiar distributional properties unlike the bulk RNA-seq experiments which DESeq2 was designed for. I'm not an expert on which packages are best for single cell analysis. At the least, you should be looking at the data a lot by eye, e.g. with plotCounts().
"Is there a way to do the normalization with limma + voom and then bring the results back into DESeq2"
Not really. The data objects are structured differently. Some functions, like plotMA() in DESeq2, are just directly calling a function plotMA() defined in another Bioconductor package, geneplotter. This function takes a data.frame as input, see the help page for ?plotMA in geneplotter. Other of our plotting functions are quite simple and you can tweak the code yourself if you are familiar with R.
https://github.com/Bioconductor-mirror/DESeq2/blob/master/R/plots.R
I am also using a single-cell specific package, but it requires normalized values as input, which is why I still need to use a more traditional package first.