The final step of DESeq2 dispersion estimation takes a very long time to run on a dataset with 27 groups. Is there a good strategy for speeding it up?
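One generic way to cut the wall-clock time, independent of any modeling change, is to spread the DESeq2 fit across cores with BiocParallel. A minimal sketch, assuming `counts` is a gene-by-sample matrix and `coldata$group` holds the 27 group labels (both hypothetical names), and adjusting `workers` to the machine:

```r
library(DESeq2)
library(BiocParallel)

# Hypothetical inputs: `counts` (genes x samples), `coldata$group` (27 levels).
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ group)

# Run the whole pipeline (size factors, dispersions, GLM fit) in parallel.
dds <- DESeq(dds, parallel = TRUE, BPPARAM = MulticoreParam(workers = 4))
```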
Thanks for the suggestions.
The issue is that I have 27 groups with only 3 replicates each. Because the number of replicates is low, I would prefer a count-based method for hypothesis testing. The data are also quite noisy, so I would like to be able to run the analysis repeatedly with different normalization strategies, which is why speed is important.
Is there a way to use a less complex model for dispersion estimation (treating some samples as replicates) while still obtaining the coefficients for the full model? The current workflow doesn't seem to allow this.
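For what it's worth, a pattern along these lines can be sketched with DESeq2's accessor functions: estimate dispersions under a coarser design, copy them into the full-design object, and then fit only the GLM coefficients. Here `dds_coarse` and `dds_full` are hypothetical objects holding the same counts with the simplified and full designs respectively; whether reusing dispersions this way is statistically appropriate for a given dataset is a separate question:

```r
library(DESeq2)

# Hypothetical objects: `dds_full` has design ~ group (27 levels);
# `dds_coarse` is the same data with a coarser grouping in its design.
dds_coarse <- estimateSizeFactors(dds_coarse)
dds_coarse <- estimateDispersions(dds_coarse)

dds_full <- estimateSizeFactors(dds_full)
dispersions(dds_full) <- dispersions(dds_coarse)  # reuse the cheaper estimates

# Fit the full-design coefficients and Wald tests with those dispersions.
dds_full <- nbinomWaldTest(dds_full)
```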
That's 81 samples total, which is plenty for voom to work quite well. Even though each group has only 3 replicates, keep in mind that DESeq2, edgeR, and limma-voom all estimate a single dispersion or variance parameter for each gene shared across all groups, so no matter which one you use, you are estimating the dispersion/variance from all 81 samples, giving you a quite robust estimate.
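A minimal limma-voom pipeline for this design could look like the following sketch, where `counts` (genes x 81 samples) and `group` (a 27-level factor) stand in for the poster's data:

```r
library(limma)
library(edgeR)

# Hypothetical inputs: `counts` (genes x 81 samples), `group` (factor, 27 levels).
dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge)          # TMM normalization

design <- model.matrix(~ 0 + group)  # one coefficient per group
v   <- voom(dge, design)             # precision weights from mean-variance trend
fit <- lmFit(v, design)
fit <- eBayes(fit)
```

Contrasts between any pair of the 27 groups can then be tested with `contrasts.fit` and `topTable`.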
Thanks, voom does work well and is fast. I was wondering how I might combine it with normalization for technical covariates such as GC bias and length bias. I have previously used the `glm.offset` matrix generated by cqn for this.
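One hedged option, following the cqn vignette rather than voom itself, is to take cqn's GC- and length-corrected log2 expression values (`y + offset`) straight into limma with the trend option, since voom expects raw counts rather than offset-corrected values. A sketch, where `counts`, per-gene `gc` and `length_bp` vectors, and `group` are assumed inputs:

```r
library(cqn)
library(limma)

# Hypothetical inputs: `counts`, per-gene `gc` content and `length_bp`, `group`.
fit.cqn <- cqn(counts, x = gc, lengths = length_bp)
logexpr <- fit.cqn$y + fit.cqn$offset   # GC/length-corrected log2 values

design <- model.matrix(~ 0 + group)
fit <- lmFit(logexpr, design)
fit <- eBayes(fit, trend = TRUE)        # limma-trend in place of voom weights
```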
If you use voom and have further questions, I'd recommend starting a new post with new tags. Posts are emailed to authors based on their tags, so if you tag the post with limma, you will get responses from the package authors.