Hello,
I use the DESeq2 package for many projects in which the number of samples is quite small. But sometimes I have projects with almost 700 samples, and I want to do the same kind of analysis for all the projects.
When the number of samples is huge, the estimation of the dispersion is very time-consuming. In this specific case, I think that the estimate provided by the parametric or local regression is not required; the gene-wise dispersion (or something close to it) can be used.
So my question is: is there a way to bypass this estimation and directly use the gene-wise dispersion?
Best
If this is becoming a frequent enough use case, could the above (first code chunk) be exposed as an option in the DESeq() wrapper?
That's a reasonable suggestion. DESeq2 then reduces to MLE dispersion with the GLM, which you could just as well get with e.g. glm.nb() with the log of the size factors as an offset. On the other hand, you then have an object that can be input to lfcShrink() to get moderated fold changes, which is novel.
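For readers following along, here is a minimal sketch of what using the gene-wise dispersions directly can look like. It is not necessarily identical to the code chunk referred to above, and the simulated dataset from makeExampleDESeqDataSet() (with its default ~ condition design) is only a stand-in for real data.

```r
library(DESeq2)

# Simulated counts purely for illustration; substitute your own DESeqDataSet.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

dds <- estimateSizeFactors(dds)

# Gene-wise (maximum likelihood) dispersion estimates only
dds <- estimateDispersionsGeneEst(dds)

# Use the gene-wise estimates as the final dispersions, skipping the
# trend fit and the shrinkage towards the trend
dispersions(dds) <- mcols(dds)$dispGeneEst

# Wald test using the dispersions set above
dds <- nbinomWaldTest(dds)
res <- results(dds)
```

With several hundred samples the gene-wise estimates are already well determined, so skipping the trend fit and the shrinkage step gives up little; with small sample sizes the standard DESeq() pipeline remains the safer choice.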
While we’re at it, I think we should retire the automatic “no replicates” work-around, as the warning and wording surrounding it are already so strongly discouraging. Users can always do this manually, but we would no longer be facilitating it.