Question

Bypass dispersion estimation in DESeq2

0

Entering edit mode

stevenn.volant • 0

@stevennvolant-9599

Last seen 7.2 years ago

Hello,

I use the DESeq2 package for many projects in which the number of samples is quite small. But, sometimes i have some projects with almost 700 samples and i want to do the same kind of analysis for all the projects.

When the number of samples is huge the estimation of the dispersion is very time consuming. In this specific case, i think that using the estimation providing by the parametric or local regression is not required, the gene-wise dispersion can be used (or something close to this estimation).

So my question is, is there a way to by pass the estimation and directly use the gene-wise dispersion ?

Best

deseq2 • 1.1k views

ADD COMMENT • link updated 7.4 years ago by Michael Love 43k • written 7.4 years ago by stevenn.volant • 0

score 1 · Answer 1 · 2017-11-10

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 3 days ago

United States

hi Stevenn,

Yes, you are right that the MAP will be nearly the same as the MLE with hundreds of residual degrees of freedom. To use the MLE instead of calculating the MAP you can use:

dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst
dds <- nbinomWaldTest(dds) # or nbinomLRT()

But also, aside from the above, you can use some simple wrappers to parallelize many steps in DESeq2. This is what occurs within DESeq() when you use parallel=TRUE, but I'm trying to clean up the function now so it's easier for users to copy the code and do it on their own. There may be some better techniques for memory management which I'm actively exploring.

The above steps can be distributed to any number of cores, e.g.,

idx <- factor(sort(rep(seq_len(nworkers),length=nrow(dds))))
dds.list <- lapply(levels(idx), function(l) dds[idx==l,])
dds.list <- bplapply(dds.list, function(d) {
  # some code to run on 'd'
  # a function that should NOT go here is estimateDispersionsFit()
  # ...that function needs to see all of the rows of 'dds' together
})
dds <- do.call(rbind, dds.list)

ADD COMMENT • link 7.4 years ago Michael Love 43k

0

Entering edit mode

If this is becoming a frequent enough use case, could the above (first code chunk) be exposed as an option in the DESeq() wrapper?

ADD REPLY • link 7.4 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

That's a reasonable suggestion. DESeq2 then reduces to MLE dispersion with the GLM, which you could just as well get with e.g. glm.nb() with log of size factors as offset, but on the other hand you then have an object that can be input to lfcShrink() to get moderated fold changes, which is novel.

ADD REPLY • link 7.4 years ago Michael Love 43k

0

Entering edit mode

While we’re at it, I think we should retire the automatic “no replicates” work-around, as the warning and wording surrounding this is already so strongly discouraging. Users can always do this manually, but we would no longer be facilitating.

ADD REPLY • link 7.4 years ago Michael Love 43k