Question

estimateDisp runs forever, but the estimateGLM trio does not

Daniel

Hello,

The call "y = estimateDisp(y, design, robust = TRUE)" runs forever, even though the estimateGLM trio (estimateGLMCommonDisp, estimateGLMTrendedDisp and estimateGLMTagwiseDisp) runs just fine. With the trio I am able to finish the DE analysis using glmFit and glmLRT.

So, is it safe to replace "y = estimateDisp(y, design, robust = TRUE)" with the estimateGLM trio when using glmQLFit and glmQLFTest?

Cheers,

Daniel


Answer by Aaron Lun, 2016-02-10

That's strange; estimateDisp should be faster than the trio. Also, most of the GLM fitting code is shared between them, so it's odd that one would run forever while the others finish quickly. Here are a couple of things to check:

- Are you using the latest version of R, Bioconductor and edgeR?
- Make sure that no library has a zero library size or non-finite offsets.
- Have you filtered out low-abundance genes?
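The checks listed above can be run roughly as follows. This is a minimal sketch on simulated counts (the data, the two-group layout, the 1 CPM cutoff and the minimum of 3 libraries are all assumptions for illustration); on real data, "y" would be your own DGEList.

```r
# Assumes edgeR is installed from Bioconductor; simulated counts stand in for real data.
suppressPackageStartupMessages(library(edgeR))

set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 10), ncol = 6)
counts[1:100, ] <- 0                      # 100 genes with no reads at all
group <- factor(rep(c("ctrl", "trt"), each = 3))
y <- DGEList(counts = counts, group = group)

# 1. Which edgeR version is loaded? Compare it with the current Bioconductor release.
print(packageVersion("edgeR"))

# 2. No library should have a zero library size.
has_zero_lib <- any(y$samples$lib.size == 0)

# 3. Filter low-abundance genes: keep genes above 1 CPM in at least 3 libraries
#    (the smallest group size). Both numbers are illustrative, not fixed rules.
keep <- rowSums(cpm(y) > 1) >= 3
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# 4. All log-offsets used by the GLM fits must be finite.
has_bad_offset <- any(!is.finite(getOffset(y)))

cat("zero library sizes:", has_zero_lib, "| non-finite offsets:", has_bad_offset,
    "| genes kept:", sum(keep), "\n")
```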

If you can, call debug(estimateDisp) and step through the function until you get to the part that stalls; this would be helpful for us to figure out what's going on.

As for your other question, we would prefer that you use estimateDisp rather than the trio, as the former is more up to date. See the post "edger, trended or common dispersion" for more details.
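A minimal sketch of the quasi-likelihood pipeline recommended here (estimateDisp, then glmQLFit and glmQLFTest). The simulated counts, the two-group design and testing coef = 2 (the "grouptrt" column of the design) are assumptions for illustration only.

```r
# Assumes edgeR is installed; simulated data stand in for a real experiment.
suppressPackageStartupMessages(library(edgeR))

set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 10), ncol = 6,
                 dimnames = list(paste0("gene", 1:1000), NULL))
group <- factor(rep(c("ctrl", "trt"), each = 3))
y <- calcNormFactors(DGEList(counts = counts, group = group))
design <- model.matrix(~group)           # columns: (Intercept), grouptrt

# One call estimates the common, trended and tagwise dispersions.
y <- estimateDisp(y, design, robust = TRUE)

# Quasi-likelihood GLM fit and F-test for the treatment coefficient.
fit <- glmQLFit(y, design, robust = TRUE)
res <- glmQLFTest(fit, coef = 2)
topTags(res, n = 5)
```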



> That's strange, estimateDisp should be faster than the trio.

In my case it is the other way around: estimateDisp takes ~30 minutes, while the estimateGLM trio runs in less than 2 minutes.
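Timings like these are easiest to compare when both estimators run on the same DGEList and design and are wrapped in system.time(). A sketch on simulated data (the data and design are assumptions); on a real dataset, the two "elapsed" values are what would be compared. No claim is made here about which estimator wins on this toy input.

```r
# Assumes edgeR is installed; simulated counts stand in for the real DGEList.
suppressPackageStartupMessages(library(edgeR))

set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 10), ncol = 6)
group <- factor(rep(c("ctrl", "trt"), each = 3))
y <- calcNormFactors(DGEList(counts = counts, group = group))
design <- model.matrix(~group)

# Time the single-call estimator ...
t_disp <- system.time(
  y_disp <- estimateDisp(y, design, robust = TRUE)
)

# ... against the older three-step GLM estimators on the same input.
t_trio <- system.time({
  y_trio <- estimateGLMCommonDisp(y, design)
  y_trio <- estimateGLMTrendedDisp(y_trio, design)
  y_trio <- estimateGLMTagwiseDisp(y_trio, design)
})

# Wall-clock seconds for each approach.
elapsed <- rbind(estimateDisp = t_disp, trio = t_trio)[, "elapsed"]
print(elapsed)
```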

> it's odd that one would run forever and the others would be faster

I agree.

> Are you using the latest version of R/Bioconductor/edgeR?

I am using edgeR 3.12.0.

> Make sure that there are no libraries with zero library sizes/non-finite offsets.

I checked and this is not the case.

> Have you filtered out low abundance genes?

Of course.

> If you can, call debug(estimateDisp) and step through the function until you get to the part that stalls; this would be helpful for us to figure out what's going on.

I need to look into this.

As I stated in my previous post, if I use the "old" approach (i.e. the estimateGLM trio followed by glmFit and glmLRT), everything runs quickly and smoothly. On the same data (same contrasts, same designs and same filtering), if I switch to the QL F-test approach (i.e. estimateDisp, glmQLFit and glmQLFTest), everything works except that "y = estimateDisp(y, design, robust = TRUE)" takes ~30 minutes.

I do have a very complex design (several time points, batches, treatments, controls, etc.), so the low-count filtering step cannot be as effective as in a simple one-treatment-versus-one-control comparison. This may leave genes that have low counts only in some groups of samples or time points.

Reply by Daniel
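For designs with many groups, a common rule of thumb is to require a gene to pass the CPM cutoff in at least as many libraries as the smallest group contains, so that genes expressed in only one group are still retained. Below is a sketch with a hypothetical 3-time-point by 2-treatment layout (2 replicates per group, simulated counts, a CPM cutoff of 1). All of these are assumptions for illustration. Newer edgeR releases than the 3.12.0 used here also provide filterByExpr(), which derives the minimum sample size from the design.

```r
# Assumes edgeR is installed; layout, counts and cutoff are hypothetical.
suppressPackageStartupMessages(library(edgeR))

set.seed(1)
# 3 time points x 2 treatments, 2 replicates each = 12 libraries.
time  <- factor(rep(c("t0", "t1", "t2"), each = 4))
treat <- factor(rep(c("ctrl", "trt"), each = 2, times = 3))
group <- interaction(treat, time, drop = TRUE)

counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 10), ncol = 12)
counts[1:100, ] <- 0                          # never expressed
counts[101:150, ] <- 0                        # expressed in one group only ...
counts[101:150, group == levels(group)[1]] <- rnbinom(100, mu = 50, size = 10)

y <- DGEList(counts = counts, group = group)

# Keep a gene if it clears 1 CPM in at least as many libraries as the smallest
# group, so genes switched on in a single group are not discarded.
min_n <- min(table(group))
keep <- rowSums(cpm(y) > 1) >= min_n
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

cat("smallest group:", min_n, "| genes kept:", sum(keep), "\n")
```

With this rule, the 100 unexpressed genes are dropped, while the 50 genes expressed only in the first group survive.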