estimateDisp runs forever but not the trio estimateGLM
Daniel ▴ 10
@daniel-6619
Finland

Hello,

The call "y = estimateDisp(y, design, robust = TRUE)" runs forever for me, even though the trio of estimateGLM functions runs just fine (I am able to finish the DE analysis with them, using glmFit and glmLRT).

Therefore, is it safe to replace "y = estimateDisp(y, design, robust = TRUE)" with the trio of estimateGLM functions when using glmQLFit and glmQLFTest?

Cheers,

Daniel

edger estimateDisp • 1.6k views
Aaron Lun ★ 28k
@alun
The city by the bay

That's strange, estimateDisp should be faster than the trio. Also, most of the GLM fitting code is shared, so it's odd that one would run forever and the others would be faster. Here are a few things to check:

  • Are you using the latest version of R/Bioconductor/edgeR?
  • Make sure that there are no libraries with zero library sizes/non-finite offsets.
  • Have you filtered out low abundance genes?
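The last two checks can be sketched in base R. This is only an illustration under stated assumptions: the count matrix and grouping below are simulated stand-ins for the real data, and the threshold of 1 CPM in at least as many samples as the smallest group is a commonly used rule of thumb, not a value taken from this thread.

```r
# Hypothetical toy count matrix (genes x samples); replace with your own counts.
set.seed(1)
counts <- matrix(rnbinom(600, mu = 20, size = 5), nrow = 100, ncol = 6)
counts[1:10, ] <- 0L                       # ten genes with no reads at all
group <- factor(rep(c("ctrl", "trt"), each = 3))

# Check 1: no library may have a zero total, and log library sizes
# (which is what the offsets are built from) must be finite.
lib.size <- colSums(counts)
bad.libs <- which(lib.size == 0 | !is.finite(log(lib.size)))

# Check 2: a simple abundance filter (assumed threshold: CPM > 1 in at
# least as many samples as there are in the smallest group).
cpm.mat <- t(t(counts) / lib.size) * 1e6   # counts per million, per sample
min.n <- min(table(group))
keep <- rowSums(cpm.mat > 1) >= min.n
filtered <- counts[keep, , drop = FALSE]
```

If `bad.libs` is non-empty, those samples should be inspected or removed before any dispersion estimation.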

If you can, call debug(estimateDisp) and step through the function until you get to the part that stalls; this would be helpful for us to figure out what's going on.
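A minimal sketch of how that could look, assuming edgeR is installed. The data are simulated placeholders for the real DGEList and design matrix, and the timing step is an addition for illustration rather than something requested above.

```r
library(edgeR)

# Hypothetical simulated data standing in for the real DGEList and design.
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000, ncol = 6)
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)
y <- calcNormFactors(DGEList(counts = counts, group = group))

# Time the call first to confirm that this is the slow step.
timing <- system.time(y.disp <- estimateDisp(y, design, robust = TRUE))
print(timing["elapsed"])

# Interactive stepping (run these at the console, not in a script):
# debug(estimateDisp)                               # flag the function for stepping
# y.disp <- estimateDisp(y, design, robust = TRUE)  # press 'n' to advance line by line
# undebug(estimateDisp)                             # remove the flag afterwards
```

While stepping, noting the last line printed before the session stops responding is usually enough to report where the stall occurs.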

As for your other question, we would prefer that you use estimateDisp rather than the trio, as the former is more up-to-date. See C: edger, trended or common dispersion for more details.
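For reference, the recommended sequence can be sketched as below. "The trio" here refers to estimateGLMCommonDisp, estimateGLMTrendedDisp and estimateGLMTagwiseDisp. The data, the two-group design and the tested coefficient are all hypothetical placeholders, not the design from the question.

```r
library(edgeR)

# Hypothetical simulated data; substitute the real counts and design matrix.
set.seed(2)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000, ncol = 6)
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# One call replaces the three estimateGLM*Disp steps.
y <- estimateDisp(y, design, robust = TRUE)

# Quasi-likelihood fit and F-test for the second design coefficient
# (treatment versus control in this toy design).
fit <- glmQLFit(y, design, robust = TRUE)
res <- glmQLFTest(fit, coef = 2)
print(topTags(res))
```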


> That's strange, estimateDisp should be faster than the trio.

In my case it is the other way around: estimateDisp runs for ~30 minutes, while the trio of estimateGLM functions runs in less than 2 minutes.

> it's odd that one would run forever and the others would be faster

I agree.

> Are you using the latest version of R/Bioconductor/edgeR?

I use edgeR_3.12.0

> Make sure that there are no libraries with zero library sizes/non-finite offsets.

I checked and this is not the case.

> Have you filtered out low abundance genes?

Of course.

> If you can, call debug(estimateDisp) and step through the function until you get to the part that stalls; this would be helpful for us to figure out what's going on.

I need to look into this.

As I stated in my previous post, if I use the "old" approach (i.e. the trio of estimateGLM functions, glmFit and glmLRT), everything runs quickly and smoothly. On the same data (with the same contrasts, designs and filtering), if I switch to the QLF approach (i.e. estimateDisp, glmQLFit and glmQLFTest), everything works except that "y = estimateDisp(y, design, robust = TRUE)" takes ~30 minutes. I do have a very complex design (several time points, batches, treatments, controls, etc.), so the low-count filtering step cannot be as effective as in a simple one-treatment-versus-one-control comparison; some genes may end up with PARTIALLY low counts, i.e. low counts only in some groups of samples or time points.
