Question

edger, trended or common dispersion

0

Entering edit mode

tonja.r ▴ 80

@tonjar-7565

Last seen 8.1 years ago

United Kingdom

I am bit in doubts if I should use estimateGLMCommonDisp(y,design) or estimateGLMTrendedDisp(y,design). There is however a difference in DE genes, not a big one but still. How should the ideal plots look like?
My BCV plots are following:

Trended dispersion estimated:

and only common dispersion:

edger • 8.0k views

ADD COMMENT • link updated 8.4 years ago by David ROUX ▴ 20 • written 9.0 years ago by tonja.r ▴ 80

0

Entering edit mode

David ROUX ▴ 20

@david-roux-11055

Last seen 5.8 years ago

France (Avignon University)

In my case, if I use "estimateDisp" instead of the other functions I get :

"Error in locfitByCol(loglik, covariate, span = span, degree = 0) : 
  locfit required but is not available"

And, I still do not realy know how to interpret the plot.

I really like the funny sentence of Aaron : "a nice decreasing shape " :-)

But what to expect objectively ?

Here are my script and plot...

y <- estimateCommonDisp(y, design)
y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
plotBCV(y)

ADD COMMENT • link 8.4 years ago David ROUX ▴ 20

0

Entering edit mode

In the future, if you have a new question, please post a new question instead of posting it as an answer to another question. But to answer, that error message is pretty clearly telling you that you haven't installed the locfit package and should do so in order to use estimateDisp.

ADD REPLY • link 8.4 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Ok, thanks for your response. David.

ADD REPLY • link 8.4 years ago David ROUX ▴ 20

score 3 · Accepted Answer · 2015-12-14

3

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 7 weeks ago

Icahn School of Medicine at Mount Sinai…

It seems pretty clear that there is a significant trend in your dispersions, so I would recommend using trended dispersion. However, you generally should just use estimateDisp instead of estimateGLMCommonDisp or estimateGLMTrendedDisp.

ADD COMMENT • link 9.0 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Why should I use estimateDisp? In manual it is stated that I can alternatively use another functions:

y <- estimateDisp(y, design)

Alternatively, one can use the following calling sequence to estimate them one by one. To estimate common dispersion:

y <- estimateGLMCommonDisp(y, design)

To estimate trended dispersions:

y <- estimateGLMTrendedDisp(y, design)

To estimate tagwise dispersions:

y <- estimateGLMTagwiseDisp(y, design)

ADD REPLY • link 9.0 years ago tonja.r ▴ 80

3

Entering edit mode

Because estimateDisp provides protection against zero fitted values, gives a more stable likelihood landscape (due to hot-starting at every step of the grid search), gives a more graduated trend from local fitting compared to binning, and can estimate the necessary prior degrees of freedom for EB shrinkage from the data. Oh, and it's under active development. I think that we're intending for estimateDisp to subsume the functionality of the other estimation functions... eventually.

ADD REPLY • link 9.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

If I use estimateDisp on another dataset instead of those three commands, I get following BCV plot:

and if I use those three commands instead, I get:

So, the first pic produced by estimateDisp seemed a bit strange to me, so I decided to avoid this function and use the three command instead. What do you think?

ADD REPLY • link 9.0 years ago tonja.r ▴ 80

1

Entering edit mode

There's no problem. As Aaron explained, the newer pipeline uses the data to estimate the prior degrees of freedom for the tagwise dispersions. The older pipeline simply uses 10 prior df unconditionally unless you tell it to use some other value. If you look at your DGEList object after running estimateDisp, it probably has a very large or even infinite prior df, indicating that there is insufficient evidence of gene-specific dispersions in your dataset, so that each gene simply uses the trend instead of getting its own dispersion. This is likely to happen if you have very few samples and/or low sequencing depth. In either case, the estimateDisp result is more honest about what information you do and do not have about each gene's dispersion.

ADD REPLY • link 9.0 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

But what does it mean regarding the DE genes then if my BCV looks like the first picture?

ADD REPLY • link 9.0 years ago tonja.r ▴ 80

0

Entering edit mode

I'm not sure what you're asking. In general, higher prior d.f. means that you are able to share more information across genes. This improves the precision of the dispersion estimates and increases power to detect DE genes. There's no cause for alarm based on this plot; the mean-dispersion trend has a nice decreasing shape and the values on the y-axis are fairly low. The fact that every dispersion is shrunk to the trend just means that the genes in your data have dispersions that don't vary much from the trend. The only extra thing I can think of is to set robust=TRUE, to protect against overzealous shrinkage of outlier dispersions.

ADD REPLY • link 9.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Just a small question regarding those functions. The RUVseq manual uses following:
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
Also, DiffBind does not use estimateDisp function in the edgeR analysis.
What is the motivation that they use the functions separately then?

ADD REPLY • link 9.0 years ago tonja.r ▴ 80

0

Entering edit mode

The code for those packages was likely written before the estimateDisp function was introduced into edgeR.

ADD REPLY • link 9.0 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Would it be advisable to use glmQLFit over glmFit as I have only two replicates per condition?

ADD REPLY • link 8.9 years ago tonja.r ▴ 80

0

Entering edit mode

That's fine. The whole point of information sharing via EB shrinkage (of the QL or NB dispersions) is to overcome problems due to low numbers of replicates. Obviously, though, the more replicates the better.

ADD REPLY • link 8.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

As far as I understood the main difference between QL and NB dispersions is that QL accounts for gene-specific variability, meaning the more replicates the better. If I have only two replicates, there is no point to use QL, right? Or when one should use QL and when NB?

ADD REPLY • link 8.9 years ago tonja.r ▴ 80

2

Entering edit mode

EB shrinkage of both QL and NB dispersions can account for gene-specific variability. However, the QL approach accounts for the uncertainty of the estimates in a more statistically rigorous manner. This ensures that type I error control is maintained during testing. In contrast, shrinkage of the NB dispersions is a bit less rigorous; the NB distribution doesn't have a conjugate prior for the dispersion parameter, and the uncertainty of dispersion estimation is difficult to model.

Besides, you can still have gene-specific variability with two replicates. If one gene is variable between the two replicates, and another is not variable, then - behold - you have gene-specific variability.

ADD REPLY • link 8.9 years ago Aaron Lun ★ 28k