Due to the increase in functionality in edgeR, I've become slightly confused about the correct order in which to call the different functions, since there seem to be multiple ways of doing the same thing. Specifically, there are currently 3-4 ways of estimating dispersion:
1) 3-step method: estimateGLMCommonDisp, estimateGLMTrendedDisp, estimateGLMTagwiseDisp
2) All-in-one method: estimateDisp
3) Robust method: estimateGLMRobustDisp
4) QL: estimateDisp followed by glmQLFit
The estimateDisp help file says it is similar to calling estimateGLMCommonDisp, estimateGLMTrendedDisp and estimateGLMTagwiseDisp in sequence, but also that it gives slightly different results. What's the difference between these two approaches, and is estimateDisp always preferred?
estimateDisp also has an argument called robust, which the edgeR user guide suggests setting to TRUE. Is running estimateDisp(robust=TRUE) similar to running estimateGLMRobustDisp?
To complete the confusion, there's the separate option of using the QL approach instead. The manual suggests using estimateDisp(robust=TRUE), even though glmQLFit re-estimates the dispersions. glmQLFit also has an argument called robust, and in addition it has an argument called abundance.trend (which I have not seen mentioned in any manual); should this ever be set to TRUE?
What's considered a good rule of thumb for running the different implementations? Something like below:
edgeR:
estimateDisp(robust=TRUE)
glmFit
glmLRT
edgeR robust:
estimateGLMRobustDisp
glmFit
glmLRT
edgeR QL:
estimateDisp(robust=TRUE)
glmQLFit(robust=TRUE)
glmQLFTest
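For concreteness, the three pipelines above might look like the following sketch. This is only an illustration, assuming counts are already in a DGEList called y with an appropriate design matrix; the coefficient tested (coef = 2) is a placeholder.

```r
library(edgeR)

# y: a DGEList with counts and normalization factors; design: a model matrix

## Classic edgeR (likelihood ratio tests)
y   <- estimateDisp(y, design, robust = TRUE)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)        # coef = 2 is a placeholder contrast

## edgeR robust (observation-level weights against outlier counts)
y   <- estimateGLMRobustDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)

## edgeR quasi-likelihood
y   <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)
qlf <- glmQLFTest(fit, coef = 2)

topTags(qlf)                        # top differentially expressed features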
Any advice is much appreciated!
Thank you very much for the detailed answer! I'm glad I was on the right track. With regards to edgeR robust, how does it compare to using limma+voom with array weights? Both seem to revolve around dampening the effect of outliers, but they appear to have a different focus, with limma+voom weighting samples and edgeR robust weighting features, respectively.
Array weights are focused on dealing with outlier samples, whereas the various robustness functions in edgeR deal with outlier observations in each feature, either by downweighting the observations or the entire feature. It depends on what you want to do: array weights use information across all features to estimate the weight for each sample, so they're pretty reliable, but they won't provide protection when the outlier observations are stochastically distributed across many samples.
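As a sketch of the sample-weighting alternative mentioned above (assuming a count matrix counts and a design matrix; function names are real limma/edgeR calls, but the objects are placeholders):

```r
library(limma)
library(edgeR)

dge <- DGEList(counts)
dge <- calcNormFactors(dge)

# voomWithQualityWeights combines voom's observation-level precision
# weights with sample-level quality (array) weights in a single step
v   <- voomWithQualityWeights(dge, design)
fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef = 2)   # coef = 2 is a placeholder contrast
```

The key contrast with estimateGLMRobustDisp is that the quality weights here are one number per sample, estimated by borrowing information across all features.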
To follow up, is it reasonable to use the QL pipeline with estimateGLMRobustDisp? Thanks.

It's not something I've ever tried, but I suppose it would be okay.
estimateGLMRobustDisp reports trended dispersions after robustness weighting, which could then be used by glmQLFit, etc. So there's no theoretical reason against using them together, though I should note that currently the estimateGLMRobustDisp function relies on the old estimateGLM*Disp functions rather than the new and improved estimateDisp.

Would the QL pipeline actually use the computed robustness weights, or would it just use the dispersion trend that was estimated robustly but ignore the weights themselves?
Yes, the weights computed by estimateGLMRobustDisp will also be used in downstream GLM fitting and deviance estimation (which is the prelude to QL dispersion estimation). If one accepts that it was okay to use the weights in the GLM fitting for NB dispersion estimation in estimateGLMRobustDisp, then it's probably also fine to use the weights in the GLM fitting for QL dispersion estimation in glmQLFit, too.
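So the hybrid pipeline discussed in this exchange would be along these lines. As noted above, this combination is untried by the respondent, and the sketch below is likewise just an illustration with placeholder objects y and design:

```r
library(edgeR)

# Robust NB dispersion estimation; the observation weights are stored
# in the returned DGEList (y$weights) alongside the trended dispersions
y <- estimateGLMRobustDisp(y, design)

# glmQLFit picks up both the trended dispersions and the stored weights
fit <- glmQLFit(y, design, robust = TRUE)
qlf <- glmQLFTest(fit, coef = 2)    # coef = 2 is a placeholder contrast
topTags(qlf)
```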