gof assumes that the total residual deviance of the fitted GLM follows a chi-squared distribution on the residual degrees of freedom. This is not the case in the quasi-likelihood framework, where the deviance is scaled by a gene-specific QL dispersion. Thus, it doesn't seem appropriate to apply gof to f2 - at the very least, you shouldn't compare gof results between f1 and f2, because they aren't really comparable. In general, I would predict that the QL fit will look "worse", but that's because gof doesn't account for the presence of the QL dispersion in the model.
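To make that concrete, a minimal sketch (assuming f1 came from glmFit and f2 from glmQLFit on the same DGEList, as in your question):

    res1 <- gof(f1, plot=TRUE)    # deviances compared to a chi-squared null: valid for glmFit
    summary(res1$gof.pvalues)     # genewise goodness-of-fit p-values
    # gof(f2) would not be meaningful: in the QL framework the deviances are
    # absorbed into gene-specific QL dispersions, so the chi-squared null no longer holds.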
glmQLFit uses the trended NB dispersion to fit a GLM, and then estimates the QL dispersion from the deviance. It doesn't re-estimate the NB dispersion but instead estimates an entirely different QL dispersion for each gene. In contrast, glmFit uses the tagwise NB dispersion to fit a GLM - and that's it.
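Spelled out in code, the two pipelines look like this (a sketch assuming y is your DGEList and design your model matrix; coef=2 is just an illustrative contrast):

    y   <- estimateDisp(y, design)  # common, trended and tagwise NB dispersions
    f1  <- glmFit(y, design)        # GLM fitted with the tagwise NB dispersions
    lrt <- glmLRT(f1, coef=2)       # likelihood ratio test on f1
    f2  <- glmQLFit(y, design)      # GLM fitted with the trended NB dispersion;
                                    # a separate QL dispersion per gene is then
                                    # estimated from the deviances
    qlf <- glmQLFTest(f2, coef=2)   # quasi-likelihood F-test on f2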
You probably won't be able to tell from the gof results. What I can tell you is that glmQLFit will provide more accurate type I error rate control as it accounts for the uncertainty of the dispersion estimates in a more rigorous manner than the glmFit pipeline. As such, I use it routinely for my analyses.
Well, I'm not sure that it makes all that much sense to do so. Any irregularity in the initial GLM fit will manifest as a change in the distribution of the deviances, but such changes will end up being modelled by glmQLFit via estimation of f2$var.prior or f2$df.prior. Any (deviance-based) goodness-of-fit testing that includes the QL dispersion would effectively cancel itself out, because the purpose of the QL dispersion is to model any "non-goodness of fit", so to speak.
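You can inspect those moderation components directly on the fit object (f2 as above; plotQLDisp is the standard way to visualise them):

    summary(f2$df.prior)    # prior degrees of freedom used to squeeze the QL dispersions
    summary(f2$var.prior)   # prior (trended) values for the QL dispersions
    plotQLDisp(f2)          # raw, squeezed and trended QL dispersions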
Aaron is right. You can't do a goodness of fit plot for a QL model fit -- the whole concept of goodness of fit doesn't apply because QL automatically fits dispersions at the gene level.
The goodness of fit plot done by gof() is actually only useful for evaluating glm models with constant or trended dispersion. The plot is designed to evaluate whether these global dispersion models are adequate --- if not, then tagwise dispersions are required. The plot doesn't make sense for a glm tagwise dispersion model and it isn't even defined for a QL model.
So the gof() plot can only be used to decide whether a global dispersion model is adequate. It can't be used to choose between glm-tagwise vs a QL model. By definition, both of these models are adequate from a gof() point of view.
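So the one informative use of gof() looks something like this (a sketch, reusing y and design from above):

    # fit with the trended dispersion only, i.e. a global dispersion model
    fit.global <- glmFit(y, design, dispersion=y$trended.dispersion)
    gof(fit.global, plot=TRUE)   # qq-plot of genewise gof statistics;
                                 # strong departures suggest tagwise dispersions are needed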
As Aaron has already explained, if you are seriously worried about assumptions, then QL is an obvious choice. It is based on a more conservative model and makes fewer assumptions than does glmLRT().
I have to admit that the gof() help page doesn't explain this properly.
Thank you very much for the details. glmQLFTest seems too conservative for my data because the BH-adjusted FDR values are all equal to 1. I would like to know whether or not glmLRT may be used instead of glmQLFTest.
Well, I think that we've already explained how to decide. If you want to be conservative and safe, use QL. If you want to be a bit more aggressive and find all the DE genes you can, still with good FDR control, then use glmLRT. You have already said that you prefer the latter, so use glmLRT.
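In code terms, the choice is simply which test you run on the fits from the sketch above:

    topTags(glmQLFTest(f2, coef=2))  # conservative, rigorous error control
    topTags(glmLRT(f1, coef=2))      # more aggressive, finds more DE genes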
Thanks Aaron. I see gof only works for glmFit. How can I do a goodness-of-fit test for glmQLFit?
Changjiang