I am using the (robust) QL pipeline in edgeR with following lines of code
> estimateDisp
> glmQLFit
I understand that with this pipeline, only the NB trended dispersion is used from the estimateDisp function, and that glmQLFit estimates QL gene-wise dispersions using the empirical Bayes approach.
At which point, or which function calculates the var(y_gi) = sigma^2_g(mu_gi + mu^2_gi phi where phi is NB trended dispersion and sigma^2_g the QL genewise dispersion?
Do you mind if I ask why you're asking this question? You are assuming that explicit computation of the variance function is necessary, but actually that is never required. And of course you don't need to know that sort of thing to use the software.
Dear Gordon, I not only wish to use the software, I also want to understand the models behind it. I have read all the literature behind it and like to understand what exactly is happening. I read about this variance function and was wondering why it was never calculated, but indeed - if I understand the paper from Lund correctly - for the LRT test and the QL-F-test, only the dispersion needs to be estimated.
Neither function explicitly calculates the variance term. The NB dispersion is estimated by maximizing the Cox-Reid adjusted profile likelihood - or more specifically, by maximizing the CR-APL smoothed across abundances to get the trended dispersion. The QL dispersion is estimated from the total residual deviance of the fitted GLM, with empirical Bayes shrinkage to stabilize the estimates. The equation in your post describes how the variance of each count is modelled, but it's not actually used in the code itself.
Technical note: if you really want to know, a working variance "var(y_gi) = mu_gi + mu^2_gi phi" (without the QL dispersion) is computed at each iteration of GLM fitting for each gene. Both estimateDisp and glmQLFit call glmFit, so I guess this calculation could be considered as part of those functions. Note that these values are only used as weights for least-squares fitting, not to estimate the dispersions themselves.
Thanks a lot! And the LRT test and the QL F-test only depend on the QL gene-wise dispersion (the posterior variance, object var.post in edegR) (as proven by McCullagh 1983) ?
Both functions still need the NB dispersion for GLM fitting - the tagwise NB dispersion for the LRT (by default, in glmLRT), and the trended NB dispersion in the QL F-test. The LRT doesn't use the QL dispersion at all. Also, I don't think that McCullagh describes hypothesis testing for quasi-likelihood models in that paper.
Do you mind if I ask why you're asking this question? You are assuming that explicit computation of the variance function is necessary, but actually that is never required. And of course you don't need to know that sort of thing to use the software.
Dear Gordon, I not only wish to use the software, I also want to understand the models behind it. I have read all the literature behind it and like to understand what exactly is happening. I read about this variance function and was wondering why it was never calculated, but indeed - if I understand the paper from Lund correctly - for the LRT test and the QL-F-test, only the dispersion needs to be estimated.