I would like to compute the variance explained (i.e. the coefficient of determination, R2) by a model in edgeR. Concretely, I am modelling various gene expression phenotypes using glmFit, and determining the significance of a few predictors using glmLRT.
Could you please indicate how to compute R2 from the output of these models?
Many thanks in advance!
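For readers following along, here is a minimal sketch of one common deviance-based definition of R2: one minus the ratio of the full-model deviance to the intercept-only deviance, computed per gene. It is not necessarily the exact recipe given in the reply. The counts, predictors and object names are simulated or hypothetical, and the sketch assumes that both fits reuse the dispersions estimated under the full design.

```r
# Minimal sketch on simulated counts; all object names are illustrative.
library(edgeR)

set.seed(1)
ngenes <- 200
nsamples <- 12

# Hypothetical predictors and counts with no true effects
x1 <- rnorm(nsamples)
x2 <- rnorm(nsamples)
x3 <- rnorm(nsamples)
counts <- matrix(rnbinom(ngenes * nsamples, mu = 50, size = 10),
                 nrow = ngenes, ncol = nsamples)

design.full <- model.matrix(~ x1 + x2 + x3)
# Intercept-only design; the data argument fixes the number of rows
design.null <- model.matrix(~ 1, data = data.frame(x1))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, design.full)  # dispersions stored in y are used by both fits

fit.full <- glmFit(y, design.full)
fit.null <- glmFit(y, design.null)

# Proportion of the null deviance removed by the predictors, per gene
R2 <- 1 - fit.full$deviance / fit.null$deviance
quantile(R2)
```

Because the deviance is non-negative, R2 defined this way cannot exceed 1.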
Many thanks, that seems to work!
However, the estimates tend to be surprisingly high, with R2 close to 1 for many genes (quantiles: 0%: 1.879e-13; 25%: 0.0152; 50%: 0.1009; 75%: 0.5773; 100%: 0.9997). Is this distribution as expected?
I would also like to ask a follow-up question regarding this other post: Variance explained (coefficient of determination) in glmFit / glmLRT
I am actually fitting a model with three predictors and trying to compute the proportion of deviance explained by each of the predictors. I understand that this relies on making a choice on the order in which the predictors are included in the model (which others we are correcting for). A conservative and consistent way of computing R2_pred1 might be by subtracting R2 computed considering only pred2 and pred3 from that of the full model (considering all three predictors). Something like this:
Would this be correct?
Many thanks again!
The R2 values look completely normal. The median R2 is 10%, which seems somewhat low rather than high. With so many genes, you will naturally get some R2 over the whole range from 0 to 1, just by chance variation, which is what you see.
Even if none of the genes are differentially expressed and the data was just random, you would still expect to get R2 values around 3 / (nsamples - 1) on average.
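The null expectation above can be checked with a small simulation. The sketch below uses ordinary linear models on pure noise rather than edgeR fits, so it is only an analogy for the deviance-based R2; the sample size, number of simulations and variable names are arbitrary choices for illustration.

```r
# Simulate R2 for linear models fitted to pure noise with 3 predictors.
set.seed(42)
nsamples <- 20
npred <- 3
nsim <- 2000

r2 <- replicate(nsim, {
  y <- rnorm(nsamples)                                   # response unrelated to X
  X <- matrix(rnorm(nsamples * npred), nsamples, npred)  # random predictors
  summary(lm(y ~ X))$r.squared
})

mean_r2 <- mean(r2)                  # simulated average R2
expected <- npred / (nsamples - 1)   # 3 / (nsamples - 1)
c(simulated = mean_r2, expected = expected)
quantile(r2)
```

Even with no real signal, individual R2 values spread well away from their average, which is why a large set of genes will show some high values by chance.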
Regarding the predictor specific R2, I don't know what you're trying to do. Your R2_nox1 is the proportion of the deviance that x1 contributes over and above nox2 and nox3, but I don't know why you are computing R2_x1 etc. There is no right or wrong here. You're just computing descriptive statistics.
Many thanks for your answer.
To clarify the last point, R2_nox1 is the proportion of the deviance explained when considering design.nox1 = model.matrix(~x2+x3), i.e. without x1. So to compute the deviance contributed by x1, I am subtracting R2_nox1 from the deviance explained when considering all the predictors (R2_full): R2_x1 = R2_full - R2_nox1.
The idea is to compute the deviance contributed only by x1, by removing the contributions of x2 and x3 from the total explained variance. Is this one possible way of computing the deviance contributed by one variable when accounting for that contributed by the rest? Otherwise, could you please propose an alternative?
Thanks again!
No, you have it the wrong way around.
If you want deviance explained by x1, then you need
You do not need to subtract one R2 from another.
Both strategies actually lead to the same results. Resolved, thanks!
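A short derivation suggests why the two strategies agree. It assumes that the direct calculation is the drop in deviance from adding x1 to the model with x2 and x3, scaled by the null deviance (the code in the reply is not preserved here), and that all fits share the same null deviance and dispersions. The symbols D_null, D_full and D_nox1 are introduced only for this illustration and denote the deviances of the intercept-only, full and x1-free models.

```latex
\begin{aligned}
R^2_{\text{full}} &= 1 - \frac{D_{\text{full}}}{D_{\text{null}}},
\qquad
R^2_{\text{nox1}} = 1 - \frac{D_{\text{nox1}}}{D_{\text{null}}}, \\[4pt]
R^2_{x1} &= R^2_{\text{full}} - R^2_{\text{nox1}}
= \frac{D_{\text{nox1}} - D_{\text{full}}}{D_{\text{null}}}.
\end{aligned}
```

Under those assumptions, subtracting the two R2 values and dividing the deviance drop by the null deviance are the same quantity.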