Variance explained (coefficient of determination) in glmFit / glmLRT
1
0
Entering edit mode
Gemma • 0
@2509afca
Last seen 2.6 years ago
Austria

I am trying to run a model where I am predicting gene expression as a function of some fixed factors, using glmFit to fit the model and glmLRT to get the significance of each fixed effect. I would like to get the amount of variation explained by each fixed effect (i.e. coefficient of determination or similar). Is there a way to do that with an object of class "DGEGLM"?

More concretely, I would like to get the variance in expression for each gene (20 of them) explained by AG, PO and MG (the three fixed effects I am including in the model).

Many thanks in advance!

Here is a link to the data: https://drive.google.com/drive/folders/1Sgc3DTmr3SxNycyFFuO5PCrM8lqhnPaU?usp=sharing

And here the code:


c_AxB_A_r1 = read.csv("83Fx332M_fh_R1_83.csv")
c_AxB_B_r1 = read.csv("83Fx332M_fh_R1_332.csv")
c_AxB_A_r2 = read.csv("83Fx332M_fh_R2_83.csv")
c_AxB_B_r2 = read.csv("83Fx332M_fh_R2_332.csv")

c_BxA_A_r1 = read.csv("332Fx83M_fh_R1_83.csv")
c_BxA_B_r1 = read.csv("332Fx83M_fh_R1_332.csv")
c_BxA_A_r2 = read.csv("332Fx83M_fh_R2_83.csv")
c_BxA_B_r2 = read.csv("332Fx83M_fh_R2_332.csv")

genenames=c_AxB_A_r1$gene

y <- DGEList(counts=cbind(c_AxB_A_r1$counts, c_AxB_A_r2$counts, c_AxB_B_r1$counts, c_AxB_B_r2$counts, 
                          c_BxA_A_r1$counts, c_BxA_A_r2$counts, c_BxA_B_r1$counts, c_BxA_B_r2$counts), genes=genenames)

isexpr <- rowSums(cpm(y)>1) >= 4
y <- y[isexpr, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)

AG <- factor(c(0, 0, 1, 1, 0, 0, 1, 1))
PO <- factor(c(0, 0, 1, 1, 1, 1, 0, 0))
MG <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))

design <- model.matrix(~AG+PO+MG)
rownames(design) <- colnames(y)

y <- estimateDisp(y, design, robust=TRUE)
y$common.dispersion

fit <- glmFit(y, design)

model_AG <- glmLRT(fit, coef=2)
results_AG <- model_AG$table
FDR <- p.adjust(results_AG$PValue, method="fdr")
results_AG <- cbind(results_AG, FDR, y$genes$genes)
colnames(results_AG)[6] <- "gene"

model_PO <- glmLRT(fit, coef=3)
results_PO <- model_PO$table
FDR <- p.adjust(results_PO$PValue, method="fdr")
results_PO <- cbind(results_PO, FDR, y$genes$genes)
colnames(results_PO)[6] <- "gene"

model_MG <- glmLRT(fit, coef=4)
results_MG <- model_MG$table
FDR <- p.adjust(results_MG$PValue, method="fdr")
results_MG <- cbind(results_MG, FDR, y$genes$genes)
colnames(results_MG)[6] <- "gene"
edgeR • 987 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia

The concept of "coefficient of determination" is only defined for overall regressions, not for individual predictor variables. Assigning how much variability is explained by each individual covariate is not generally possible for any linear model or generalized linear model because the covariates overlap in terms of the variability that they predict. This is not a limitation of edgeR, but a universal property of linear models.

edgeR can return any quantity that would exist for an ordinary univariate glm. For example, one can easily perform a sequential analysis of deviance for each gene, but the proportion of the total deviance attributable to each predictor depends on the order that they are added to the model, i.e., which predictor is adjusted for which.

ADD COMMENT
0
Entering edit mode

Many thanks for your answer, that makes sense.

I imagine one could then compute a partial or semipartial measures of variance contributed by each fixed factor when considering the other predictors (or variance contributed irrespective of the other predictors - which I guess would be similar to just running the models with only one predictor at a time).

I for example found this package (https://cran.r-project.org/web/packages/partR2/vignettes/Using_partR2.html) that can do such analysis on ordinary glms.

Could you please suggest a way of doing this using edgeR?

Many thanks again!

ADD REPLY

Login before adding your answer.

Traffic: 872 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6