I am trying to run a model where I am predicting gene expression as a function of some fixed factors, using glmFit to fit the model and glmLRT to get the significance of each fixed effect. I would like to get the amount of variation explained by each fixed effect (i.e. coefficient of determination or similar). Is there a way to do that with an object of class "DGEGLM"?
More concretely, I would like to get the variance in expression for each gene (20 of them) explained by AG, PO and MG (the three fixed effects I am including in the model).
Many thanks in advance!
Here is a link to the data: https://drive.google.com/drive/folders/1Sgc3DTmr3SxNycyFFuO5PCrM8lqhnPaU?usp=sharing
And here the code:
c_AxB_A_r1 = read.csv("83Fx332M_fh_R1_83.csv")
c_AxB_B_r1 = read.csv("83Fx332M_fh_R1_332.csv")
c_AxB_A_r2 = read.csv("83Fx332M_fh_R2_83.csv")
c_AxB_B_r2 = read.csv("83Fx332M_fh_R2_332.csv")
c_BxA_A_r1 = read.csv("332Fx83M_fh_R1_83.csv")
c_BxA_B_r1 = read.csv("332Fx83M_fh_R1_332.csv")
c_BxA_A_r2 = read.csv("332Fx83M_fh_R2_83.csv")
c_BxA_B_r2 = read.csv("332Fx83M_fh_R2_332.csv")
genenames=c_AxB_A_r1$gene
y <- DGEList(counts=cbind(c_AxB_A_r1$counts, c_AxB_A_r2$counts, c_AxB_B_r1$counts, c_AxB_B_r2$counts,
c_BxA_A_r1$counts, c_BxA_A_r2$counts, c_BxA_B_r1$counts, c_BxA_B_r2$counts), genes=genenames)
isexpr <- rowSums(cpm(y)>1) >= 4
y <- y[isexpr, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
AG <- factor(c(0, 0, 1, 1, 0, 0, 1, 1))
PO <- factor(c(0, 0, 1, 1, 1, 1, 0, 0))
MG <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))
design <- model.matrix(~AG+PO+MG)
rownames(design) <- colnames(y)
y <- estimateDisp(y, design, robust=TRUE)
y$common.dispersion
fit <- glmFit(y, design)
model_AG <- glmLRT(fit, coef=2)
results_AG <- model_AG$table
FDR <- p.adjust(results_AG$PValue, method="fdr")
results_AG <- cbind(results_AG, FDR, y$genes$genes)
colnames(results_AG)[6] <- "gene"
model_PO <- glmLRT(fit, coef=3)
results_PO <- model_PO$table
FDR <- p.adjust(results_PO$PValue, method="fdr")
results_PO <- cbind(results_PO, FDR, y$genes$genes)
colnames(results_PO)[6] <- "gene"
model_MG <- glmLRT(fit, coef=4)
results_MG <- model_MG$table
FDR <- p.adjust(results_MG$PValue, method="fdr")
results_MG <- cbind(results_MG, FDR, y$genes$genes)
colnames(results_MG)[6] <- "gene"
Many thanks for your answer, that makes sense.
I imagine one could then compute a partial or semipartial measures of variance contributed by each fixed factor when considering the other predictors (or variance contributed irrespective of the other predictors - which I guess would be similar to just running the models with only one predictor at a time).
I for example found this package (https://cran.r-project.org/web/packages/partR2/vignettes/Using_partR2.html) that can do such analysis on ordinary glms.
Could you please suggest a way of doing this using edgeR?
Many thanks again!