This question is related to a previous Bioconductor support question: modeling zero-dominated RNA-seq with voom/limma and hurdle models (pscl)
Is it better to run the edgeR GLM pipeline on single-cell data using the original counts, or using normalized data (for example, normalized with scran)?
For clarity, below is the pipeline I am currently planning to use. Should the estimateDisp and glmFit steps be run on the counts or on the normalized data? It seems to me that they should be run on the counts, because as far as I can tell this is what is typically done for edgeR, but I want to be sure.
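# estimateDisp() on a plain count matrix returns a list, so the per-gene values are passed to glmFit()
disp = estimateDisp(counts, design, robust = TRUE)
fit = glmFit(counts, design = design, dispersion = disp$tagwise.dispersion)
# the factor levels of groups must match colnames(design)
contrast_matrix = makeContrasts(MAIN-OTHERS, levels = as.factor(groups))
# the second positional argument of glmLRT() is coef, so the contrast has to be passed by name
fit2 = glmLRT(fit, contrast = contrast_matrix)
toptable = topTags(fit2, adjust.method = "BH", sort.by = "none", n = nrow(fit2))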
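To make the count-based option concrete, here is a minimal self-contained sketch of what I have in mind. It assumes simulated counts and a two-group design; the data, the gene and cell names, and the MAIN/OTHERS labels are made up for illustration. The raw counts are wrapped in a DGEList, and normalization enters only as per-cell factors via calcNormFactors rather than by transforming the counts. Whether scran size factors should take the place of those factors is part of what I am asking.

```r
library(edgeR)

set.seed(1)
# Hypothetical stand-in data: 200 genes x 12 cells, two groups of 6 cells
counts <- matrix(rnbinom(200 * 12, mu = 5, size = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:12)))
groups <- factor(rep(c("MAIN", "OTHERS"), each = 6))

# Raw counts go into the DGEList; normalization is stored as per-cell factors
y <- DGEList(counts = counts, group = groups)
y <- calcNormFactors(y)

# One column per group, named after the group levels so the contrast can refer to them
design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

# With a DGEList, the estimated dispersions are stored in y and picked up by glmFit()
y <- estimateDisp(y, design, robust = TRUE)
fit <- glmFit(y, design)

contrast_matrix <- makeContrasts(MAIN - OTHERS, levels = design)
lrt <- glmLRT(fit, contrast = contrast_matrix)
res <- topTags(lrt, n = Inf, adjust.method = "BH", sort.by = "none")$table
```

Here res is a data frame with one row per gene, including the logFC, PValue and FDR columns.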