Question

Should edgeR-GLM on single cell RNA data be performed on the counts or normalized data?

amckenz • 0

My question relates to this previous Bioconductor support question: "modeling zero-dominated RNA-seq with voom/limma and hurdle models (pscl)".

Is it better to run edgeR-GLM on single-cell data using the original counts, or using normalized data (for example, data normalized with scran)?

For clarity, below is the pipeline I am currently planning to use. I am wondering if I should perform the glmFit and estimateDisp steps on the counts or the normalized data. It seems to me that I should do it on the counts, because as far as I can tell this is what is typically done for edgeR, but I want to be sure.

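library(edgeR)
# On a plain count matrix, estimateDisp() returns a list; glmFit() needs one dispersion vector from it
disp = estimateDisp(counts, design, robust = TRUE)
fit = glmFit(counts, design = design, dispersion = disp$tagwise.dispersion)
# The level names of groups must match the column names of design
contrast_matrix = makeContrasts(MAIN-OTHERS, levels = as.factor(groups))

# The second positional argument of glmLRT() is coef, so the contrast must be passed by name
fit2 = glmLRT(fit, contrast = contrast_matrix)

toptable = topTags(fit2, adjust.method = "BH", sort.by = "none", n = nrow(fit2))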

edger scran • 1.6k views

updated 8.5 years ago by davis ▴ 90 • written 8.5 years ago by amckenz

score 1 · Answer 1 · 2016-08-09

You should use the counts with edgeR.

scran computes size factors for normalization that are comparable to those from TMM, but adjusted so that they can be estimated reliably from scRNA-seq data with many dropouts (zero counts). Size factors from scran are therefore used in an edgeR workflow in the same way as TMM or similar size factors: the model is still fit to the raw counts, and the size factors enter only as scaling information.

If you are using scran with an SCESet object for the normalization, then check out the "convertTo" function, which produces a DGEList object ready for analysis with edgeR.
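As a rough illustration of what "using size factors in an edgeR workflow" can look like when done by hand, here is a minimal sketch on simulated data. The toy counts, the MAIN/OTHERS groups, and the placeholder size factors `sf` are all assumptions made for the example. In a real analysis `sf` would come from scran, and the conversion shown here is one reasonable way to do the hand-off, not necessarily what "convertTo" does internally.

```r
library(edgeR)

set.seed(1)
# Toy data (assumption): 200 genes x 20 cells in two hypothetical groups
counts <- matrix(rnbinom(200 * 20, mu = 5, size = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:20)))
groups <- factor(rep(c("MAIN", "OTHERS"), each = 10))
design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

# Placeholder size factors (assumption): in practice these would come from scran.
# Here they are simply library sizes scaled to have mean 1.
sf <- colSums(counts) / mean(colSums(counts))

# The DGEList always holds the raw counts
y <- DGEList(counts = counts, group = groups)

# edgeR's effective library size is lib.size * norm.factors, so set
# norm.factors proportional to sf / lib.size, rescaled to geometric mean 1
nf <- sf / y$samples$lib.size
y$samples$norm.factors <- nf / exp(mean(log(nf)))

# Standard edgeR-GLM steps on the counts; normalization enters via the offsets
y <- estimateDisp(y, design)
fit <- glmFit(y, design)
con <- makeContrasts(MAIN - OTHERS, levels = design)
lrt <- glmLRT(fit, contrast = con)
tab <- topTags(lrt, n = Inf, sort.by = "none")$table
```

The point of the sketch is that the count matrix itself is never divided by the size factors. They only change the effective library sizes that edgeR uses when fitting the model.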

score 0 · Answer 2 · 2016-08-09

Steve Lianoglou ★ 13k

You should take a look at scran.

written 8.5 years ago by Steve Lianoglou ★ 13k