Should I calculate normalization factors in edgeR using all libraries or using only the compared libraries?
Peter • 0
@peter-7104
Ireland

I have 3 groups: untreated, negative control (mock treatment), and treated, with 3 replicates in each. I am looking for differential expression between the groups, most importantly between the negative control and the treated group.

Which is a better approach:
- calculating the normalization factors using all 9 libraries, or
- calculating the normalization factors using only the 2×3 libraries being compared at a time (loading three count tables, one per comparison, entirely separately)?

Example for first option:

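library(edgeR)

# rnadata: matrix of raw counts, one column per library (9 columns)
groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))
dgedata <- DGEList(counts=rnadata, group=groups)
# keep genes with CPM > 1 in at least 3 libraries (the size of one group)
keep <- rowSums(cpm(dgedata) > 1) >= 3
dgedata <- dgedata[keep, , keep.lib.sizes=FALSE]
# TMM normalization factors calculated from all 9 libraries
dgedata <- calcNormFactors(dgedata, method="TMM")
dgedata <- estimateCommonDisp(dgedata)
dgedata <- estimateTagwiseDisp(dgedata)
# exact test for group B relative to group A
dgedata.results <- exactTest(dgedata, pair=c("A", "B"))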


(This is mostly theoretical, as the two approaches differ in only about 20 DE genes (out of hundreds), in each comparison, but I am wondering about the justifications.)

 

edgeR calcNormFactors RNA-seq • 1.4k views
Aaron Lun ★ 28k
@alun
The city by the bay

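You should use all of the libraries in a dataset when running edgeR, as this provides more residual degrees of freedom (d.f.) for dispersion estimation: with 9 libraries in 3 groups there are 6 residual d.f., compared with 4 when only the 6 libraries of one pairwise comparison are used. This means you should be calculating normalization factors for all 9 libraries at once, rather than separately analyzing a count table for each of the three pairwise comparisons.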
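As a rough sketch of what this looks like in practice, the code below normalizes and estimates dispersions once on all 9 libraries, then runs each pairwise exact test on the same object. The count matrix is simulated purely so the example is self-contained, and the assignment of "B" to the negative control and "C" to the treated group is an assumption made for illustration; neither comes from the original question.

```r
library(edgeR)

# Simulated stand-in for the real count table (hypothetical: 2000 genes x 9 libraries).
set.seed(1)
rnadata <- matrix(rnbinom(2000 * 9, mu = 50, size = 10), nrow = 2000,
                  dimnames = list(paste0("gene", 1:2000), paste0("lib", 1:9)))
groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))

# One DGEList, one normalization, one set of dispersion estimates from all 9 libraries.
dgedata <- DGEList(counts = rnadata, group = groups)
keep <- rowSums(cpm(dgedata) > 1) >= 3
dgedata <- dgedata[keep, , keep.lib.sizes = FALSE]
dgedata <- calcNormFactors(dgedata, method = "TMM")
dgedata <- estimateCommonDisp(dgedata)
dgedata <- estimateTagwiseDisp(dgedata)

# Each pairwise comparison reuses the same normalized object and dispersions.
pairs <- list(A_vs_B = c("A", "B"), A_vs_C = c("A", "C"), B_vs_C = c("B", "C"))
results <- lapply(pairs, function(p) exactTest(dgedata, pair = p))

# Top genes for one comparison (assumed here: B = negative control, C = treated).
topTags(results$B_vs_C)
```

Only the `pair` argument changes between comparisons, so all three tests share the dispersion estimates obtained from the full dataset.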

In any case, the actual normalization factors should not be very different. With the TMM method, calcNormFactors picks a reference library and calculates a trimmed mean of M-values (close to the median M-value, i.e., the systematic difference) for each other library against that reference. If you change the input libraries, the only effect on the calculation would concern which reference library is chosen. The size of the systematic difference between two libraries should not change much, whether it is calculated directly between the two libraries or through the reference (i.e., calculate A against the reference, then B against the reference, to get A against B).
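One way to check this on a given dataset is to compute the factors both ways and compare them over the shared libraries. The sketch below does so on simulated counts (hypothetical data with no true differential expression, so both sets of factors are expected to be close to 1). Because calcNormFactors scales its output so that the factors multiply to 1 within the set of libraries it is given, the all-library factors are rescaled over the six shared libraries before comparison; `rescale` is a helper defined here for that purpose, not an edgeR function.

```r
library(edgeR)

# Simulated counts (hypothetical): gene-specific means and unequal sequencing depths.
set.seed(2)
mu <- rgamma(2000, shape = 2, rate = 0.02)
depth <- c(1.0, 1.3, 0.8, 1.1, 0.9, 1.2, 1.0, 0.7, 1.4)
counts <- matrix(rnbinom(2000 * 9, mu = as.vector(outer(mu, depth)), size = 10),
                 nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("lib", 1:9)))
groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))

# Factors calculated from all 9 libraries.
y_all <- calcNormFactors(DGEList(counts = counts, group = groups))
nf_all <- setNames(y_all$samples$norm.factors, colnames(counts))

# Factors calculated from only the 6 libraries in groups B and C.
sel <- groups %in% c("B", "C")
y_sub <- calcNormFactors(DGEList(counts = counts[, sel],
                                 group = droplevels(groups[sel])))
nf_sub <- setNames(y_sub$samples$norm.factors, colnames(counts)[sel])

# Put both sets on the same scale (geometric mean of 1 over the 6 shared libraries).
rescale <- function(f) f / exp(mean(log(f)))
comparison <- cbind(all_9 = rescale(nf_all[sel]), only_6 = nf_sub)
comparison
```

If the two columns agree closely, the choice between the two normalization strategies has little effect on the relative scaling of the libraries, and the remaining argument for using all libraries is the dispersion estimation.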


OK, thanks for the clarification. The normalization factors are indeed very similar in the two cases.
