The calcNormFactors function doesn't normalize anything. It calculates normalization factors that are intended to do a better job than the raw library size for the scale normalization that voom does by default. In other words, if you use calcNormFactors first, it will use the TMM method to estimate the effective library size and then update the 'norm.factors' column of the samples data.frame in your DGEList object (by default that column is all 1s). As an example:
> library(edgeR)
> y <- matrix(rpois(1000, 5), 200)
> dge <- DGEList(y)
> dge$samples
        group lib.size norm.factors
Sample1     1      977            1
Sample2     1     1031            1
Sample3     1      968            1
Sample4     1      965            1
Sample5     1      969            1
> dge <- calcNormFactors(dge)
> dge$samples
        group lib.size norm.factors
Sample1     1      977    0.9916904
Sample2     1     1031    1.0017576
Sample3     1      968    0.9842965
Sample4     1      965    1.0405360
Sample5     1      969    0.9828296
And then, when you compute the logCPM values, those norm.factors are used to adjust the library size.
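As a minimal sketch of what that adjustment looks like: cpm() from edgeR multiplies each library size by its normalization factor to get an effective library size, so continuing from the example above,

> # effective library sizes actually used for the CPM conversion
> eff.lib <- dge$samples$lib.size * dge$samples$norm.factors
> # log2-CPM computed using those effective library sizes
> logCPM <- cpm(dge, log = TRUE)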
You should pretty much always use calcNormFactors because it is designed to account for compositional bias. If that's not actually a problem for your data (like this fake data I just made up), then it won't really change things. But if it is a problem, it will account for the bias.
If you use the normalize.method argument in voom, then it will additionally normalize using normalizeBetweenArrays. You could hypothetically use an additional normalization method like that, and there are instances where I thought it was a reasonable thing to do, but that's a pretty rare event. For probably upwards of 90% of analyses you should just use calcNormFactors and no normalize.method.
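As a sketch of the two options (assuming 'design' is the model matrix for your experiment; it isn't defined in the toy example above):

> # default: scale normalization only, via the TMM norm.factors
> v <- voom(dge, design)
> # additionally quantile normalize via normalizeBetweenArrays
> v <- voom(dge, design, normalize.method = "quantile")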
See https://f1000research.com/articles/5-1408, which recommends calcNormFactors.