The calcNormFactors function doesn't normalize anything. It calculates normalization factors that are intended to do a better job than the raw library size for the scale normalization that voom does by default. In other words, if you use calcNormFactors first, it will use the TMM method to estimate the effective library size and then update the 'norm.factors' column of the samples data.frame in your DGEList object (by default that column is all 1s). As an example:
> library(edgeR)
> y <- matrix(rpois(1000, 5), 200)
> dge <- DGEList(y)
> dge$samples
        group lib.size norm.factors
Sample1     1      977            1
Sample2     1     1031            1
Sample3     1      968            1
Sample4     1      965            1
Sample5     1      969            1
> dge <- calcNormFactors(dge)
> dge$samples
        group lib.size norm.factors
Sample1     1      977    0.9916904
Sample2     1     1031    1.0017576
Sample3     1      968    0.9842965
Sample4     1      965    1.0405360
Sample5     1      969    0.9828296
And then, when you compute the logCPM values, those norm.factors are used to adjust the library size.
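As a minimal sketch of what that adjustment looks like: cpm() from edgeR multiplies each library size by its normalization factor to get an effective library size, so continuing from the example above,

> # effective library sizes actually used for the CPM conversion
> eff.lib <- dge$samples$lib.size * dge$samples$norm.factors
> # log2-CPM computed using those effective library sizes
> logCPM <- cpm(dge, log = TRUE)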
You should pretty much always use calcNormFactors because it is designed to account for compositional bias. If that's not actually a problem for your data (like this fake data I just made up), then it won't really change things. But if it is a problem, it will account for the bias.
If you use the normalize.method argument in voom, then it will additionally normalize using normalizeBetweenArrays. You could hypothetically use an additional normalization method like that, and there are instances where I thought it was a reasonable thing to do, but that's a pretty rare event. For probably upwards of 90% of analyses you should just use calcNormFactors and no normalize.method.
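As a sketch of the two options (assuming 'design' is the model matrix for your experiment; it isn't defined in the toy example above):

> # default: scale normalization only, via the TMM norm.factors
> v <- voom(dge, design)
> # additionally quantile normalize via normalizeBetweenArrays
> v <- voom(dge, design, normalize.method = "quantile")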
See https://f1000research.com/articles/5-1408, which recommends calcNormFactors.