Question

EdgeR normalization factors - do they take into account library size?

Lucy ▴ 60

Hi,

I am unsure whether the edgeR normalization factors (calculated with calcNormFactors) take library size into account. From my reading, they only deal with RNA composition and library size is dealt with elsewhere, but I am unclear on this. Could someone please clarify the steps of normalization in edgeR?

Many thanks,

Lucy

edgeR • 3.7k views

ADD COMMENT • link updated 6.5 years ago by James W. MacDonald 68k • written 6.5 years ago by Lucy ▴ 60

score 2 · Answer 1 · 2018-08-22

Short answer: see Section 2.7.3 of the edgeR user's guide.

Long answer: The normalization factors account for composition biases, separately from differences in library size between samples. This is useful when your samples are sequenced at different depths and you want to examine their composition biases separately from the differences in coverage (e.g., to compare across conditions). Of course, both need to be considered in the final normalization, which is why the normalization factor and the library size are multiplied together to form the effective library size used in all downstream analyses.
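To make the distinction between library size and composition bias concrete, here is a minimal base R sketch. The counts are hypothetical, and the median of ratios is only a crude stand-in for TMM, not edgeR's actual algorithm (edgeR additionally trims and weights the ratios and rescales the factors across samples).

```r
# Hypothetical counts for two samples with identical library sizes.
# Genes g1-g4 are truly unchanged; g5 is strongly up in sample B.
a <- c(g1 = 100, g2 = 200, g3 = 300, g4 = 400, g5 = 1000)
b <- c(g1 = 50,  g2 = 100, g3 = 150, g4 = 200, g5 = 1500)

lib.size <- c(A = sum(a), B = sum(b))  # both are 2000

# Ratio of library-size-scaled counts (B over A). Because g5 soaks up
# reads in B, the unchanged genes g1-g4 all appear to be halved.
prop.ratio <- (b / lib.size[["B"]]) / (a / lib.size[["A"]])

# Crude composition factor for B relative to A (assumption: a plain
# median of ratios, swapped in here instead of edgeR's TMM).
crude.factor <- median(prop.ratio)

# Effective library size = library size x normalization factor.
eff.lib.size <- lib.size * c(A = 1, B = crude.factor)

# Scaling by the effective library size puts g1-g4 back on equal footing.
scaled.a <- a / eff.lib.size[["A"]]
scaled.b <- b / eff.lib.size[["B"]]
print(rbind(A = scaled.a, B = scaled.b))
```

Both samples have the same library size here, so library-size scaling alone cannot remove the apparent drop in g1-g4; only the extra factor does.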

score 1 · Answer 2 · 2018-08-22

James W. MacDonald 68k

They are intended to adjust for compositional bias, but the default method is TMM (trimmed mean of M-values), where each M-value is the log ratio between two samples of a gene's count divided by its library size, so the library size enters the calculation by definition. In that sense, yes, the factors take library size into account. In the end, though, the offset used in the model is the log of the library size scaled by the normalization factor (provided there is no existing offset matrix in your DGEList). So if you are asking 'does calcNormFactors directly affect my library size?', then no, not until the modeling step. For example, after running example(calcNormFactors):

> z <- DGEList(y)
> z
An object of class "DGEList"
$counts
  Sample1 Sample2 Sample3 Sample4 Sample5
1       5       5       3       2       5
2       7       5       5      10       4
3       3       9       4       2       5
4       6       7       8       3       3
5       6       4       2       3       6
195 more rows ...

$samples
        group lib.size norm.factors
Sample1     1      986            1
Sample2     1     1036            1
Sample3     1     1048            1
Sample4     1      962            1
Sample5     1      996            1

> calcNormFactors(z)
An object of class "DGEList"
$counts
  Sample1 Sample2 Sample3 Sample4 Sample5
1       5       5       3       2       5
2       7       5       5      10       4
3       3       9       4       2       5
4       6       7       8       3       3
5       6       4       2       3       6
195 more rows ...

$samples
        group lib.size norm.factors
Sample1     1      986    1.0078564
Sample2     1     1036    1.0114433
Sample3     1     1048    0.9782103
Sample4     1      962    0.9799588
Sample5     1      996    1.0233395

You can see that the computed norm.factors change, but not the library size.
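As a rough sketch of what happens at the modeling step, the effective library sizes and the corresponding log offsets can be worked out by hand from the values printed above. This uses base R only, so the variable names are illustrative and are not edgeR objects.

```r
# Values copied from the $samples table printed above.
lib.size <- c(Sample1 = 986, Sample2 = 1036, Sample3 = 1048,
              Sample4 = 962, Sample5 = 996)
norm.factors <- c(1.0078564, 1.0114433, 0.9782103, 0.9799588, 1.0233395)

# Effective library size: library size scaled by the normalization factor.
eff.lib.size <- lib.size * norm.factors

# Offset on the log scale, as used when no offset matrix is present.
offset <- log(eff.lib.size)

print(round(eff.lib.size, 1))
print(round(offset, 4))
```

The lib.size vector itself is never modified; the scaling only appears in the derived quantities.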

ADD COMMENT • link 6.5 years ago James W. MacDonald 68k

Thank you, Aaron and James. I am actually trying to decide what to normalize by when I generate BigWig files using deepTools: should I use the edgeR normalization factor or the effective library size (normalization factor x library size)? The latter seems to make the most sense.

ADD REPLY • link 6.5 years ago Lucy ▴ 60

Yes, it makes more sense to scale (i.e., divide) your BigWig coverage by the effective library size. I assume you are dealing with RNA-seq data? If you are dealing with other forms of sequencing data related to genomic coverage, there may be other biases involved. These require more care when you compute normalization factors - for example, see the csaw user's guide for some details about computing these factors for ChIP-seq data.
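A minimal sketch of how per-sample scale factors for coverage tracks might be derived is shown below. The sample names, library sizes, and normalization factors are hypothetical, and the per-million convention is an assumption. The reciprocal is taken on the understanding that deepTools' bamCoverage multiplies coverage by the value given to --scaleFactor; check this against the deepTools documentation for your version.

```r
# Hypothetical library sizes and normalization factors; in practice these
# would be read from the $samples table after running calcNormFactors().
lib.size     <- c(sampleA = 20e6, sampleB = 30e6)
norm.factors <- c(sampleA = 1.25, sampleB = 0.8)

# Effective library size combines depth and composition.
eff.lib.size <- lib.size * norm.factors

# Dividing coverage by the effective library size (in millions) is the
# same as multiplying by this factor.
scale.factor <- 1e6 / eff.lib.size

print(signif(scale.factor, 4))
```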

ADD REPLY • link 6.5 years ago Aaron Lun ★ 28k

Great, thank you. Yes, I am dealing with RNA-seq data, although I also have ATAC-seq data. What would you recommend for scaling this?

ADD REPLY • link 6.5 years ago Lucy ▴ 60

I haven't personally dealt with ATAC-seq data, but a few of my colleagues have used csaw (or at least its normalization) for it, and they seemed fairly satisfied, so...

ADD REPLY • link 6.5 years ago Aaron Lun ★ 28k