Question

CQN and EdgeR Library Size for Normalization

shankasal • 0

@shankasal-15611

Last seen 6.8 years ago

I'm quantile-normalizing some ATAC-seq samples with CQN and then running edgeR, and I'm trying to work out the following:

When setting the library size, should I use the sum of read counts that fall within peaks (computed for each sample), or the total number of aligned reads per sample?

Thanks

CQN atac-seq normalization edgeR • 1.6k views

ADD COMMENT • link 7.0 years ago shankasal • 0

shankasal • 0

Thanks Aaron, that's a satisfying answer. I had been using the total aligned reads and will continue as such.

ADD COMMENT • link 7.0 years ago shankasal • 0

score 2 · Accepted Answer · 2018-05-12

I have tended to use the total aligned reads per sample for edgeR's lib.size when performing differential binding analyses, because it is easier to interpret as sequencing depth. If the sum of reads in peaks were used instead, any global increase or decrease in binding (or in this case, accessibility) between conditions would alter that sum, conflating technical differences in sequencing depth with actual biological differences in chromatin structure.
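A small numeric illustration of this point, as a sketch only. All counts and variable names below are made up for the example and do not come from any real dataset. Two samples are sequenced to the same depth, and the second has a genuine twofold global increase in accessibility:

```python
# Hypothetical toy data: two samples with identical sequencing depth.
# Sample B has twice as many reads in every peak (a global increase).
total_aligned = {"A": 1_000_000, "B": 1_000_000}
peak_counts = {
    "A": [100, 200, 300],
    "B": [200, 400, 600],
}

def cpm(counts, lib_size):
    """Counts per million for a given library size."""
    return [c / lib_size * 1e6 for c in counts]

# Option 1: library size = total aligned reads.
cpm_total = {s: cpm(peak_counts[s], total_aligned[s]) for s in peak_counts}

# Option 2: library size = sum of reads in peaks.
cpm_in_peaks = {s: cpm(peak_counts[s], sum(peak_counts[s])) for s in peak_counts}

# Per-peak fold change B/A under each choice of library size.
fc_total = [b / a for a, b in zip(cpm_total["A"], cpm_total["B"])]
fc_in_peaks = [b / a for a, b in zip(cpm_in_peaks["A"], cpm_in_peaks["B"])]

print(fc_total)     # the twofold global increase is preserved
print(fc_in_peaks)  # the global increase is normalized away
```

With reads-in-peaks as the library size, the global change is absorbed into the "depth" term and disappears from the scaled values, which is the conflation described above.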

For the actual differential analysis, though, it barely matters. The CQN offsets will override any library size specification - and more generally, if you computed TMM normalization factors, they would also compensate for any differences in the library size specification. A different set of library sizes will alter the calculation of the average log-CPMs and predicted log-fold changes, but this should be a very modest effect.
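To see why the library size specification largely washes out once normalization factors are computed, here is a minimal sketch using a simplified scaling factor: a plain median of per-peak ratios against a reference sample. This is a stand-in for illustration, not edgeR's TMM (which trims and weights the ratios and rescales the factors), and the counts, totals, and function names are all hypothetical. The effective library size (library size times normalization factor) of one sample relative to the other comes out the same whichever library sizes are supplied:

```python
from statistics import median

# Hypothetical toy counts for two samples over five peaks.
counts = {
    "ref":  [50, 100, 150, 200, 400],
    "test": [110, 190, 310, 400, 2000],  # mostly ~2x the reference, one outlier peak
}

def norm_factor(sample, reference, lib_sample, lib_reference):
    """Median ratio of library-size-scaled counts (simplified; not edgeR's TMM)."""
    ratios = [
        (s / lib_sample) / (r / lib_reference)
        for s, r in zip(sample, reference)
    ]
    return median(ratios)

def effective_ratio(lib_sizes):
    """Effective library size of 'test' relative to 'ref' for the given library sizes."""
    f = norm_factor(counts["test"], counts["ref"], lib_sizes["test"], lib_sizes["ref"])
    # Effective library size = library size * normalization factor.
    # The reference sample's factor is 1 by construction.
    return (lib_sizes["test"] * f) / (lib_sizes["ref"] * 1.0)

# Two different library size specifications for the same counts.
total_aligned = {"ref": 1_000_000, "test": 1_500_000}    # hypothetical totals
reads_in_peaks = {k: sum(v) for k, v in counts.items()}  # sums over peaks

ratio_total = effective_ratio(total_aligned)
ratio_peaks = effective_ratio(reads_in_peaks)
print(ratio_total, ratio_peaks)  # agree up to floating-point error
```

The factor rescales in the opposite direction to whatever library size was supplied, so the relative effective library sizes are driven by the counts themselves. The absolute scale still follows the supplied library sizes, which is where the small effect on reported log-CPMs comes from.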