I'm performing quantile normalization with CQN and then using edgeR on some ATAC-seq samples I have and I'm trying to understand/determine the following:
When setting the library sizes, should I use the sum of read counts that fall within the full set of peaks (computed for each sample), or the total number of aligned reads per sample?
I have tended to use the total aligned reads per sample for edgeR's lib.size when performing differential binding analyses, because it is easier to interpret as sequencing depth. If the reads in peaks were used instead, any global increase or decrease in binding (or in this case, accessibility) between conditions would alter the proportion of reads in peaks, so the library sizes would conflate technical differences in sequencing depth with actual biological differences in chromatin structure.
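To make that concrete, here is a small worked example in base R. All numbers are made up for illustration: two samples sequenced to the same depth, with the treated sample having a global increase in accessibility, so that a larger fraction of its reads falls in peaks. The variable names are my own and not part of edgeR or cqn.

```r
# Hypothetical sequencing depths: both samples have 20 million aligned reads.
total_aligned <- c(ctrl = 20e6, treat = 20e6)

# Hypothetical fraction of reads in peaks; higher in the treated sample
# because accessibility has increased globally.
frip <- c(ctrl = 0.20, treat = 0.30)
reads_in_peaks <- total_aligned * frip

# A single peak whose count rises in line with the global increase (1.5-fold).
peak_counts <- c(ctrl = 400, treat = 600)

# Counts per million under each library size definition.
cpm_total <- peak_counts / total_aligned * 1e6
cpm_in_peaks <- peak_counts / reads_in_peaks * 1e6

# Log2 fold changes (treat vs ctrl) under each definition.
lfc_total <- log2(cpm_total[["treat"]] / cpm_total[["ctrl"]])
lfc_in_peaks <- log2(cpm_in_peaks[["treat"]] / cpm_in_peaks[["ctrl"]])

print(c(total_aligned = lfc_total, reads_in_peaks = lfc_in_peaks))
```

With the total aligned reads as the library size, the 1.5-fold increase is retained. With the reads in peaks, the same increase is treated as if it were a difference in sequencing depth and is normalized away.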
For the actual differential analysis, though, it barely matters. The CQN offsets will override any library size specification. More generally, if you computed TMM normalization factors, they would also compensate for any differences in the library size specification. A different set of library sizes will alter the calculation of the average log-CPMs and predicted log-fold changes, but this should be a very modest effect.
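If you want to check this yourself, a minimal sketch along the following lines should do. It uses simulated counts and a stand-in offset matrix rather than real cqn output; in a real analysis the offsets would come from the glm.offset component of the object returned by cqn(). The names offset_mat, fit_coefs and eff_lib are hypothetical helpers of mine, and the dispersion is fixed at an arbitrary value just to keep the comparison simple.

```r
library(edgeR)

set.seed(1)
n_peaks <- 500
group <- factor(rep(c("ctrl", "treat"), each = 3))
design <- model.matrix(~ group)

# Simulated peak-by-sample counts (hypothetical data, not a real experiment).
counts <- matrix(rnbinom(n_peaks * 6, mu = 50, size = 10), nrow = n_peaks)

# Two candidate library size specifications.
lib_in_peaks <- colSums(counts)
lib_total <- lib_in_peaks * c(4, 5, 6, 3, 4, 5)  # made-up total aligned reads

# Stand-in offsets on the natural log scale (assumption: in practice these
# would be taken from the cqn output instead).
offset_mat <- matrix(rnorm(n_peaks * 6, mean = log(mean(lib_in_peaks)), sd = 0.1),
                     nrow = n_peaks)

# Fit the GLM with a given lib.size but the same offset matrix.
fit_coefs <- function(lib.size) {
  y <- DGEList(counts = counts, lib.size = lib.size, group = group)
  y$offset <- offset_mat                      # used in place of log(lib.size)
  fit <- glmFit(y, design, dispersion = 0.1)  # fixed, arbitrary dispersion
  fit$coefficients
}

coef_in_peaks <- fit_coefs(lib_in_peaks)
coef_total <- fit_coefs(lib_total)
same_coefs <- isTRUE(all.equal(coef_in_peaks, coef_total))
print(same_coefs)

# Without offsets: effective library sizes after TMM for each specification.
eff_lib <- function(lib.size) {
  y <- calcNormFactors(DGEList(counts = counts, lib.size = lib.size))
  y$samples$lib.size * y$samples$norm.factors
}

# If TMM compensates for the specification, this ratio should be roughly
# constant across samples even though lib_total / lib_in_peaks is not.
eff_ratio <- eff_lib(lib_total) / eff_lib(lib_in_peaks)
print(eff_ratio)
```

The fitted coefficients are compared with the offsets held fixed, so any difference between the two fits could only come from the lib.size argument. The TMM part is only an approximate check, since the choice of reference sample and the precision weights depend slightly on the library sizes supplied.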