Question

Using Input Samples in ChIP-seq Differential Analysis (csaw/edgeR)

ATpoint ★ 4.8k

@atpoint-13662

Germany

The setup: ChIP-seq for histone marks in two closely-related cell lines which represent certain developmental stages.

The problem: We know from cytogenetics that one of the cell lines has genetic anomalies, such as a triplicated chromosome 1 (trisomy), that will probably increase the counts originating from peaks on that chromosome by roughly 33%. Input samples (i.e. total chromatin input) are available.

The question: What are the current best practices within the csaw/edgeR framework for correcting systematic differences in input abundance, such as the one caused by trisomy 1? The most straightforward approach would be to test an interaction such as (ChIP1 - Input1) - (ChIP2 - Input2), but as the input library composition is strikingly different from that of the ChIP samples, the underlying assumptions will probably be violated. So far I would normalize the data using the 10 kb bin strategy suggested in csaw to account for the compositional changes. In this thread (https://support.bioconductor.org/p/82099/) it was suggested to ignore problematic regions, but as we are talking about an entire chromosome here, this is not an option.

Your opinions are appreciated.

csaw chip-seq • 1.3k views

updated 5.4 years ago by Aaron Lun ★ 28k • written 5.4 years ago by ATpoint ★ 4.8k

score 1 · Answer 1 · 2019-12-18

The simplest strategy is just to analyze the affected chromosome separately. As in, literally subset your SummarizedExperiment object so that it only contains windows in the affected chromosome, and perform the entire analysis on that SE object separately from the windows for the other chromosomes. (Do this up until you get p-values for each window; then you can stitch the data frames back together before you run combineTests.) This allows the normalization machinery to pick up on the systematic difference in coverage in the bins, thus accounting for the difference in chromosome copy number. Even for just one chromosome, you should still have loads of windows with reasonable abundances, so there are still plenty of features to use for empirical Bayes shrinkage.
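A rough sketch of this split-and-stitch idea, under some assumptions: the toy objects `win.data` and `bin.data` stand in for real `windowCounts()` output (windows and 10 kb bins), and `toy_se`, `analyze_subset`, `on_chr`, `affected` and the two-group design are hypothetical names made up for illustration. Only the general pattern is meant to carry over to a real analysis.

```r
library(csaw)   # also attaches GenomicRanges and SummarizedExperiment
library(edgeR)
set.seed(1)

# Toy stand-in for windowCounts() output on two chromosomes (assumption:
# replace with your own filtered window counts and 10 kb bin counts).
toy_se <- function(n, width, mu, totals) {
    gr <- GRanges(rep(c("chr1", "chr2"), each = n),
                  IRanges(rep(seq_len(n), 2) * width, width = width))
    counts <- matrix(rnbinom(2 * n * length(totals), mu = mu, size = 10),
                     ncol = length(totals))
    SummarizedExperiment(list(counts = counts), rowRanges = gr,
                         colData = DataFrame(totals = totals))
}
totals   <- rep(1e6, 4)
win.data <- toy_se(500, 150, 20, totals)
bin.data <- toy_se(200, 10000, 200, totals)
design   <- model.matrix(~ factor(c("A", "A", "B", "B")))
affected <- "chr1"

# Normalize and test one subset of windows, using only the bins of that subset.
analyze_subset <- function(win, bins, design) {
    win <- normFactors(bins, se.out = win)   # TMM on bins, stored in win$norm.factors
    y   <- estimateDisp(asDGEList(win), design)
    fit <- glmQLFit(y, design, robust = TRUE)
    res <- glmQLFTest(fit, coef = ncol(design))
    list(ranges = rowRanges(win), table = res$table)
}

on_chr <- function(se) as.logical(seqnames(rowRanges(se)) == affected)
res.aff  <- analyze_subset(win.data[on_chr(win.data), ],
                           bin.data[on_chr(bin.data), ], design)
res.rest <- analyze_subset(win.data[!on_chr(win.data), ],
                           bin.data[!on_chr(bin.data), ], design)

# Stitch the per-window results back together; merging windows and
# combineTests() would then run on these combined objects.
all.ranges <- c(res.aff$ranges, res.rest$ranges)
all.table  <- rbind(res.aff$table, res.rest$table)
```

Note that dispersion estimation and empirical Bayes shrinkage are done separately for each subset here, which is what analyzing the chromosome separately implies.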

A more complex version of the above approach involves constructing an offset matrix. This basically involves normalizing the affected chromosome separately but then integrating the chromosome-specific size factors back into an analysis with all windows. I guess this is the more technically correct way of doing it, but it's a bit of a hassle, and it's not a common enough use case that I've bothered to write special functions for it.
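One way the offset matrix could be put together is sketched below. This is an assumption about how to wire it up, not an existing csaw function: TMM factors are computed from the bins of each chromosome set, turned into log effective library sizes, and assigned row by row to the windows. The toy data and the names `toy_se`, `off`, `is.aff.win` and `is.aff.bin` are hypothetical.

```r
library(csaw)   # also attaches GenomicRanges and SummarizedExperiment
library(edgeR)
set.seed(1)

# Toy stand-in for windowCounts() output (assumption: use your own data).
toy_se <- function(n, width, mu, totals) {
    gr <- GRanges(rep(c("chr1", "chr2"), each = n),
                  IRanges(rep(seq_len(n), 2) * width, width = width))
    counts <- matrix(rnbinom(2 * n * length(totals), mu = mu, size = 10),
                     ncol = length(totals))
    SummarizedExperiment(list(counts = counts), rowRanges = gr,
                         colData = DataFrame(totals = totals))
}
totals   <- rep(1e6, 4)
win.data <- toy_se(500, 150, 20, totals)
bin.data <- toy_se(200, 10000, 200, totals)
design   <- model.matrix(~ factor(c("A", "A", "B", "B")))
affected <- "chr1"

is.aff.win <- as.logical(seqnames(rowRanges(win.data)) == affected)
is.aff.bin <- as.logical(seqnames(rowRanges(bin.data)) == affected)

# Chromosome-specific TMM factors from the bins
# (se.out = FALSE returns a plain numeric vector, one value per library).
nf.aff  <- normFactors(bin.data[is.aff.bin, ], se.out = FALSE)
nf.rest <- normFactors(bin.data[!is.aff.bin, ], se.out = FALSE)

# Offset matrix: one row of log effective library sizes per window.
off <- matrix(log(totals * nf.rest), nrow(win.data), ncol(win.data), byrow = TRUE)
off[is.aff.win, ] <- rep(log(totals * nf.aff), each = sum(is.aff.win))

# A single analysis over all windows, with window-specific offsets.
y   <- asDGEList(win.data)
y   <- scaleOffset(y, off)
y   <- estimateDisp(y, design)
fit <- glmQLFit(y, design, robust = TRUE)
res <- glmQLFTest(fit, coef = ncol(design))
```

Compared with analyzing the chromosome on its own, all windows share one dispersion trend and one round of shrinkage here; only the normalization differs between chromosomes.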

The most complicated approach would be to test an interaction. I'm not thrilled about this because the low input counts will probably cripple your detection power. Also, inputs are not particularly ChIP-like... which is the whole point if you're using them to call peaks, but in this case, you're comparing the ChIP-input difference in one condition to that in another condition. This assumes that the non-ChIP-ness of the input in one condition is the same as that in another condition, which seems like a stretch due to changes in chromatin state, etc.
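For reference, the interaction being discussed would look roughly like this in edgeR. The sample layout (two cell lines, two inputs and two ChIP libraries each), the simulated counts and the factor names `line` and `type` are assumptions for illustration; with real data, `y` would come from `asDGEList()` on window counts that include both ChIP and input libraries.

```r
library(edgeR)
set.seed(1)

# Hypothetical layout: lines A and B, each with 2 input and 2 ChIP libraries.
samples <- data.frame(
    line = factor(rep(c("A", "B"), each = 4)),
    type = factor(rep(c("input", "input", "chip", "chip"), 2),
                  levels = c("input", "chip"))
)

# Columns: (Intercept), lineB, typechip, lineB:typechip.
# The last one is (ChIP_B - Input_B) - (ChIP_A - Input_A) on the log scale.
design <- model.matrix(~ line * type, data = samples)

# Simulated counts standing in for real window counts.
counts <- matrix(rnbinom(1000 * nrow(samples), mu = 20, size = 10),
                 ncol = nrow(samples))
y   <- DGEList(counts)
y   <- estimateDisp(y, design)
fit <- glmQLFit(y, design, robust = TRUE)
res <- glmQLFTest(fit, coef = "lineB:typechip")
```

The caveats above still apply: the interaction term inherits the noise of the low input counts, and it is only meaningful if input behaves comparably in both conditions.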