Say I'd like to assess 'differential binding' of a histone mark (K9me2) between two conditions (mutant vs wildtype). In other words, I want to know how much the binding of the histone mark changes when going from wildtype to mutant.
I have 2 replicates for each condition, each with a matching input (8 libraries).
The simplest thing to do, as I understand it, would be to compute enrichment over input for each paired ChIP and input sample and then compare those fold changes between the two conditions.
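To make concrete what I mean by "comparing simple enrichments", here is a minimal sketch in plain Python (not csaw code). The counts are made up for a single window, the library sizes are assumed equal, and the function names and the prior count of 0.5 are my own choices for illustration.

```python
import math

# Hypothetical counts for a single window: (ChIP count, input count).
# These numbers are invented for illustration only.
wildtype = [(200, 50), (180, 45)]
mutant = [(100, 50), (90, 45)]

def log2_enrichment(chip, control, chip_lib=1_000_000, control_lib=1_000_000, prior=0.5):
    """log2 ratio of library-size-scaled ChIP to input counts.

    A small prior count avoids taking the log of zero.
    """
    chip_rate = (chip + prior) / chip_lib
    control_rate = (control + prior) / control_lib
    return math.log2(chip_rate / control_rate)

def mean_enrichment(pairs):
    """Average the per-replicate log2 enrichments of one condition."""
    values = [log2_enrichment(chip, control) for chip, control in pairs]
    return sum(values) / len(values)

# Difference of the two mean enrichments, on the log2 scale.
delta = mean_enrichment(mutant) - mean_enrichment(wildtype)
print(f"change in log2 enrichment (mutant - wildtype): {delta:.3f}")
```

This gives a point estimate per window but no measure of variability between replicates, which is why I am looking for a way to do this inside the GLM framework.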
Section 3.5.3 of the vignette says that "These controls are mostly irrelevant when testing for DB between ChIP samples". But as far as I can tell, this only holds if there is a single input reference for all samples, as is the case in the vignette's example.
In the paper "csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows" it says:
The ... the GLM framework means that csaw can incorporate condition specific controls into a regular DB analysis in several ways...
One approach is to include the controls in the linear model so that the log fold change between conditions for the ChIP samples is compared to that of the controls.
Would that mean passing a contrast of the form (ChIP.mutant - input.mutant) - (ChIP.wt - input.wt) to the linear model (with a two-factor design of ChIP/input and mutant/wt) and then testing for DB?
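To spell out the contrast I have in mind: assuming the two-factor design is parameterised with an intercept, a mutant effect, a ChIP effect and their interaction (my own parameterisation, not something taken from the paper), the difference of differences above should reduce to the interaction term alone. A small sketch in plain Python that checks this arithmetic on the coefficient vectors; the names are hypothetical and nothing here calls csaw or edgeR.

```python
# Assumed coefficient order for a design with an interaction term.
COEFFS = ["intercept", "mutant", "chip", "mutant:chip"]

def group_mean(is_mutant, is_chip):
    """Express a group's expected log-abundance as weights on the coefficients."""
    return [1, int(is_mutant), int(is_chip), int(is_mutant and is_chip)]

def subtract(a, b):
    """Element-wise difference of two weight vectors."""
    return [x - y for x, y in zip(a, b)]

chip_mut = group_mean(is_mutant=True, is_chip=True)
input_mut = group_mean(is_mutant=True, is_chip=False)
chip_wt = group_mean(is_mutant=False, is_chip=True)
input_wt = group_mean(is_mutant=False, is_chip=False)

# (ChIP.mutant - input.mutant) - (ChIP.wt - input.wt)
contrast = subtract(subtract(chip_mut, input_mut), subtract(chip_wt, input_wt))
print(dict(zip(COEFFS, contrast)))
```

If that is right, testing this contrast would be the same as testing whether the interaction coefficient is zero, under the parameterisation assumed here.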
Another approach is to normalize the ChIP samples to condition specific controls and pass the adjustments to csaw as offsets for GLM fitting.
Does that mean normalizing all the samples (ChIPs and inputs) together (e.g. with a call to normOffsets) and then using the resulting normalization factors in the downstream analysis? And does that mean we can then compare the ChIP libraries (mutant vs wildtype) directly, independently of their respective backgrounds, because the background has already been accounted for in the normalization?
I am very confused about how to incorporate input controls beyond just comparing simple enrichments.