In a previous question, I mentioned that I was using csaw to do read counting for histone modification ChIP-Seq. I'm trying to determine the right combination of the read extension and width parameters. The User's Guide uses width=150 and no ext option for histone data (e.g. in Section 4.3.1). However, this seems like it would be too permissive to measure the reads corresponding to a single nucleosome. For example, if the read length was 100 bp, then this allows a range of roughly 250 bp of starting positions to be counted for each window. The approach that makes the most sense to me is the following:
window.counts <- windowCounts(sample.table$bampath, ext=147, width=1, spacing=73, param=param)
Specifically, this means I'm using windows with only a 1 bp width spaced every half-nucleosome length (73 bp) across the genome, and I'm extending each read out to the length of a nucleosome (147 bp). This effectively means that I'm positing each 1-bp window as the exact center of a nucleosome's footprint, and then any reads that overlap this window (after extension) are reads that could potentially belong to a nucleosome centered at that position. And since the space between windows is only half a nucleosome length, it should be impossible to "skip over" a nucleosome by accident. In fact, if my math is correct, each read should overlap exactly 2 windows (or 3, if an extended read happens to start exactly at a window position). On the other hand, many nucleosomes will probably be covered by 2 windows, which I believe is not a major concern since differential binding significance will be aggregated between neighboring windows anyway.
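As a sanity check on the window arithmetic, here is a minimal base-R sketch (it does not call csaw). It assumes that each extended read covers exactly 147 consecutive bases and that the 1-bp windows sit at positions 1, 74, 147, and so on; the variable names are just for illustration.

```r
# Assumptions (not taken from csaw itself): extended fragments are exactly
# `ext` bp long, and 1-bp windows are placed every `spacing` bp from position 1.
ext <- 147L
spacing <- 73L
window.pos <- seq(1L, by = spacing, length.out = 20L)

# Try one full period of possible fragment start positions.
starts <- 200:(200 + spacing - 1L)

# For each start, count the windows that fall inside [start, start + ext - 1].
n.overlap <- vapply(
    starts,
    function(s) sum(window.pos >= s & window.pos <= s + ext - 1L),
    integer(1)
)

# Tabulate how many windows each fragment overlaps.
print(table(n.overlap))
```

Under these assumptions, nearly every start offset overlaps 2 windows, and the single offset that coincides with a window position overlaps 3.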
So, I have chosen this scheme because it seems to me that it will generate counts that are as representative as possible of the number of reads derived from each nucleosome. However, I realize that this criterion does not necessarily guarantee the optimal performance of the DB test, so I figured I would ask about the rationale for the parameter choices for histone data in the csaw User's Guide, and what advantages they would have over my scheme.