Normally, DiffBind is used for ChIP-seq data, but I've seen it used for ATAC-seq as well. I don't see any issues with that. MACS can be used for the initial peak calling. However, when performing peak calling for ATAC-seq data, it is advised to adjust the default parameters so peaks are centered on the cutting sites. A common suggestion is `--nomodel --shift -100 --extsize 200` or `--nomodel --shift 37 --extsize 73`.
If the reads are adjusted for peak calling, what happens when I then pass the peaks and the BAMs to DiffBind? Wouldn't DiffBind quantify the peaks incorrectly since the BAMs and peaks do not really represent the same data?
Centering the peak on the cuttting site is good. I haven't looked specifically at ATAC data, but for what it's worth...
You can use the fragmentSize parameter of dba.count to ensure (mostly) that the reads overlap the cutting site. If the actual fragment length is on average, say, 300bp, then setting fragmentSize=300 will cause DiffBind to extend the reads (strand-specifically) to that length, so that each read *should* cover the true site.
Also, don't make the specified peak too narrow. Center it by all means, but if the peak is too narrow, then in a strong site, say, many reads may be a bit too far up- or downstream of the peak to be counted.
I can't be too clear on this answer but as I understand it MACS2 was designed for chip-seq and as such will always "shift" the reads in the 5' prime direction due to some detail in the production of the chip-seq data . We use the option --shift 100 to move ATAC-seq reads in the opposite direction so that when MACS2 moves the reads.....they are returned to their original true position. So this should not be an issue for DiffBind. However, ATAC seq data is also adjusted for TN5 insertion of 9bp adapters....in this case the question posted is valid.... "If the reads are adjusted for peak calling, what happens when I then pass the peaks and the BAMs to DiffBind? Wouldn't DiffBind quantify the peaks incorrectly since the BAMs and peaks do not really represent the same data?"
My understanding is that the "shift" in MACS2 is related to the modelled (or expected) fragment size (namely, half the mean fragment length). As Gord says above, extending the reads to the full fragment length (instead of using the read length) will cause single-end short reads to "overlap" features (peaks) not directly sequenced. Generally speaking, fragments in a sequencing library are longer than the reads, and also longer than regions of enrichment (peaks). Most reads that start 9bp from the actual site will overlap that site. So the overlap counts, while not perfect, should be a reasonable approximation of the "truth."
If there is concern about being more precise, the ATAC peaks can all be simply shifted 9bp (easy to do in R) before being processed by DiffBind. But keep in mind the goal of a differential analysis, which is relative and not absolute. We are attempting to define regions that likely have changed their enrichment levels systematically, which can be accomplished without having perfectly quantified every peak for every sample.
My understanding is that the "shift" in MACS2 is related to the modelled (or expected) fragment size (namely, half the mean fragment length). As Gord says above, extending the reads to the full fragment length (instead of using the read length) will cause single-end short reads to "overlap" features (peaks) not directly sequenced. Generally speaking, fragments in a sequencing library are longer than the reads, and also longer than regions of enrichment (peaks). Most reads that start 9bp from the actual site will overlap that site. So the overlap counts, while not perfect, should be a reasonable approximation of the "truth."
If there is concern about being more precise, the ATAC peaks can all be simply shifted 9bp (easy to do in R) before being processed by DiffBind. But keep in mind the goal of a differential analysis, which is relative and not absolute. We are attempting to define regions that likely have changed their enrichment levels systematically, which can be accomplished without having perfectly quantified every peak for every sample.