I was wondering of there was any package implementing a peak calling method that would take paired-end data (for ATACseq precisely).
I am checked the packages narrowPeaks, BayesPeaks and SPP (which is not in bioconductor), but none seems to work with paired-end data.
I am currently using macs2, which works fine, but as I do all further step in R, it would have come handy to have an R software do to this step as well.
Thanks Gordon! In the case of ATAC we do not have a background/input to compare to (like DNAse), but I could use the method of filtering by local enrichment you propose in the csaw user guide, to mimic peak calling. Thanks!
I am not aware of any ChIP-seq peak-callers that specifically use the paired-end information, rather than just pretending the reads are single-end. Historically, paired-end ChIP-seq is pretty rare compared to single-end, which is probably why the problem has not been addressed fully. Of course, ATAC-seq has big differences from ChIP-seq. Perhaps a paired-end peak-caller will be of much greater value in this setting; somebody more familiar with ATAC-seq may be able to offer you an opinion.
As far as ChIP-seq analysis goes, two big advantages of paired-end data are 1) more specific removal of PCR duplicates and 2) better estimation of average fragment length, both of which can often be done as pre-processing steps without a paired-end specific peak-caller. Then, the data can be fed in to your favourite peak-caller as single-end reads. How appropriate that is for ATAC-seq analysis is something I don't feel qualified to say with any certainty.
I might have misunderstood something, but I think that macs2 in mode BAMPE do take the fact that data are paired-end into account (extending reads to the fragment length in the bam file, not an average fragment length).
I believe that they use paired-end reads to predict average fragment length, as per point 2) in my second paragraph. However, as far as I'm aware, the pairings do not affect the actual model they use to call peaks.
from https://groups.google.com/forum/#!searchin/macs-announcement/BAMPE/macs-announcement/Vh916L3tOOE/TUjLeQtsCfkJ :
"In BAMPE mode, the 5’ end plus the observed template length will both be recorded so in later analysis, MACS2 piles up the actual entire observed fragment/template instead of estimating a fixed DNA fragment length"
So it looks like they do not comptue an average length, but use the indicated fragment length for each fragment :)
I think for ATAC-seq you might consider peak callers that have worked good with DNase. Have a look to this paper
Koohy H, Down TA, Spivakov M, Hubbard T (2014) A Comparison of Peak Callers Used for DNase-Seq Data. PLoS ONE 9(5): e96303.
We also reviewed many peak callers here (ChIP-seq only)
Bailey T, Krajewski P, Ladunga I, Lefebvre C, Li Q, Liu T, et al. (2013) Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data. PLoS Comput Biol 9(11): e1003326.
In the original paper describing ATACseq, they used ZINBA. In fact macs2 works very well on our data, but I was wodnering if something in R existed that would handle paired end :)
Thanks Gordon! In the case of ATAC we do not have a background/input to compare to (like DNAse), but I could use the method of filtering by local enrichment you propose in the csaw user guide, to mimic peak calling. Thanks!