Using dba.plotVenn, I noticed that the sum of the peak counts in each circle of the Venn diagram does not equal the number of peaks in my input file. When I went back to the DiffBind vignette ( http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf ), I found the same behaviour: in Figure 14, the total number of peaks for MCF72 (43+47+57+885 = 1032) differs from the number listed for the MCF7 2nd replicate in the output of the command dba(sampleSheet="tamoxifen.csv") on page 4, which is 1037 (so 5 peaks are missing from the Venn diagram).
Is there a specific reason for this behaviour of dba.plotVenn (or of the dba.overlap function it is based on), or is it a bug? If it is intended, how can I tune it so that the total number of peaks displayed matches that of the input file?
Thank you for your help!
The numbers don't add up because of overlapping peaks.
Consider a case where you have two peaksets, A and B. A consists of two small peaks, while B contains one peak. The two peaks in A both overlap the one peak in B. So A has two overlapping peaks, and B has one overlapping peak, and they all refer to one interval that contains all of them. There is no single value you could put in the middle of the Venn diagram -- it would have to be "2 from A and 1 from B".
To deal with this, DiffBind merges overlapping peaks. So in the above example, it would replace the two peaks in A and the peak in B with a single (likely wider) peak that encompasses all of them. As a result, the total number of peaks for a sample (the ones unique to that sample plus the ones that overlap with other samples) may be less than the original number, but never greater, if multiple peaks in that peakset have been merged into a single peak.
The merging function is described in the document you reference above in Section 7.2.
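To make the counting effect concrete, here is a minimal Python sketch of the idea. It is not DiffBind's actual code: the function names, the toy coordinates, and the choice to treat intervals that share an endpoint as overlapping are all assumptions made for illustration. It merges every peak from both peaksets into a union, then counts how many merged intervals each peakset contributes to.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into their union."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous merged interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_hits(merged, peaks):
    """Count merged intervals overlapped by at least one peak in `peaks`."""
    return sum(
        any(s <= m_end and e >= m_start for s, e in peaks)
        for m_start, m_end in merged
    )


# Hypothetical toy peaksets: two peaks in A overlap the same peak in B.
peaks_a = [(100, 200), (250, 350), (1000, 1100)]
peaks_b = [(150, 300), (2000, 2100)]

merged = merge_intervals(peaks_a + peaks_b)
count_a = count_hits(merged, peaks_a)
count_b = count_hits(merged, peaks_b)

print(len(peaks_a), count_a)  # 3 original peaks in A, 2 after merging
print(len(peaks_b), count_b)  # 2 original peaks in B, 2 after merging
```

In this toy case A starts with three peaks but is counted as two, because two of its peaks collapse into one merged interval. That is the same kind of shortfall as the 5 "missing" peaks in the question.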
Right, I hadn't thought about that... Thanks!