There are two things going on here. One is an error, for which I am checking in a fix. The other is how the overlap counting is supposed to work.
The error is that the peak data included with the package does not exactly match the data used in the prebuilt objects (e.g., tamoxifen_peaks). So you are seeing 1513 peaks for this sample, while if you load the prebuilt data you will see 1556 peaks, which is what was used to generate Figure 13.
You can try this yourself:
> data(tamoxifen_peaks)
> tamoxifen
11 Samples, 2845 sites in matrix (3795 total):
       ID Tissue Factor  Condition  Treatment Replicate Caller Intervals
1  BT4741  BT474     ER  Resistant Full-Media         1    bed      1080
2  BT4742  BT474     ER  Resistant Full-Media         2    bed      1122
3   MCF71   MCF7     ER Responsive Full-Media         1    bed      1556
4   MCF72   MCF7     ER Responsive Full-Media         2    bed      1046
5   MCF73   MCF7     ER Responsive Full-Media         3    bed      1339
6   T47D1   T47D     ER Responsive Full-Media         1    bed       527
7   T47D2   T47D     ER Responsive Full-Media         2    bed       373
8  MCF7r1   MCF7     ER  Resistant Full-Media         1    bed      1438
9  MCF7r2   MCF7     ER  Resistant Full-Media         2    bed       930
10  ZR751   ZR75     ER Responsive Full-Media         1    bed      2346
11  ZR752   ZR75     ER Responsive Full-Media         2    bed      2345
Here you see that sample MCF71 has 1556 peaks. But as you point out, the Venn diagram in Figure 13 adds up to 1517 peaks, which is fewer.
There are fewer peaks in the overlaps because some peak merging has taken place. Suppose there are two nearby peaks in MCF71 that overlap with a single peak in MCF72:
MCF71  |-------|       |-------|
MCF72       |-------------|
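Counting naively gives a different answer depending on which sample you count from. The short base R sketch below makes that concrete. The coordinates are hypothetical, chosen only to mimic the diagram (they are not real peaks from the tamoxifen data), and the helper functions are illustrative, not part of DiffBind.

```r
# Hypothetical intervals mimicking the diagram; not real tamoxifen peaks.
mcf71 <- data.frame(start = c(0, 16), end = c(8, 24))  # two MCF71 peaks
mcf72 <- data.frame(start = 5, end = 19)               # one MCF72 peak

# Hypothetical helper: TRUE where closed interval a overlaps closed interval b
overlaps <- function(a_start, a_end, b_start, b_end) {
  a_start <= b_end & b_start <= a_end
}

# Hypothetical helper: how many peaks in x overlap at least one peak in y?
count_overlapping <- function(x, y) {
  hits <- vapply(seq_len(nrow(x)), function(i) {
    any(overlaps(x$start[i], x$end[i], y$start, y$end))
  }, logical(1))
  sum(hits)
}

n71 <- count_overlapping(mcf71, mcf72)  # counted from MCF71: 2 peaks
n72 <- count_overlapping(mcf72, mcf71)  # counted from MCF72: 1 peak
cat("MCF71 peaks in overlap:", n71, "\n")
cat("MCF72 peaks in overlap:", n72, "\n")
```

The same biological overlap is "2 peaks" from one side and "1 peak" from the other, which is why some convention is needed.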
How many peaks are in the overlap between the two samples? Is it 1 peak, 2 peaks, or 3 peaks? DiffBind handles this by merging overlapping peaks into the widest region that encompasses all of them:
MCF71  |-------|       |-------|
MCF72       |-------------|
MERGED |-----------------------|
So it will count this as 1 overlapping region. For this reason, the number of merged regions a sample contributes to is always less than or equal to the number of peaks in that sample's original peakset.
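The merging step can be illustrated with a simplified base R sketch: sort all peaks, sweep through them, and extend the current region while peaks keep overlapping it. This is not DiffBind's own code (DiffBind has its own merging implementation and options); the peak coordinates and the `merge_peaks` helper are hypothetical and only mirror the diagram. It also checks the point above, that each sample falls into no more merged regions than it had peaks.

```r
# Hypothetical peaks matching the diagram: two MCF71 peaks, one MCF72 peak.
peaks <- data.frame(
  sample = c("MCF71", "MCF71", "MCF72"),
  start  = c(0, 16, 5),
  end    = c(8, 24, 19)
)

# Hypothetical helper: label each peak with the index of the merged region
# (widest span of a chain of overlapping peaks) that it belongs to.
merge_peaks <- function(p) {
  if (nrow(p) == 0) { p$region <- integer(0); return(p) }
  p <- p[order(p$start, p$end), ]
  region <- integer(nrow(p))
  current <- 1L
  current_end <- p$end[1]
  region[1] <- current
  for (i in seq_len(nrow(p))[-1]) {
    if (p$start[i] <= current_end) {
      # Overlaps the current region: extend its right edge if needed
      current_end <- max(current_end, p$end[i])
    } else {
      # Gap before this peak: start a new merged region
      current <- current + 1L
      current_end <- p$end[i]
    }
    region[i] <- current
  }
  p$region <- region
  p
}

merged <- merge_peaks(peaks)

# One row per merged region, with the samples that contributed to it
regions <- do.call(rbind, lapply(split(merged, merged$region), function(r) {
  data.frame(start = min(r$start), end = max(r$end),
             samples = paste(sort(unique(r$sample)), collapse = ","))
}))
print(regions)  # a single region spanning 0-24, containing MCF71 and MCF72

# Per sample: original peak count vs. number of merged regions it appears in
orig_counts   <- sapply(split(peaks$start, peaks$sample), length)
merged_counts <- sapply(split(merged$region, merged$sample),
                        function(r) length(unique(r)))
print(rbind(original = orig_counts, merged = merged_counts))
```

MCF71 starts with 2 peaks but appears in only 1 merged region, which is the same kind of shrinkage that makes the Venn diagram total smaller than the Intervals column.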
Cheers-
Rory