Issue carrying out DiffBind analysis on a predefined peak set
doherta6
Ireland

Hi, I am trying to look at differential binding of several chromatin regulatory proteins, profiled with CUT&RUN, restricted to a predefined set of approximately 5,000 peaks.

When I run the analysis for different proteins against the same predefined peaks, I get a different number of consensus peaks each time, and it is always fewer than 5,000. I assumed the entire predefined peakset would be used as the consensus peakset, so the number would stay the same. However, this does not seem to be the case. Could you clarify what is happening? It would be much appreciated.

I am supplying my predefined peaks as a BED file in the Peaks column of the sample sheet, for every sample (three replicates for each of two conditions).

Thank you in advance,

Anthony

Rory Stark
Cambridge, UK

Assuming you are using default parameter values, there are three aspects of the processing that may alter the number of peaks:

  1. If the peakset you pass in includes intervals that overlap (by at least one base pair), these will be merged into a single, wider peak. This is most likely not what is happening in your case, as you are using the same peakset for each of the comparisons but ending up with a different number of intervals.

  2. In the dba.count() phase, the peaks are re-centred around their summits, so peaks that did not overlap initially may overlap after the reads are counted, in which case they are merged. For example, if the primary point of enrichment for a protein lies at the facing edges of two adjacent peaks (the downstream edge of one and the upstream edge of the next), the two intervals may overlap once each window is extended around its summit according to the value of the summits parameter. This shouldn't happen very often and is probably not what is driving the difference in peak numbers you are seeing. You can check for this effect by setting summits=FALSE.

  3. Also in the dba.count() phase, a filter is applied by default to remove peak intervals with very low enrichment across all the samples. This is most likely the culprit in your case if some proteins show enrichment in fewer of the predefined peak regions. You can test this by setting filter=0, which disables the filtering.
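The three mechanisms above can be illustrated with a small toy model. This is a sketch under stated assumptions, not DiffBind's implementation (DiffBind is an R/Bioconductor package): the helper names merge_overlapping, recenter_on_summits and filter_low are hypothetical, coordinates are assumed to be 1-based and inclusive, each re-centred interval is assumed to span summit ± the summits value, and the filter is assumed to keep an interval only when its highest count across samples reaches the threshold. The value 200 and the read counts are toy numbers chosen only for the example.

```python
from typing import List, Sequence, Tuple

# (chrom, start, end) with 1-based inclusive coordinates (an assumption).
Interval = Tuple[str, int, int]


def merge_overlapping(peaks: Sequence[Interval]) -> List[Interval]:
    """Merge intervals that share at least one base pair into one wider interval."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def recenter_on_summits(summits: Sequence[Tuple[str, int]], half_width: int) -> List[Interval]:
    """Replace each peak with a window of +/- half_width bp around its summit (assumed semantics)."""
    return [(chrom, max(1, pos - half_width), pos + half_width) for chrom, pos in summits]


def filter_low(peaks: Sequence[Interval], counts: Sequence[Sequence[int]], threshold: int) -> List[Interval]:
    """Keep a peak only if its largest read count across all samples reaches threshold (assumed rule)."""
    return [peak for peak, row in zip(peaks, counts) if max(row) >= threshold]


if __name__ == "__main__":
    # Point 1: three input peaks; 1400 and 1401 are adjacent but do not overlap.
    peaks = [("chr1", 1000, 1400), ("chr1", 1401, 1800), ("chr1", 5000, 5400)]
    print(len(merge_overlapping(peaks)))  # 3

    # Point 2: summits near the facing edges of the first two peaks.
    summits = [("chr1", 1390), ("chr1", 1410), ("chr1", 5200)]
    recentred = merge_overlapping(recenter_on_summits(summits, 200))
    print(recentred)  # [('chr1', 1190, 1610), ('chr1', 5000, 5400)]

    # Point 3: toy counts for six samples; the second interval has no reads at all.
    counts = [[30, 25, 40, 12, 8, 20], [0, 0, 0, 0, 0, 0]]
    print(filter_low(recentred, counts, threshold=1))  # [('chr1', 1190, 1610)]
```

Three input peaks end up as one counted interval here: two are merged after re-centring and one is dropped by the filter, which is the kind of shrinkage described in points 2 and 3.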


Hi Rory,

Thank you for the help with this.

It turns out that my predefined peak list contained a number of duplicated lines, so the consensus peak list shrank as a result.
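Duplicated lines in the input BED file can be spotted before loading it into DiffBind. The following is a minimal sketch that compares the number of intervals in a BED file with the number of distinct intervals; the helper names read_bed_intervals and unique_intervals are hypothetical, and only the first three BED columns (chrom, start, end) are assumed to matter.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end) from the first three BED columns


def read_bed_intervals(lines: Iterable[str]) -> List[Interval]:
    """Parse chrom, start and end, skipping blank, comment, track and browser lines."""
    intervals: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def unique_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Drop exact duplicates while keeping the first occurrence of each interval in order."""
    return list(dict.fromkeys(intervals))


if __name__ == "__main__":
    bed_text = "chr1\t100\t500\nchr1\t100\t500\nchr2\t900\t1300\n"
    intervals = read_bed_intervals(bed_text.splitlines())
    print(len(intervals), len(unique_intervals(intervals)))  # 3 2
```

If the two numbers differ, the duplicated entries will collapse into single peaks, so the consensus peakset will be smaller than the line count of the file.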

Point 3 (the low-enrichment filter) also had a minor effect.

The main reason for the differing consensus peak numbers, however, turned out to be grey-list filtering.

Many thanks again,

Anthony


I should add grey lists to the possible causes of different numbers of consensus peaks after counting. This should only happen if you have different control tracks for the different regulatory proteins (I had assumed that the control tracks were the same for all the samples).
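Why different control tracks lead to different peak counts can be sketched as follows, assuming each protein's grey list is simply a set of genomic regions derived from its own control track and that any peak overlapping a grey-listed region by at least one base pair is removed. The function remove_greylisted and the example regions are hypothetical; DiffBind's actual grey-list generation and filtering are not reproduced here.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive (an assumption)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def remove_greylisted(peaks: Sequence[Interval], greylist: Sequence[Interval]) -> List[Interval]:
    """Drop every peak that overlaps any grey-listed region."""
    return [p for p in peaks if not any(overlaps(p, g) for g in greylist)]


if __name__ == "__main__":
    # The same predefined peakset is used for every protein.
    shared_peaks = [("chr1", 1000, 1400), ("chr1", 5000, 5400), ("chr2", 200, 600)]

    # Hypothetical grey lists from two different control tracks.
    greylist_a = [("chr1", 5300, 6000)]
    greylist_b = [("chr2", 100, 250), ("chr1", 1300, 1350)]

    print(len(remove_greylisted(shared_peaks, greylist_a)))  # 2
    print(len(remove_greylisted(shared_peaks, greylist_b)))  # 1
```

With identical control tracks the grey lists would be identical, and every protein would lose the same peaks, which is why differing controls are the condition under which this effect appears.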
