Low FDR for significant peaks in DiffBind output
2
0
Entering edit mode
Silvia • 0
@silvia-7163
Last seen 3.4 years ago
United Kingdom

Hi,

I used DiffBind for a collaborator who noticed that, contrary to his expectations, a particular region (which overlaps highly significant (pvalue << 0.001) peaks from MACS2 in each replicate of condition Y) got a very low FDR when compared to condition X (in which there are no peaks from any of the 4 replicates). Could you please help me to figure out if there's anything wrong in my code? Unfortunately, I cannot share the real data (it's still unpublished), but here's the anonymised experimental setup and the commands I used to compare conditions Y vs X, as well as Z vs X:

> samples
    SampleID  Factor Condition Replicate bamReads bamControl Peaks       PeakCaller
1  condX_repA TF     condX     1         A-TF.bam A-I.bam    A-condX.bed bed
2  condX_repB TF     condX     2         B-TF.bam B-I.bam    B-condX.bed bed
3  condX_repC TF     condX     3         C-TF.bam C-I.bam    C-condX.bed bed
4  condX_repD TF     condX     4         D-TF.bam D-I.bam    D-condX.bed bed
5  condY_repE TF     condY     1         E-TF.bam E-I.bam    E-condY.bed bed
6  condY_repF TF     condY     2         F-TF.bam F-I.bam    F-condY.bed bed
7  condY_repG TF     condY     3         G-TF.bam G-I.bam    G-condY.bed bed
8  condY_repH TF     condY     4         H-TF.bam H-I.bam    H-condY.bed bed
9  condZ_repJ TF     condZ     1         J-TF.bam J-I.bam    J-condZ.bed bed
10 condZ_repK TF     condZ     2         K-TF.bam K-I.bam    J-condZ.bed bed
11 condZ_repL TF     condZ     3         L-TF.bam L-I.bam    L-condZ.bed bed
12 condZ_repM TF     condZ     4         M-TF.bam M-I.bam    M-condZ.bed bed

> db_analysis = dba(sampleSheet="sampleInfo.csv", minOverlap=4, bCorPlot=FALSE)
> db_analysis = dba.count(db_analysis, minOverlap=4, bCorPlot=FALSE)
> db_analysis = dba.contrast(db_analysis, minMembers=4, categories=DBA_CONDITION)
> db_analysis = dba.analyze(db_analysis, bCorPlot=FALSE)
> dba.report(db_analysis, contrast=1, th=1, file="sites_Y_vs_X", ext="txt")
> dba.report(db_analysis, contrast=2, th=1, file="sites_Z_vs_X", ext="txt")

Thanks a lot for your help!

Silvia

diffbind • 1.6k views
ADD COMMENT
1
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 28 days ago
Cambridge, UK

Hi Silvia-

Sorry for the delay in responding -- I'm away on holiday and don't have access to a computer. 

In he meantime one thing you can try is to set bCounts=TRUE in the call to dba.report() and check the read counts in the regions of interest. You may want to compare normalized and raw counts using the bNormalized  parameter. This way you can see if there really is a consistent change in the read counts.

Another possibility is that there is something going on with the control tracks for these samples. If there is a pileup in the control track at these regions, this will be subtracted from the ChIP reads and may wipe out the signal.  You can check this using DBA_SCORE_READS in the call to dba.count(), and turn off the subtraction when calling dba.analyze().

Hope this helps-

Rory

ADD COMMENT
0
Entering edit mode
Silvia • 0
@silvia-7163
Last seen 3.4 years ago
United Kingdom

Hi Rory,

Sorry for the delay of my reply, I've been unwell.

Thanks a lot for your suggestions! There was indeed something weird with the read counts, but after I updated both DiffBind (1.16.3 -> 2.2.12) and systemPipeR (1.4.8 -> 1.8.1) the counts finally matched what I saw from the bam files in IGV, and the FDR for that particular region changed dramatically.

There might have been some bug in one of the two packages, but I learnt the hard way I should always keep them up-to-date :)

All the best,

Silvia

ADD COMMENT

Login before adding your answer.

Traffic: 720 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6