DiffBind: discordant "Called" column in the output when setting the filter parameter in dba.count
eva.pinatel ▴ 10
@evapinatel-7358
Italy

Dear Dr. Stark and community,

I'm using DiffBind to obtain differential values for a series of regions of interest, and I have some questions. I ran the following:

TF1_initial_IP = dba(sampleSheet="TF1DiffBind_optimal02_IP.csv", config=data.frame(fragmentSize=130), peakCaller="narrow", bCorPlot=FALSE)
TF1count_IP = dba.count(TF1_initial_IP, minOverlap=2, fragmentSize=130, filter=200, bCorPlot=FALSE)
TF1_IP = dba.contrast(TF1count_IP, categories=c(DBA_TISSUE, DBA_TREATMENT), minMembers=2)
TF1_IP = dba.analyze(TF1_IP, method=DBA_DESEQ2, bReduceObjects=FALSE, bFullLibrarySize=TRUE, bCorPlot=FALSE)
for (i in c(3, 4, 8, 10)) {
  dba.report(TF1_IP, contrast=i, method=c(DBA_DESEQ2), th=1, bCounts=TRUE, bNormalized=TRUE, bCalled=TRUE,
             DataType=DBA_DATA_FRAME, bCalledDetail=TRUE, file=i, ext="csv", initString="DESEQ2_TF1onIP")
}

I noticed that for many peaks (I include only one example here) the Called columns differ from the original list of peaks. If I simply remove the filter parameter, the calls match the input peak list perfectly.

Column               Filter=default           Filter=200
Start                1420588                  1420588
End                  1420984                  1420984
Conc                 12.69                    12.69
Conc_WT:Femedia      13.55                    13.55
Conc_WT:Dipymedia    10.34                    10.34
Fold                 3.21                     3.21
p-value              1.5626087851071E-017     1.5626087851071E-017
FDR                  9.35233019146187E-017    9.35233019146187E-017
Called1              2                        0
Called2              2                        0
IP-1C (count)        13235.69                 13235.69
IP-1D (count)        10695.00                 10695.00
IP-2C (count)        1537.05                  1537.05
IP-2D (count)        1050.26                  1050.26
IP-1C (called)       +                        -
IP-1D (called)       +                        -
IP-2C (called)       +                        -
IP-2D (called)       +                        -

From what I understand, if none of the samples reaches the minimum number of reads required by the filter parameter, the interval is eliminated, whereas the Called columns should only indicate how many samples were originally called as peaks in the examined region. Am I missing something?

Finally, I have a question about the dba.count and dba.analyze functions.

Using default settings, from what I read on this site, I understand that:
1) Scaled input reads are subtracted from the raw IP counts (scaling is applied only if the input is deeper than the ChIP it is compared with)
2) Non-integer numbers are rounded
3) Negative numbers are set to 1
The counts are then passed to the selected tool (DESeq2), which calculates normalization factors from the original library sizes but applies them to the final counts. Is this correct?
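
To make the last step concrete, here is a minimal base-R sketch of library-size normalization applied to already-adjusted counts. It does not call DiffBind or DESeq2. The factor definition (each library size divided by the smallest library size) is an assumption chosen for illustration, not something confirmed in this thread, and the sample names and numbers are made up.

```r
# Hypothetical full library sizes (total reads per sample); values are made up.
lib_sizes <- c(s1 = 4e6, s2 = 2e6, s3 = 3e6)

# Hypothetical final counts (after input subtraction) for two peaks.
counts <- matrix(c(400, 100, 300,
                    40,  10,  30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("peak1", "peak2"), names(lib_sizes)))

# ASSUMPTION: one factor per sample, relative to the smallest library.
size_factors <- lib_sizes / min(lib_sizes)

# Divide each column (sample) by its factor.
normalized <- sweep(counts, 2, size_factors, "/")
print(size_factors)
print(normalized)
```

The point of the sketch is only that the factors come from the library sizes, not from the counts in the peaks, while the division is applied to the peak counts.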

I'm trying to produce .wig files so that I can look at the tracks and check how they are affected by all these operations. Is there a way to obtain the normalization factors and the scaling factors that were used?

Thank you in advance

Eva

Rory Stark ★ 5.2k
@rory-stark-5741
Cambridge, UK

Hi Eva-

First, can you tell me what version you are working with by sending along the output of sessionInfo()? That will help with the issue with the Called statistics, as this code has changed in recent versions. One workaround you could try is to count and filter in separate steps:

> TF1count_IP = dba.count (TF1_initial_IP, minOverlap=2, fragmentSize=130, bCorPlot=FALSE)
> TF1count_IP = dba.count (TF1count_IP, peaks=NULL, filter=200, bCorPlot=FALSE)

Your explanation of the counting algorithm is quite good, except for point 3), which should read "non-positive numbers are set to 1", as zero values are also set to 1.
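
As a rough illustration of the three counting steps with this correction, here is a minimal base-R sketch. It is not DiffBind code: `adjust_counts` is a hypothetical helper, the scaling rule (ChIP library size divided by input library size, applied only when the input is deeper) is an assumption based on the description in this thread, and the numbers are made up.

```r
# Hypothetical helper, not part of DiffBind.
adjust_counts <- function(chip, input, chip_lib, input_lib) {
  # ASSUMPTION: scale the input down only when it is deeper than the ChIP.
  scale <- min(1, chip_lib / input_lib)
  # Subtract the scaled input and round to whole reads.
  adjusted <- round(chip - input * scale)
  # Non-positive values (zero included) are set to 1.
  adjusted[adjusted <= 0] <- 1
  adjusted
}

chip  <- c(120, 15, 8, 40)   # made-up raw ChIP counts for four peaks
input <- c(20, 30, 40, 6)    # made-up raw input counts for the same peaks

# Input twice as deep as the ChIP: input counts are halved before subtraction.
deep_input <- adjust_counts(chip, input, chip_lib = 1e6, input_lib = 2e6)
# Input shallower than the ChIP: no scaling is applied.
shallow_input <- adjust_counts(chip, input, chip_lib = 2e6, input_lib = 1e6)
print(deep_input)
print(shallow_input)
```

In the first call the second peak comes out at exactly zero after subtraction and the third is negative; both end up as 1, which is the difference between "negative" and "non-positive".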

Cheers-

Rory
