Question

DiffBind: discordant "Called" column in the output when setting the filter parameter in dba.count

eva.pinatel ▴ 10

@evapinatel-7358

Last seen 4.2 years ago

Italy

Dear dr. Stark and community,

I'm using DiffBind to obtain differential expression values for a series of interesting regions, and I have some questions. I ran the following:

TF1_initial_IP = dba(sampleSheet="TF1DiffBind_optimal02_IP.csv", config=data.frame(fragmentSize=130), peakCaller="narrow", bCorPlot=FALSE)
TF1count_IP = dba.count(TF1_initial_IP, minOverlap=2, fragmentSize=130, filter=200, bCorPlot=FALSE)
TF1_IP = dba.contrast(TF1count_IP, categories=c(DBA_TISSUE, DBA_TREATMENT), minMembers=2)
TF1_IP = dba.analyze(TF1_IP, method=DBA_DESEQ2, bReduceObjects=FALSE, bFullLibrarySize=TRUE, bCorPlot=FALSE)
for (i in c(3, 4, 8, 10)) {
  dba.report(TF1_IP, contrast=i, method=c(DBA_DESEQ2), th=1, bCounts=TRUE, bNormalized=TRUE,
             bCalled=TRUE, DataType=DBA_DATA_FRAME, bCalledDetail=TRUE,
             file=i, ext="csv", initString="DESEQ2_TF1onIP")
}

I noticed that for many peaks (I attach only one example here) the Called columns differ from the original list of peaks. If I simply remove the filter parameter, the calls match the list of peaks given as input perfectly.

Filter   Start    End      Conc   Conc_WT:Femedia  Conc_WT:Dipymedia  Fold  p-value               FDR                    Called1  Called2  IP-1C     IP-1D     IP-2C    IP-2D    IP-1C  IP-1D  IP-2C  IP-2D
default  1420588  1420984  12.69  13.55            10.34              3.21  1.5626087851071E-017  9.35233019146187E-017  2        2        13235.69  10695.00  1537.05  1050.26  +      +      +      +
200      1420588  1420984  12.69  13.55            10.34              3.21  1.5626087851071E-017  9.35233019146187E-017  0        0        13235.69  10695.00  1537.05  1050.26  -      -      -      -

From what I understand, if none of the samples reaches the minimum number of reads required by the filter parameter, the interval is eliminated, while the Called columns should just indicate how many samples were originally called as peaks for the examined region. Am I missing something?

Finally, I have a question about the dba.count and dba.analyze functions.

Using default settings, from what I read on this blog, I gathered that:
1) Scaled input reads are subtracted from the raw IP counts (scaling is done only if the input is deeper than the compared ChIP)
2) Non-integer numbers are rounded
3) Negative numbers are set to 1
Counts are then passed to the selected tool (DESeq2), which calculates normalization factors from the original library size but applies them to the final counts. Is this correct?

I'm trying to produce .wig files so I can check visually how the tracks are affected by all these operations. Is there a way to obtain the normalization factors and the scaling factors used?

Thank you in advance

Eva

diffbind normalization • 1.2k views

updated 9.8 years ago by Rory Stark ★ 5.2k • written 9.8 years ago by eva.pinatel ▴ 10

score 0 · Answer 1 · 2015-06-18

Hi Eva-

First, can you tell me what version you are working with by sending along the output of sessionInfo()? That will help with the issue with the Called statistics, as this code has changed in recent versions. One workaround you may try is to count and filter in separate steps:

> TF1count_IP = dba.count (TF1_initial_IP, minOverlap=2, fragmentSize=130, bCorPlot=FALSE)
> TF1count_IP = dba.count (TF1count_IP, peaks=NULL, filter=200, bCorPlot=FALSE)

Your explanation regarding the counting algorithm is quite good, except point 3), which should read "non-positive numbers are set to 1", as zero values are also set to 1.
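
To make the three steps concrete, here is a minimal sketch in plain R. It is only an illustration of the description above, not DiffBind's internal code: the helper name adjust_counts, its arguments, and the example numbers are all made up, and the scaling rule (ChIP library size divided by input library size, applied only when the input is deeper) is an assumption based on your point 1).

```r
# Hypothetical helper, NOT part of DiffBind: mimics the steps described above.
adjust_counts <- function(chip, input, chip_lib, input_lib) {
  # 1) Scale the input down only when it is deeper than the ChIP library
  #    (assumed rule: ratio of library sizes)
  scale <- if (input_lib > chip_lib) chip_lib / input_lib else 1
  # 2) Subtract the scaled input from the raw ChIP counts and round
  adj <- round(chip - input * scale)
  # 3) Non-positive values (zero included) are set to 1
  adj[adj <= 0] <- 1
  adj
}

# Made-up counts for three intervals; the input library is 3x deeper than the ChIP
chip  <- c(250, 40, 12)
input <- c(20, 120, 45)
adjust_counts(chip, input, chip_lib = 1e6, input_lib = 3e6)
```

In this toy case the second interval comes out at exactly zero after subtraction and the third goes negative; both end up as 1, which is the difference between "negative" and "non-positive" in point 3).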

Cheers-

Rory