Question

CUT&Tag differential peak analysis using DiffBind with pre-calculated normalization factors

mwong043 • 0

@mwong043-23926

Last seen 4.0 years ago

Woodbridge

Hi. I would like to perform differential peak analysis on my CUT&Tag data. I converted the bam files of the samples to bedGraph files, then normalized the bedGraph signal using a normalization factor calculated from the E. coli read count. Since my Tn5 enzyme was extracted from E. coli, the bacterial read count can serve as an internal spike-in control. I then used SEACR, the recommended peak caller for CUT&Tag experiments, to call peaks for my samples.

For the next step I would like to perform differential peak analysis using DiffBind. My question is how to normalize the samples using pre-calculated normalization factors. The Reference Manual mentions that users can supply normalization factors by using DBA_NORM_USER, but how exactly is this done?

For example, I have 6 samples in 2 groups, with the following normalization factors:

Group 1: 0.85, 0.96, 1.2

Group 2: 1.1, 0.9, 0.8

Thanks. Matthew

DiffBind • 3.3k views

ADD COMMENT • link updated 12 weeks ago by mec25 • 0 • written 4.0 years ago by mwong043 • 0

score 1 · Answer 1 · 2021-04-06

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 3 months ago

Cambridge, UK

You can supply your own normalization factors by calling the dba.normalize() function after running dba.count(). The normalize parameter should be a numeric vector with one value per sample. Larger values should correspond to samples with greater numbers of E. coli reads.
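To make the direction of the factors concrete, here is a minimal sketch of the arithmetic in Python (not DiffBind code). It assumes hypothetical E. coli read counts and divides each by the mean, which is an illustrative choice rather than anything this thread prescribes. The helper names are made up for this example. It also shows the reciprocal of the six factors from the question, which would only be relevant if those factors were computed as a constant divided by the E. coli read count (the question does not say how they were derived), since such factors shrink as E. coli reads grow.

```python
from statistics import mean


def relative_spikein_factors(ecoli_reads):
    """Divide each sample's E. coli read count by the mean count.

    The result averages to 1 and is larger for samples with more E. coli
    reads. Centering on the mean is an assumption made for illustration.
    """
    if not ecoli_reads or any(n <= 0 for n in ecoli_reads):
        raise ValueError("need a positive E. coli read count for every sample")
    centre = mean(ecoli_reads)
    return [n / centre for n in ecoli_reads]


def reciprocal_factors(scale_factors):
    """Invert multiplicative scale factors (e.g. constant / E. coli reads)."""
    return [1.0 / s for s in scale_factors]


# Hypothetical E. coli read counts, one per sample, in sample-sheet order.
ecoli_reads = [12000, 15000, 9000, 11000, 14000, 16000]
factors = relative_spikein_factors(ecoli_reads)
print([round(f, 3) for f in factors])

# The six factors from the question. Inverting them is only appropriate if
# they were computed as constant / E. coli reads (an assumption).
bedgraph_scale = [0.85, 0.96, 1.2, 1.1, 0.9, 0.8]
print([round(f, 3) for f in reciprocal_factors(bedgraph_scale)])
```

Whichever vector matches the direction described above is what would then be passed as the normalize parameter of dba.normalize().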

Alternatively, in addition to the SEACR peaks and CUT&Tag bam files, you can supply the E. coli reads to DiffBind and have it re-compute the normalization factors. These can be specified as separate E. coli-aligned bam files using the Spikein column of the sample sheet; then set spikein=TRUE (instead of using the normalize parameter). If your E. coli reads are included in the same bam files as your CUT&Tag reads, then instead of specifying Spikein files, set spikein to a vector of the E. coli chromosome names.
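As a rough illustration of the sample sheet layout for the separate-bam-file case, the Python sketch below writes a CSV with a Spikein column. Only the Spikein column name comes from this thread. The other column names are my assumption of a typical DiffBind sample sheet, the "bed" PeakCaller value for SEACR output is likewise an assumption, and every file path is hypothetical.

```python
import csv
import io

# Only "Spikein" is named in this thread; the remaining columns are assumed.
COLUMNS = ["SampleID", "Condition", "Replicate", "bamReads",
           "Spikein", "Peaks", "PeakCaller"]


def sample_rows(groups, replicates):
    """Build one row per sample using hypothetical file paths."""
    rows = []
    for group in groups:
        for rep in range(1, replicates + 1):
            sid = f"{group}_rep{rep}"
            rows.append({
                "SampleID": sid,
                "Condition": group,
                "Replicate": rep,
                "bamReads": f"bam/{sid}.bam",          # CUT&Tag alignments
                "Spikein": f"bam/{sid}.ecoli.bam",     # E. coli alignments
                "Peaks": f"peaks/{sid}.seacr.bed",     # SEACR peak calls
                "PeakCaller": "bed",                   # assumed value
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


sheet = to_csv(sample_rows(["group1", "group2"], 3))
print(sheet)
```

With a sheet like this loaded, the normalization step would use spikein=TRUE as described above instead of a user-supplied vector.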

ADD COMMENT • link 4.0 years ago Rory Stark ★ 5.2k

Thanks Rory. I tried the first method by defining a vector containing the normalisation factors for my samples and it worked.

I am just wondering how to manually define the normalisation factors for the IgG control samples as well?

Thanks for your help.

Matthew

ADD REPLY • link 4.0 years ago mwong043 • 0

If the IgG reads are included as control tracks, they are not normalized. In this case they would usually be used to generate greylists (and/or have their reads subtracted from the primary CUT&Tag samples). In the subtraction case, read counts to be subtracted are scaled based on the relative library sizes of the primary and control samples.
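The subtraction case can be sketched as simple arithmetic. The Python snippet below is not DiffBind's implementation: the function name is hypothetical, the scale is taken as primary library size divided by control library size (one reading of "relative library sizes"), and flooring negative results at zero is my own assumption.

```python
def subtract_scaled_control(primary_counts, control_counts,
                            primary_libsize, control_libsize):
    """Subtract library-size-scaled control counts from primary counts.

    Assumptions: scale = primary_libsize / control_libsize, and results
    below zero are floored at zero. Neither detail is confirmed here.
    """
    if len(primary_counts) != len(control_counts):
        raise ValueError("need one control count per primary count")
    if control_libsize <= 0:
        raise ValueError("control library size must be positive")
    scale = primary_libsize / control_libsize
    return [max(p - c * scale, 0.0)
            for p, c in zip(primary_counts, control_counts)]


# Hypothetical per-peak counts for one CUT&Tag sample and its IgG control.
adjusted = subtract_scaled_control(
    primary_counts=[100, 50, 10],
    control_counts=[20, 40, 30],
    primary_libsize=2_000_000,
    control_libsize=4_000_000,
)
print(adjusted)
```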

Generally, IgG controls are not explicitly included as primary samples in the model; if they are, they would probably use the same normalization method as the CUT&Tag samples.

(I saw a message go by regarding a problem providing a vector of normalization values, but that message seems to have disappeared, so let me know if you are still having an issue with that).

ADD REPLY • link 3.7 years ago Rory Stark ★ 5.2k

I am unable to find the math behind dba.normalize(). For instance, when the E. coli fraction is used, is the signal multiplied by the input factor? I would like to create similarly normalized bigWig files using deepTools, but I cannot find in the DiffBind manual what math is being applied. Could anyone point me to this? Thanks!

ADD REPLY • link 12 weeks ago mec25 • 0