Question

DiffBind: Normalization for DESeq2

JasonLouisStein

United States

Hi,

I'm trying to understand the normalization for DESeq2 analysis within DiffBind. If I run:

Pool1 = dba.analyze(Pool1,method=DBA_DESEQ2,bFullLibrarySize=FALSE,bCorPlot=FALSE);
Pool1.DB = dba.report(Pool1,file="test",method=DBA_DESEQ2,th=1,bCounts=TRUE);

Then, are the normalized counts contained within elementMetadata(Pool1.DB) calculated by taking the raw count for each peak and dividing it by the normalization factor s_j, computed with the median-of-ratios method described in (http://genomebiology.com/2010/11/10/R106)? When I test this, I get something close: colMeans(originalcounts/outputfromPool1.DB) is highly correlated with s_j, but not exactly the same. (Could this be because I'm using a blocking factor in my model?)
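For reference, the median-of-ratios size factors from the linked paper can be sketched in base R as below. This is a minimal re-implementation for illustration only, not DiffBind's or DESeq2's code; the helper name `median_ratio_size_factors` and the toy `counts` matrix are hypothetical. Peaks with a zero count in any sample are skipped, because their geometric mean is zero.

```r
# Hypothetical helper: median-of-ratios size factors (Anders & Huber 2010),
# written in base R for illustration; DESeq2::estimateSizeFactors() is the real API.
median_ratio_size_factors <- function(counts) {
  log_counts <- log(counts)
  # Per-peak log geometric mean across samples; -Inf if any count is zero
  log_geo_means <- rowMeans(log_counts)
  usable <- is.finite(log_geo_means)
  # For each sample: median ratio of its counts to the per-peak geometric means
  apply(log_counts, 2, function(lc) exp(median(lc[usable] - log_geo_means[usable])))
}

# Toy peaks-by-samples matrix (made-up numbers); peak4 has a zero in sample A
counts <- matrix(c(10, 20,  30, 0,
                   20, 40,  60, 5,
                   40, 80, 120, 9),
                 ncol = 4, byrow = FALSE,
                 dimnames = NULL)
counts <- matrix(c(10, 20, 30, 0, 20, 40, 60, 5, 40, 80, 120, 9),
                 ncol = 3,
                 dimnames = list(paste0("peak", 1:4), c("A", "B", "C")))

s_j <- median_ratio_size_factors(counts)
normalized <- sweep(counts, 2, s_j, "/")  # raw counts divided by s_j, per sample
print(round(s_j, 3))
print(normalized)
```

In this toy matrix sample B has twice and sample C four times the coverage of sample A at every usable peak, so the factors come out as 0.5, 1 and 2, and every usable peak gets the same normalized value across samples.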

If I run instead,

Pool1 = dba.analyze(Pool1,method=DBA_DESEQ2,bFullLibrarySize=TRUE,bCorPlot=FALSE);  
Pool1.DB = dba.report(Pool1,file="test",method=DBA_DESEQ2,th=1,bCounts=TRUE);

Then, are the normalized counts contained within elementMetadata(Pool1.DB) calculated by taking the raw count for each peak and dividing it by librarysize/min(librarysize)? I can test this as well, and again I get something close: colMeans(originalcounts/outputfromPool1.DB) is highly correlated with librarysize/min(librarysize), but not exactly the same.
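The library-size scaling described here amounts to only a few lines. The sketch below assumes made-up read totals in `libsize` (standing in for the reads in each BAM file) and a toy `counts` matrix; both names are hypothetical.

```r
# Hypothetical library sizes: total reads in each sample's BAM file
libsize <- c(A = 2e6, B = 4e6, C = 5e6)

# Scale every library relative to the smallest one
size_factors <- libsize / min(libsize)

# Toy raw counts for two peaks (peaks in rows, samples in columns)
counts <- rbind(peak1 = c(A = 10, B = 20, C = 25),
                peak2 = c(A = 4,  B = 8,  C = 10))

# Normalized counts: raw counts divided by each sample's factor
normalized <- sweep(counts, 2, size_factors, "/")
print(size_factors)
print(normalized)
```

Because the smallest library always gets a factor of 1, its normalized counts equal its raw counts.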

So, with bFullLibrarySize=TRUE (the default), am I using only the library size as a normalization factor, and no other normalization? As I understand it, this can be biased by a few very highly "expressed" peaks, which is why the DESeq2 authors proposed the median-of-ratios method. Conversely, if I set bFullLibrarySize=FALSE, is the median of ratios used as the normalization factor instead of the library size?
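The bias mentioned here can be seen in a small made-up example. The sketch assumes two samples with identical coverage at four peaks, plus one extremely strong peak in sample B. For simplicity the "library size" is the column total of the toy matrix, whereas bFullLibrarySize=TRUE uses the reads in each BAM file; the median-of-ratios part follows the same idea as DESeq2::estimateSizeFactors() but is a base-R re-implementation.

```r
# Made-up counts: same coverage at peaks 1-4, one very strong peak in sample B
counts <- cbind(A = c(100, 100, 100, 100,  100),
                B = c(100, 100, 100, 100, 2100))

# Library-size factors (column totals stand in for BAM read counts here)
libsize <- colSums(counts)
lib_factors <- libsize / min(libsize)

# Median-of-ratios factors: median of each sample's ratios to the per-peak
# geometric means (no zero counts here, so no peaks need to be skipped)
log_geo_means <- rowMeans(log(counts))
mor_factors <- apply(counts, 2, function(x) exp(median(log(x) - log_geo_means)))

print(lib_factors)
print(mor_factors)
```

The single strong peak makes sample B look five times deeper by total counts, which would shrink its other four peaks to a fifth of sample A's after normalization. The median of ratios ignores that one outlying ratio and gives both samples a factor of 1.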

That was a lot of questions, but thanks for helping me figure this out, and also thanks for making such a useful and well-supported package!

Jason

diffbind deseq2

updated 9.5 years ago by Rory Stark • written 9.5 years ago by JasonLouisStein

Answer · 2015-05-29

Hi Jason-

Section 7.5 of the DiffBind vignette explains how DESeq2 is used.

Specifically, if bFullLibrarySize=FALSE, it calls DESeq2::estimateSizeFactors() to calculate the normalization factors. If bFullLibrarySize=TRUE, the factors are set to:

> DESeq2::sizeFactors(DESeqDataSet) <- libsize/min(libsize)

where libsize is a vector containing the number of reads in each BAM file.

The normalized counts returned by dba.report() are the raw reads divided by the normalization factors, obtained by calling DESeq2::sizeFactors().
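If you want to check this relationship on exported results, one option is sketched below in base R. The `raw` matrix and `size_factors` vector are made-up stand-ins for the original counts and the values from DESeq2::sizeFactors(). A zero raw count gives 0/0 = NaN in raw/normalized, so a plain colMeans() is not finite for any sample containing a zero; the sketch takes a per-sample median over the finite ratios instead.

```r
# Hypothetical inputs: raw counts (peaks x samples) and per-sample size factors
raw <- cbind(A = c(0, 12, 30), B = c(5, 24, 61), C = c(8, 37, 90))
size_factors <- c(A = 0.8, B = 1.6, C = 2.4)

# Normalized counts: raw counts divided by each sample's factor
normalized <- sweep(raw, 2, size_factors, "/")

# Recover the factors; zero counts produce NaN ratios, so drop non-finite values
ratios <- raw / normalized
recovered <- apply(ratios, 2, function(r) median(r[is.finite(r)]))

print(colMeans(ratios))  # sample A is not finite because of its zero count
print(recovered)
```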

Hope this helps-

Rory