DiffBind: Normalization for DESeq2
@jasonlouisstein-7953
Last seen 9.1 years ago
United States

Hi,

I'm trying to understand the normalization for DESeq2 analysis within DiffBind.  If I run:

Pool1 <- dba.analyze(Pool1, method=DBA_DESEQ2, bFullLibrarySize=FALSE, bCorPlot=FALSE)
Pool1.DB <- dba.report(Pool1, file="test", method=DBA_DESEQ2, th=1, bCounts=TRUE)

Then the normalized counts contained within elementMetadata(Pool1.DB) are calculated by taking the raw counts for each peak and dividing by the normalization factor s_j, computed via the median-of-ratios method described in (http://genomebiology.com/2010/11/10/R106). Is this correct? When I test this, I get something close: colMeans(originalcounts/outputfromPool1.DB) is highly correlated with s_j but not exactly the same. (Could this be because I'm using a blocking factor in my model?)

If I instead run:

Pool1 <- dba.analyze(Pool1, method=DBA_DESEQ2, bFullLibrarySize=TRUE, bCorPlot=FALSE)
Pool1.DB <- dba.report(Pool1, file="test", method=DBA_DESEQ2, th=1, bCounts=TRUE)

Then, the normalized counts contained within elementMetadata(Pool1.DB) are calculated by taking the raw counts for each peak divided by librarysize/min(librarysize).  Is this correct?  I can test this as well, and again I get something close: colMeans(originalcounts/outputfromPool1.DB) is highly correlated to librarysize/min(librarysize) but not exactly the same.

So, by setting bFullLibrarySize=TRUE (the default), am I using only the library size as the normalization factor and nothing else? As I understand it, this can be biased by very highly "expressed" peaks, which is why the DESeq2 authors proposed the median-of-ratios method. And if I set bFullLibrarySize=FALSE, am I using the median-of-ratios method for normalization rather than the library size?

That was a lot of questions, but thanks for helping me figure this out, and also thanks for making such a useful and well-supported package!

Jason

diffbind deseq2 • 2.8k views
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 27 days ago
Cambridge, UK

Hi Jason-

Section 7.5 of the DiffBind vignette explains how DESeq2 is used.

Specifically, if bFullLibrarySize=FALSE, it calls DESeq2::estimateSizeFactors() to calculate the normalization factors. If bFullLibrarySize=TRUE, the factors are set to:

> DESeq2::sizeFactors(DESeqDataSeq) <- libsize/min(libsize)

where libsize is a vector containing the number of reads in each BAM file.
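
To make the difference between the two settings concrete, here is a minimal base-R sketch of both calculations on toy data. It is not DiffBind's or DESeq2's actual code: the count matrix, the libsize values, and the helper median_of_ratios() are all made up for illustration. The helper follows the median-of-ratios recipe from the paper linked in the question.

```r
# Toy peak-by-sample count matrix (hypothetical values, not DiffBind output).
counts <- matrix(c(10,  20,  40,
                   15,  30,  60,
                   20,  40,  80,
                   50, 100, 200),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("peak", 1:4), c("s1", "s2", "s3")))

# Median-of-ratios size factors (illustrative helper, not a library function):
#   1. take the geometric mean of each peak across samples,
#   2. divide each sample's count by that geometric mean,
#   3. take the per-sample median of those ratios.
median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  usable <- is.finite(log_geo_means)  # skips peaks with a zero count in any sample
  apply(counts, 2, function(col) {
    exp(median(log(col[usable]) - log_geo_means[usable]))
  })
}

sf_median <- median_of_ratios(counts)

# Full-library-size factors: total reads per BAM file over the smallest total.
libsize <- c(s1 = 1e6, s2 = 2e6, s3 = 4e6)  # hypothetical read totals
sf_libsize <- libsize / min(libsize)

print(sf_median)
print(sf_libsize)
```

In this toy case the two sets of factors differ only by a constant multiple, because the counts were constructed to scale exactly with library size. With real data they would generally diverge.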

The normalized counts returned by dba.report() are the raw reads divided by the normalization factors, obtained by calling DESeq2::sizeFactors().
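
As a rough sketch of that division, assuming a plain count matrix and a vector of size factors (both invented here, not taken from a DBA object), the normalization and the colMeans() check from the question look like this:

```r
# Hypothetical raw counts (peaks x samples) and size factors.
counts <- matrix(c(12, 30, 7, 45,
                   24, 60, 14, 90),
                 nrow = 4,
                 dimnames = list(paste0("peak", 1:4), c("s1", "s2")))
size_factors <- c(s1 = 0.8, s2 = 1.6)

# Divide every column by the size factor of its sample.
normalized <- sweep(counts, 2, size_factors, "/")

# The check from the question: raw / normalized, averaged per sample.
# With no zero counts and no other adjustments, this recovers the factors.
recovered <- colMeans(counts / normalized)
stopifnot(isTRUE(all.equal(recovered, size_factors)))
```

Note that a peak with a raw count of zero gives 0/0 = NaN in counts / normalized, so on real data the check needs colMeans(..., na.rm = TRUE) or a filter on non-zero peaks.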

Hope this helps-

Rory
