Difference between edgeR and DESeq in library normalization


Lucia Peixoto ▴ 330

@lucia-peixoto-4203

Last seen 10.6 years ago

Dear All, I am currently analyzing an RNA-Seq dataset with 3 groups of n=4 samples each. While comparing the performance of edgeR and DESeq, I noticed that they differ a lot in the spread of the normalization factors. Using edgeR's calcNormFactors I get values ranging from 0.9 to 1.2, while DESeq's estimateSizeFactors gives values ranging from 0.4 to 1.7. Given that these are exactly the same libraries, why do the estimates vary so much? How will that impact the list of DE genes? I know that the calculations are not performed in the same way, but aren't those two functions aimed at estimating the same phenomenon? Thanks for your help.

--
Lucia Peixoto PhD
Postdoctoral Research Fellow
Laboratory of Dr. Ted Abel
Department of Biology
School of Arts and Sciences
University of Pennsylvania
"Think boldly, don't be afraid of making mistakes, don't miss small details, keep your eyes open, and be modest in everything except your aims." Albert Szent-Gyorgyi

RNASeq Normalization edgeR DESeq

updated 12.1 years ago by Simon Anders ★ 3.8k • written 12.1 years ago by Lucia Peixoto ▴ 330


Simon Anders ★ 3.8k

@simon-anders-3855

Last seen 4.7 years ago

Zentrum für Molekularbiologie, Universi…

Hi Lucia

On 15/03/13 16:43, Lucia Peixoto wrote:
> I know that the calculations are not performed in the same way, but aren't those two functions aimed at estimating the same phenomenon?

edgeR's normalization factors are relative to the total read count (edgeR multiplies them with the library sizes), whereas DESeq's size factors already account for sequencing depth themselves, which is why their range is much wider. So, if you want to compare them, you have to multiply the factors from edgeR by the total read counts and divide by some suitable big number.

So, if sf is the vector of size factors from DESeq, nf is the vector of normalization factors from edgeR, and rs is the vector of column sums of the count matrix, I would expect that

    plot(sf, rs * nf)

gives a plot with the points lying roughly on a straight line.

Simon

12.1 years ago Simon Anders ★ 3.8k
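To make the rescaling concrete, here is a minimal base-R sketch on made-up counts. It assumes that DESeq's estimateSizeFactors follows the median-of-ratios idea (each sample's factor is the median, over genes, of its count divided by that gene's geometric mean across samples), re-implemented here as a hypothetical helper `medianRatioSizeFactors`; it is not the package's own code. The edgeR side is only a placeholder `nf` vector of ones standing in for real calcNormFactors output, and the "suitable big number" is assumed to be the geometric mean of `rs * nf` (the helper `effectiveLibSizes` is likewise hypothetical).

```r
# Base R only; all counts below are made up for illustration.

# Hypothetical re-implementation of the median-of-ratios idea behind
# DESeq's size factors (not the package's own code).
medianRatioSizeFactors <- function(counts) {
  logGeoMeans <- rowMeans(log(counts))  # per-gene log geometric mean
  keep <- is.finite(logGeoMeans)        # drop genes with a zero in any sample
  apply(counts[keep, , drop = FALSE], 2,
        function(col) exp(median(log(col) - logGeoMeans[keep])))
}

# Hypothetical helper: library size times edgeR-style normalization factor,
# divided by a "suitable big number" (assumed here: their geometric mean).
effectiveLibSizes <- function(libSizes, normFactors) {
  eff <- libSizes * normFactors
  eff / exp(mean(log(eff)))
}

# Made-up counts: sample B is exactly twice as deep as A for genes 1-4;
# gene 5 has a zero in A, so the median-of-ratios step ignores it.
counts <- cbind(A = c(10, 50, 200, 5, 0),
                B = c(20, 100, 400, 10, 3))

sf  <- medianRatioSizeFactors(counts)  # DESeq-style size factors
rs  <- colSums(counts)                 # total read counts per sample
nf  <- c(1, 1)                         # placeholder for calcNormFactors() output
eff <- effectiveLibSizes(rs, nf)       # rescaled rs * nf

print(round(sf, 3))
print(round(eff, 3))
```

Here `sf` comes out as exactly 1/sqrt(2) and sqrt(2), and `eff` agrees with it to within about 0.005; the small gap arises because the gene with a zero count is included in the column sums but dropped by the median-of-ratios step. With real data you would substitute the actual outputs of estimateSizeFactors and calcNormFactors.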


If you use the "getOffset" function for your DGEList object and the following function for your CountDataSet object, you will get offset values that are directly comparable:

    library(DESeq)
    library(edgeR)
    library(ggplot2)

    # Log-scale offsets for a DESeq CountDataSet: centre the log size factors,
    # then shift them to the mean log library size (the scale of edgeR offsets)
    getOffset.CountDataSet <- function(y) {
        if (any(is.na(sizeFactors(y))))
            stop("Call estimateSizeFactors first")
        log(sizeFactors(y)) - mean(log(sizeFactors(y))) +
            mean(log(colSums(counts(y))))
    }

    cds <- makeExampleCountDataSet()
    cds <- estimateSizeFactors(cds)
    dge <- DGEList(counts = counts(cds), group = pData(cds)$condition)
    dge <- calcNormFactors(dge)

    # Points on the identity line mean the two packages agree
    qplot(x = getOffset(dge), y = getOffset.CountDataSet(cds)) +
        labs(title = "Offsets, DESeq vs edgeR",
             x = "edgeR offset", y = "DESeq offset") +
        coord_equal() +
        geom_abline(slope = 1, intercept = 0)

12.1 years ago Ryan C. Thompson ★ 7.9k
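A short derivation of why the offsets in the reply above are comparable, under two assumptions stated here rather than taken from the thread: that edgeR's getOffset returns log(lib.size * norm.factors) when no explicit offset has been set, and that calcNormFactors rescales its factors so that they multiply to one. Write N_j for the column sum of sample j, f_j for its edgeR normalization factor, s_j for its DESeq size factor, and m for the number of samples:

```latex
\begin{aligned}
o^{\mathrm{edgeR}}_j &= \log(N_j f_j), \qquad \frac{1}{m}\sum_{k=1}^{m}\log f_k = 0 \\
o^{\mathrm{DESeq}}_j &= \log s_j - \frac{1}{m}\sum_{k=1}^{m}\log s_k + \frac{1}{m}\sum_{k=1}^{m}\log N_k \\
\frac{1}{m}\sum_{j=1}^{m} o^{\mathrm{edgeR}}_j &= \frac{1}{m}\sum_{j=1}^{m} o^{\mathrm{DESeq}}_j = \frac{1}{m}\sum_{k=1}^{m}\log N_k \\
o^{\mathrm{edgeR}}_j - o^{\mathrm{DESeq}}_j &= \log\frac{N_j f_j}{s_j} - \frac{1}{m}\sum_{k=1}^{m}\log\frac{N_k f_k}{s_k}
\end{aligned}
```

Both offsets share the same mean, so the difference vanishes for every sample exactly when N_j f_j / s_j is the same for all j, that is, when s_j is proportional to N_j f_j. This is the same condition under which plot(sf, rs * nf) gives points on a straight line through the origin.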
