[EDGER] Normalization issue
@francois-richard-5410
Dear all,

I am a master's student in France working on RNA-seq data. I am running a differential gene expression analysis with edgeR, starting with 2 conditions * 2 replicates = 4 runs (Illumina, mapped with Bowtie to a known reference genome). I have a few questions about the normalization of the dataset.

As I understand it, normalization is needed to correct for differences in library size between samples. This is done with the TMM method, by calling the calcNormFactors() function. This gives a normalization factor that corresponds to an offset in the model used to test for differentially expressed genes.

The function estimateCommonDisp() estimates the dispersion, and exactTest() runs the differential analysis (performing a negative binomial test). But according to the edgeR manual, those two functions call equalizeLibSizes() in order to generate pseudo-counts (which also correct for library size). What I do not understand is that the library size should already have been corrected by the TMM method.

So my questions are: What is the difference between calcNormFactors() and equalizeLibSizes()? Do the pseudo-counts generated by equalizeLibSizes() take the normalization factors into account?

I hope I have been clear enough and that you will be able to help me.

Thanks a lot,

François
Normalization GO edgeR
@james-w-macdonald-5106
Hi Francois,

On 7/20/2012 9:57 AM, François RICHARD wrote:
> Dear all,
>
> I am a master student in France, working on RNA-seq data.
> I am trying to go through a differential gene expression analysis
> using EdgeR and starting with 2 conditions * 2 replicates = 4 runs
> (illumina, mapped with bowtie on known reference genome). I have few
> questions about the normalization of the dataset.
>
> As I understood, the normalization is needed to correct the library
> size between each samples. It is given by the TMM method, calling the
> calcNormFactors() function.

No, the calcNormFactors() function is used to account for 'RNA composition', not library size. See section 2.3.3 in the edgeR User's Guide.

> This give a normalization factor that will correspond to an offset in
> the model that will test for differential expressed genes.
>
> The function estimateCommonDisp() give the dispersion and the
> exactTest() run the differential analysis (performing negative
> binomial test). But according to the edgeR manual, those two functions
> called the equalizeLibSizes() function in order to generate pseudo
> counts (which corrected the library size as well).

Right. The library size is automatically corrected. You _may_ need to use calcNormFactors() to account for situations where technical effects can bias your results. Two examples are given in section 2.3.3 of the edgeR User's Guide.

Best,

Jim

> What I do not understand here is that the library size should be
> already corrected by the TMM method.
>
> My question is, finally :
> What is the difference between the calcNormFactors() and
> equalizeLibSizes()? Does the pseudo-counts generated by
> equalizeLibSizes() are taking care of the normalization factor?
>
> I hope I have been clear enough, and that you will be able to help me,
>
> Thanks a lot,
>
> François

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
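To make the distinction between library size and 'RNA composition' concrete, here is a minimal numeric sketch in Python. It is not edgeR code and not the actual TMM algorithm: the counts are invented, and a plain median of log-ratios stands in for TMM's weighted trimmed mean. The one edgeR convention it borrows is that the factors are rescaled to multiply to one and are then applied to the library sizes to give "effective" library sizes.

```python
import numpy as np

# Hypothetical toy counts (assumption, not real data): rows are genes,
# columns are samples A and B. Genes 0-3 are truly unchanged between the
# samples; gene 4 is expressed only in B and absorbs half of B's reads.
counts = np.array([
    [100.0,  50.0],
    [200.0, 100.0],
    [300.0, 150.0],
    [400.0, 200.0],
    [  0.0, 500.0],
])

# Both libraries have the same size, so library-size scaling changes nothing.
lib_size = counts.sum(axis=0)

# Log2 fold changes (B vs A) after library-size scaling only, computed on
# the genes observed in both samples.
shared = (counts > 0).all(axis=1)
props = counts / lib_size
naive_lfc = np.log2(props[shared, 1] / props[shared, 0])

# Simplified composition factor: the median log-ratio stands in for TMM's
# weighted trimmed mean. Sample A is the reference, so its factor is 1.
raw_factors = np.array([1.0, 2.0 ** np.median(naive_lfc)])

# Rescale so the factors multiply to one.
norm_factors = raw_factors / np.exp(np.mean(np.log(raw_factors)))

# Effective library size = library size * normalization factor.
eff_lib_size = lib_size * norm_factors
scaled = counts / eff_lib_size
corrected_lfc = np.log2(scaled[shared, 1] / scaled[shared, 0])

print("library sizes:          ", lib_size)
print("log2FC, lib size only:  ", naive_lfc)
print("normalization factors:  ", norm_factors)
print("effective library sizes:", eff_lib_size)
print("log2FC, with factors:   ", corrected_lfc)
```

With equal library sizes, the four unchanged genes still look two-fold down in sample B, because one highly expressed gene takes up half of that library. The composition factor removes that artifact, which is the kind of bias Jim refers to.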