Hi,
I have RNA-seq count data for 30,400 genes across 6 conditions (3
replicates per condition). I have been trying different normalization
methods and then testing for differentially expressed genes between
conditions. How can I test whether estimateSizeFactors() and
estimateVarianceFunctions() give a good fit for my data? Also, is there
a way to test whether the normalization is good? Any help is greatly
appreciated.
Thanks,
Shrey
Hi Shrey
> I have RNA-seq count data for 30,400 genes across 6 conditions (3
> replicates per condition). I was trying different normalization
> methods and then test for differentially expressed genes between
> conditions. How to test whether estimateSizeFactors() and
> estimateVarianceFunctions() does a good fit for my data? Also is
> there a way to test whether the normalization is good? Any help is
> greatly appreciated,
To test whether the normalization (i.e., the size factor estimation)
worked fine, do an MA plot for a pair of samples and mark the size
factor log ratio with a horizontal line.
Here is a demonstration with example data:
library( DESeq )
# Make some example data (or use your real data )
cds <- makeExampleCountDataSet( )
# estimate the size factors
cds <- estimateSizeFactors( cds )
# Choose two samples for which you want to check whether they are
# properly normalized with respect to each other
s1 <- 1; s2 <- 2
# Make the MA plot, i.e., plot the log fold change between the
# two samples against the mean of the log counts
plot(
( log10( counts(cds)[,s1] ) + log10( counts(cds)[,s2] ) )/2,
log10( counts(cds)[,s2] ) - log10( counts(cds)[,s1] ) )
# In this plot, the bulk of the genes which are not differentially
# expressed should scatter around a horizontal line in the middle.
# The position of this line should be given by the log ratio of
# the size factors. Mark the latter:
abline(
h=log10( sizeFactors(cds)[s2] ) - log10( sizeFactors(cds)[s1] ),
col="red" )
# Now, the red line should go right through the middle of the bulk
# of not differentially expressed genes.
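For reference, the size factors behind the red line can be reproduced by hand. The following is only a minimal sketch in base R, assuming the median-of-ratios approach described in the DESeq paper (each sample's counts are divided by the per-gene geometric mean across samples, and the median of these ratios is taken). The function name sizeFactorsByHand and the simulated counts are made up for illustration and are not part of the DESeq package.

```r
# Hypothetical helper (not part of DESeq): median-of-ratios size factors
sizeFactorsByHand <- function(countMatrix) {
  # Log geometric mean of each gene across samples;
  # this is -Inf for any gene with a zero count in some sample
  logGeoMeans <- rowMeans(log(countMatrix))
  usable <- is.finite(logGeoMeans)
  # Per sample: median log ratio to the geometric mean, over usable genes
  apply(countMatrix, 2, function(cnts)
    exp(median((log(cnts) - logGeoMeans)[usable])))
}

# Simulated counts (an assumption, for illustration only): 1000 genes,
# two samples, the second one sequenced about twice as deeply
set.seed(1)
mu <- rexp(1000, rate = 1/200)
m <- cbind(a = rpois(1000, mu), b = rpois(1000, 2 * mu))

sf <- sizeFactorsByHand(m)
# The ratio of the two size factors should be close to the simulated
# depth ratio of 2
print(sf[["b"]] / sf[["a"]])
```

If the ratio of two size factors is far from where the bulk of the genes sits in the MA plot, that pair of samples deserves a closer look.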
I hope that helps
Simon
Hi Shrey
regarding your other question
On 12/06/2010 10:54 PM, Shreyartha Mukherjee wrote:
> I have RNA-seq count data for 30,400 genes across 6 conditions (3
> replicates per condition). I was trying different normalization
> methods and then test for differentially expressed genes between
> conditions. How to test whether estimateSizeFactors() and
> estimateVarianceFunctions() does a good fit for my data? Also is
> there a way to test whether the normalization is good? Any help is
> greatly appreciated,
To check the effect of estimateVarianceFunctions: have a look at the
package vignette, specifically at Fig. 2. There, the variance estimates
for each gene are plotted against the mean, and the estimated variance
function is indicated by a red line. This should show a reasonable fit.
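The kind of plot meant here can be sketched in base R without relying on any DESeq plotting function. This is only an illustration under stated assumptions: the counts are simulated from a negative binomial with a dispersion of 0.1 chosen for the example, and the red curve is the true variance function of that simulation, not a fit produced by estimateVarianceFunctions. With real data, you would use the size-factor-normalized counts of the replicates of one condition instead.

```r
# Simulated replicates of a single condition (assumption, illustration only)
set.seed(2)
nGenes <- 2000
mu <- rexp(nGenes, rate = 1/500) + 1
# Negative binomial with size = 10, i.e. variance = mu + 0.1 * mu^2
reps <- matrix(rnbinom(3 * nGenes, mu = rep(mu, 3), size = 10), ncol = 3)

# Per-gene mean and sample variance across the three replicates
geneMean <- rowMeans(reps)
geneVar <- apply(reps, 1, var)

# Keep genes that can be shown on log axes
keep <- geneMean > 0 & geneVar > 0
plot(geneMean[keep], geneVar[keep], log = "xy", pch = ".",
     xlab = "mean of replicates", ylab = "variance of replicates")

# Reference curves: Poisson (variance = mean) in blue, and the variance
# function used in this simulation in red
xs <- 10^seq(0, 4, length.out = 50)
lines(xs, xs, col = "blue")
lines(xs, xs + 0.1 * xs^2, col = "red")
```

With only three replicates, the per-gene variance estimates scatter widely around the red curve; what matters is that the curve follows the trend of the cloud and is not systematically above or below it.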
Simon