Mark Robinson
Hi Anand,
Some comments injected below ...
On 28.12.2011, at 10:50, AKSR wrote:
> Hi all,
>
> I have some RNA-Seq data:
> 4 reps per sample, 4 different genotypes & 9 time points
> = 144 data points
>
> I want to essentially know the best method to normalize across
> ALL time points and for each INDIVIDUAL genotype.
> Is the state of the art normalization method today, TMM?
I'm not sure if TMM is "best", but it can certainly improve things.
Basically, the whole idea with TMM is that naively using totals of
mapped reads can bias differential expression, since different
experimental conditions can express different "repertoires".
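To make that bias concrete, here is a toy illustration in plain Python (not edgeR code). The gene names and counts are invented: both libraries have the same depth, genes g1 to g4 are assumed to be truly unchanged, and condition B additionally expresses g5, which takes up half of its reads.

```python
# Hypothetical toy data: same sequencing depth (2000 reads) in both conditions.
# g1..g4 are unchanged by assumption; g5 is switched on only in condition B.
cond_a = {"g1": 200, "g2": 400, "g3": 600, "g4": 800, "g5": 0}
cond_b = {"g1": 100, "g2": 200, "g3": 300, "g4": 400, "g5": 1000}


def per_million(counts):
    """Scale raw counts by the library total (the naive normalization)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


cpm_a = per_million(cond_a)
cpm_b = per_million(cond_b)

# Apparent fold change B/A for the genes that did not actually change.
naive_fold = {g: cpm_b[g] / cpm_a[g] for g in ("g1", "g2", "g3", "g4")}
print(naive_fold)
```

Under total-count scaling every unchanged gene looks halved in B, purely because g5 changed the composition of the library. A normalization factor such as the one TMM estimates is meant to compensate for this.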
> If yes, is TMM step-by-step procedure available any where?
> (I do some Perl scripting, but I am pretty new to R)
TMM is available in edgeR's calcNormFactors() function.
> I realize that edgeR might be using TMM for pair-wise
> comparison, but I need to perform normalization across
> time points for each genotype.
> Irrespective of normalization strategy, will I have to choose
> the base level sample aka reference for normalization?
> Or can normalization be done independent of an
> overtly defined reference state?
> - I know this is a naive question, sorry...
> (If required, I would use time point zero as my reference state)
With TMM, you can manually define which reference sample to use, or
leave it unspecified, which is the default. The docs for
calcNormFactors() say:
----
If 'refColumn' is unspecified, the library whose upper quartile is
closest to the mean upper quartile is used.
----
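A rough sketch of that default selection rule, in plain Python rather than R. It assumes the upper quartile is taken on counts divided by the library size, which is my reading of the rule and worth checking against the edgeR source; the function names `upper_quartile` and `pick_reference` are hypothetical, not part of edgeR.

```python
from statistics import mean, quantiles


def upper_quartile(values):
    """75th percentile with linear interpolation between order statistics."""
    return quantiles(values, n=4, method="inclusive")[2]


def pick_reference(counts_by_sample):
    """Return the name of the sample whose upper quartile of
    library-size-scaled counts is closest to the mean upper quartile."""
    uq = {}
    for name, counts in counts_by_sample.items():
        lib_size = sum(counts)
        uq[name] = upper_quartile([c / lib_size for c in counts])
    target = mean(uq.values())
    return min(uq, key=lambda name: abs(uq[name] - target))


# Hypothetical counts for three libraries (four genes each).
toy = {
    "s1": [10, 20, 30, 40],
    "s2": [5, 5, 10, 80],
    "s3": [25, 25, 25, 25],
}
print(pick_reference(toy))
```

In edgeR itself you would not compute this by hand: leaving refColumn unset gives the default, and passing refColumn (for example the column of a time-zero sample) overrides it.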
While TMM is pairwise in nature, it may work just fine this way across
your genotypes and time points. I think it's worth trying it and
looking at "smear" plots -- plotSmear() in edgeR -- between some of
your time points (of the same genotype, say), just to see whether the
normalization factors are aligning the M values. There are other
normalization strategies implemented too, that are not explicitly
pairwise -- see ?calcNormFactors. For example, method="RLE", as
proposed by the DESeq authors:
----
'method="RLE"' is the scaling factor method proposed by Anders and
Huber (2010). We call it "relative log expression", as median
library is calculated from the geometric mean of all columns and
the median ratio of each sample to the median library is taken as
the scale factor.
----
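The description above can be sketched in a few lines of plain Python. This is an illustration of the median-ratio idea only, not edgeR's implementation: the function name `rle_size_factors` is hypothetical, genes with a zero count in any sample are dropped (an assumption made here so the geometric mean stays positive), and the sketch stops at the raw size factors, whereas edgeR, as far as I know, goes on to rescale them relative to library size.

```python
from math import exp, log
from statistics import median


def rle_size_factors(counts_by_sample):
    """Median-ratio size factors: each sample's median ratio to a
    reference library built from per-gene geometric means."""
    samples = list(counts_by_sample)
    n_genes = len(counts_by_sample[samples[0]])

    # Keep only genes that are nonzero in every sample.
    usable = [
        g for g in range(n_genes)
        if all(counts_by_sample[s][g] > 0 for s in samples)
    ]

    # Log of the per-gene geometric mean across samples (the "median library").
    log_ref = {
        g: sum(log(counts_by_sample[s][g]) for s in samples) / len(samples)
        for g in usable
    }

    # Size factor = median over genes of the sample-to-reference ratio.
    return {
        s: exp(median(log(counts_by_sample[s][g]) - log_ref[g] for g in usable))
        for s in samples
    }


# Hypothetical example: library "b" is library "a" sequenced twice as deep.
factors = rle_size_factors({"a": [10, 20, 30, 40], "b": [20, 40, 60, 80]})
print(factors)
```

Because every sample is compared against the same reference built from all columns, no single sample has to be nominated as the baseline, which is what makes this approach not explicitly pairwise.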
As well, people are actively considering this problem in other
directions (e.g. GC content). For example:
http://www.bioconductor.org/packages/release/bioc/html/cqn.html
http://www.biomedcentral.com/1471-2105/12/480/abstract
Hope that helps,
Mark
> Thanks in advance for guiding me
> AKSR