Hi List,
Quantile-quantile normalization assumes a common distribution for the data
sets being normalized. I am fine with using it to normalize replicates.
However, for different experiments, such as data from different tissues, is
the assumption still valid?
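
To make sure we are talking about the same thing, here is a minimal sketch of
what I mean, in plain R (ignoring ties; the affy package has a proper
implementation of quantile normalization):

    # Each array's values are replaced by the mean quantiles, so all arrays
    # end up with exactly the same empirical distribution.
    # 'x' is a matrix of intensities, one row per probe (set), one column
    # per array.
    quantile.normalize <- function(x) {
        sorted <- apply(x, 2, sort)
        target <- rowMeans(sorted)          # the common target distribution
        out <- x
        for (j in 1:ncol(x))
            out[order(x[, j]), j] <- target
        out
    }

It is the last step, forcing every array onto the same target distribution,
that I am asking about.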
Could anybody point me to a reference that examines this assumption under
many different experimental conditions (for example, >10 different tissue or
cell-line experiments)? I have read all the papers and documents I can find,
but I am still not convinced we can use that assumption.
Thanks
Regards
-h
On Sat, 22 Mar 2003, Wang, Hui wrote:
> Hi List,
>
> Quantile-quantile normalization assumes a common distribution for the data
> sets being normalized. I am fine with using it to normalize replicates.
> However, for different experiments, such as data from different tissues, is
> the assumption still valid?
Probably not. But when replicate arrays have completely different
distributions, in my opinion one is left with no choice but to make such
assumptions. Are you willing to make the assumption that they all have the
same median? How about the same quartiles? Where to draw the line is not
easy.
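
One rough way of seeing where to draw that line is to look at a few per-array
quantiles, and at what a mild adjustment such as median scaling does to them.
A sketch, with 'x' a matrix of log2 intensities, one column per array:

    # how far apart are the arrays at a few quantiles?
    probs <- c(0.25, 0.50, 0.75, 0.90, 0.99)
    round(apply(x, 2, quantile, probs = probs), 2)

    # median scaling only shifts each array; the shape of each array's
    # distribution is left untouched
    x.med <- sweep(x, 2, apply(x, 2, median)) + median(x)
    boxplot(as.data.frame(x),     main = "raw")
    boxplot(as.data.frame(x.med), main = "median scaled")

If the boxes still look very different after median scaling, you have to
decide how much more of the distribution you are willing to force to agree.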
>
> Could anybody point me to a reference that examines this assumption under
> many different experimental conditions (for example, >10 different tissue
> or cell-line experiments)? I have read all the papers and documents I can
> find, but I am still not convinced we can use that assumption.
>
Both RMA papers (Biostatistics and NAR) apply the method to the dilution
data set that has liver and central nervous system cell lines.
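
If you want to try it on your own data, something like the following should
work (a sketch; check the affy vignette for the exact calls in the version
you have installed):

    library(affy)
    abatch <- ReadAffy()    # reads the CEL files in the working directory
    eset   <- rma(abatch)   # background correction, quantile normalization,
                            # median polish; log2 expression measures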
>
> Thanks
>
>
>
> Regards
>
> -h
>
Dear Rafael,
Thanks for your reply.
> > Quantile-quantile normalization assumes a common distribution for the
> > data sets being normalized. I am fine with using it to normalize
> > replicates. However, for different experiments, such as data from
> > different tissues, is the assumption still valid?
>
> Probably not. But when replicate arrays have completely different
> distributions, in my opinion one is left with no choice but to make such
> assumptions. Are you willing to make the assumption that they all have the
> same median? How about the same quartiles? Where to draw the line is not
> easy.
I agree with you. Here is my thinking: for replicates (for example, three
chips from one sample preparation), it is probably a valid assumption (of
course, you have to remove problematic chips first, for example chips with
scratches), even if one replicate is twice as bright as the others. Sample
variation has less effect here.
However, for samples from different tissues it is hard to believe this is
true. It is quite possible that the samples belong to the same type of
distribution but with different mean and variance (I have looked at many QQ
plots from different experiments). Of course, genes with obvious,
biologically relevant expression changes are usually a minority in a large
data set, so it is probably still OK to make that assumption. I am just
trying to find out whether there are rigorous comparisons (versus just an
assumption).
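
The kind of check I have been doing, as a sketch (with 'x' a matrix of log2
intensities, one column per array): if the points fall on a straight line
away from the identity, the two arrays have the same shape of distribution
but different location and scale; curvature suggests a genuinely different
shape.

    qqplot(x[, 1], x[, 2],
           xlab = "array 1 (log2 intensity)",
           ylab = "array 2 (log2 intensity)")
    abline(0, 1, lty = 2)    # identity line for reference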
> > Could anybody point me to a reference that examines this assumption
> > under many different experimental conditions (for example, >10 different
> > tissue or cell-line experiments)? I have read all the papers and
> > documents I can find, but I am still not convinced we can use that
> > assumption.
>
> Both RMA papers (Biostatistics and NAR) apply the method to the dilution
> data set that has liver and central nervous system cell lines.
I read these papers. They are very good, well-written papers. For the
dilution data, which has a common background, they help in understanding
replicate normalization and give some understanding of sample variation.
However, to understand issues across different samples (with totally
different backgrounds), two cell lines may not be enough (of course, these
two cell lines seem carefully chosen). I am thinking that something like
known amounts of spike-ins, added before and after sample preparation in
many different tissue/cell-line backgrounds, would give a better
understanding. That would address variation caused by chips and sample
preps, as well as by the complexity of the different sample backgrounds.
The last one is probably the most biologically relevant.
Does this make sense at all?
What in your opinion is the best normalization method so far?
Regards
-h
Dear Hui,
I have a few comments too (inserted in your previous posts).
On Sat, Mar 22, 2003 at 04:04:21PM -0800, Wang, Hui wrote:
> Dear Rafael,
>
> Thanks for your reply.
>
> > > Quantile-quantile normalization assumes a common distribution for the
> > > data sets being normalized. I am fine with using it to normalize
> > > replicates. However, for different experiments, such as data from
> > > different tissues, is the assumption still valid?
> >
> > Probably not. But when replicate arrays have completely different
> > distributions, in my opinion one is left with no choice but to make such
> > assumptions. Are you willing to make the assumption that they all have
> > the same median? How about the same quartiles? Where to draw the line is
> > not easy.
>
> I agree with you. Here is my thinking: for replicates (for example, three
> chips from one sample preparation), it is probably a valid assumption (of
> course, you have to remove problematic chips first, for example chips with
> scratches), even if one replicate is twice as bright as the others. Sample
> variation has less effect here.
>
> However, for samples from different tissues it is hard to believe this is
> true. It is quite possible that the samples belong to the same type of
> distribution but with different mean and variance (I have looked at many
> QQ plots from different experiments). Of course, genes with obvious,
> biologically relevant expression changes are usually a minority in a large
> data set, so it is probably still OK to make that assumption. I am just
> trying to find out whether there are rigorous comparisons (versus just an
> assumption).
>
I am completely on your side about the underlying assumptions of what I
would call 'distribution-driven transformation methods'. When using them,
one clearly assumes that, on the biological side of the story, only very few
genes are differentially expressed across the different experiments. If one
has any reason to suspect that this is not the case (*), those normalization
methods are to be used with care. The method 'invariantset' could make you
feel more confident in such cases. However, it does not necessarily mean
that these normalization methods are unacceptable there. I did run one of
them (**) on data from different tissues, and I had a good surprise when
looking at a matrix of scatter plots of the probe-level intensities: the
differences between tissues could be observed visually.
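
What I did was roughly the following (written from memory, so please check
the method names against the affy documentation):

    library(affy)
    normalize.methods(abatch)   # should list "quantiles", "invariantset", ...
    abatch.n <- normalize(abatch, method = "invariantset")

    # matrix of pairwise scatter plots of the (log2) PM intensities,
    # here for the first four arrays (assuming there are at least four)
    pairs(log2(pm(abatch.n))[, 1:4], pch = ".")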
But naturally, a more in-depth study of these normalization methods for
these cases would be needed. Spiking in thousands of genes is obviously not
the thing to do, but I remember seeing a draft of a paper on a web site that
used a very clever idea: starting from the mRNA of two different tissues, a
third condition was created by mixing the RNA from the two tissues. The
first name on the draft was William J Lemon (whose email cannot be found in
my messy ${HOME} at the moment); he may have other suggestions too...
(*): like comparing cells from different tissues as you mentioned, or maybe
studies of dividing/resting cells, or di-auxic shift, or reaction to heat
shock, or healthy/infected cells...
(**): I can't remember which one it was now... quantiles, qspline, or
something else?
Hopin' it helps,
Laurent
>
> > > Could anybody point me to a reference that examines this assumption
> > > under many different experimental conditions (for example, >10
> > > different tissue or cell-line experiments)? I have read all the papers
> > > and documents I can find, but I am still not convinced we can use that
> > > assumption.
> >
> > Both RMA papers (Biostatistics and NAR) apply the method to the dilution
> > data set that has liver and central nervous system cell lines.
>
> I read these papers. They are very good, well-written papers. For the
> dilution data, which has a common background, they help in understanding
> replicate normalization and give some understanding of sample variation.
> However, to understand issues across different samples (with totally
> different backgrounds), two cell lines may not be enough (of course, these
> two cell lines seem carefully chosen). I am thinking that something like
> known amounts of spike-ins, added before and after sample preparation in
> many different tissue/cell-line backgrounds, would give a better
> understanding. That would address variation caused by chips and sample
> preps, as well as by the complexity of the different sample backgrounds.
> The last one is probably the most biologically relevant.
>
> Does this make sense at all?
>
> What in your opinion is the best normalization method so far?
>
> Regards
>
> -h
--
--------------------------------------------------------------
currently at the National Yang-Ming University in Taipei, Taiwan
--------------------------------------------------------------
Laurent Gautier             CBS, Building 208, DTU
PhD. Student                DK-2800 Lyngby, Denmark
tel: +45 45 25 24 89        http://www.cbs.dtu.dk/laurent