Matthew Hannah
Hi,
I've asked about this before and got hardly any response, so I'll try again.
I guess this is universal, but I'm using Affymetrix Arabidopsis ATH1 data in my examples.
Most (all?) normalisations assume that the different samples are similar, with the majority of probesets not changing. So the question is what to do when lots of them are changing due to different tissues, timepoints or treatments.
I guess it's obvious, but I've noticed that if you normalise divergent tissues together with (GC)RMA then you get a lower correlation between replicates than if you normalise the two tissues separately.
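To make that comparison concrete, here is a minimal sketch of how the replicate correlation could be measured after each normalisation. It assumes the normalised log2 values have already been exported per array; the function names and the input layout are hypothetical, not part of any Bioconductor package.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of log2 expression values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_replicate_correlation(arrays, tissue_of):
    """Average Pearson r over every pair of arrays that share a tissue label.

    arrays:    dict of array name -> list of log2 values (same probeset order)
    tissue_of: dict of array name -> tissue label (hypothetical annotation)
    """
    names = sorted(arrays)
    correlations = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if tissue_of[first] == tissue_of[second]:
                correlations.append(pearson(arrays[first], arrays[second]))
    return sum(correlations) / len(correlations)
```

Running `mean_replicate_correlation` once on the values from the joint normalisation and once on the values from the per-tissue normalisations would give two numbers that can be compared directly.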
Recently a large dataset has become available which offers a chance to see how different tissues compare using different normalisations, and which also (IMO) needs an appropriate normalisation to make it more useful to the wider community. The dataset consists of c. 200 chips, in triplicate (biological, but taken from the same batch of plants so not much variance), of many different plant tissues. The idea of this dataset is that for any given transcript you can see its expression pattern across all tissues. I guess similar resources may be developed for other organisms at some point (already?), and this area may be of growing interest in the future.
Firstly, does anyone know of people working on this problem that I could get in contact with? Alternatively, if anyone is interested in working on it and wants more details or to collaborate in some way, then drop me a mail.
So, if not (or even if), then maybe someone can help me out with some comments or discussion on the following points.
What is known about the hybridisation behaviour of samples with fewer transcripts present? Are there any studies on this (not sure how you would do this though)?
Has anyone tried to use the B2 oligo intensities in any way? Is it possible to access them, and is the B2 oligo used consistently enough for its intensities to control for hybridisation efficiency?
Has anyone normalised (even within MAS5) to a small number of control genes on Affy arrays? If so, how were they selected and how did it perform?
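To show what I mean by normalising to control genes, here is a rough sketch: each array is multiplied by one factor so that the mean intensity of the chosen control probesets matches a common target. This is only an illustration under my own assumptions; `scale_to_controls`, the input layout and the idea of passing a list of control probeset IDs are all hypothetical, and it is not how MAS5 itself picks its scaling factor.

```python
def scale_to_controls(arrays, control_ids, target=None):
    """Scale each array so its control probesets share a common mean intensity.

    arrays:      dict of array name -> dict of probeset id -> linear-scale intensity
    control_ids: probeset ids assumed (hypothetically) to be invariant across samples
    target:      common control mean; defaults to the average control mean over arrays
    """
    control_means = {
        name: sum(values[p] for p in control_ids) / len(control_ids)
        for name, values in arrays.items()
    }
    if target is None:
        target = sum(control_means.values()) / len(control_means)

    scaled = {}
    for name, values in arrays.items():
        factor = target / control_means[name]  # one scaling factor per array
        scaled[name] = {p: v * factor for p, v in values.items()}
    return scaled
```

The whole result then hinges on the control set really being invariant across tissues, which is exactly the selection question above.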
How much of the difference in the intensity distributions on different arrays is technical or interfering biological variation (RNA quality and quantity), versus meaningful differences in expression levels? Considering this, would distorting (modifying?) the distribution using quantile normalisation be worse than a simple scaling normalisation? (Speculation is welcome, as I guess this cannot actually be answered.)
What (off-the-shelf) normalisation would you recommend/think to be best?
Finally, I guess it needs to be asked whether this is really a data analysis problem or a case of expecting data magic, and with so many unknown factors (at present) will this kind of study ever produce really useful data?
Cheers,
Matt