Question

Normalising divergent samples - Discussion (Please!)



Matthew Hannah ▴ 940

@matthew-hannah-621


Hi,

I've asked about this before and got hardly any response, so I'll try again. I guess this is universal, but I'm using Affymetrix Arabidopsis ATH1 data in my examples.

Most (all?) normalisations assume that the different samples are similar, with the majority of probesets not changing. So the question is what to do when lots of them are changing because of different tissues, timepoints or treatments. I guess it's obvious, but I've noticed that if you normalise divergent tissues together with (GC)RMA you get a lower correlation between replicates than if you normalise the two tissues separately.

Recently a large dataset has become available which offers a chance to see how different tissues compare under different normalisations, and which also (IMO) needs an appropriate normalisation to make it more useful to the wider community. The dataset consists of c. 200 chips, in triplicate (biological, but taken from the same batch of plants, so not much variance), of many different plant tissues. The idea of this dataset is that for any given transcript you can see its expression pattern across all tissues. I guess similar resources may be developed for other organisms at some point (already?), and this area may be of growing interest in the future.

Firstly, does anyone know of people working on this problem that I could get in contact with? Alternatively, if anyone is interested in working on it and wants more details, or wants to collaborate in some way, then drop me a mail.

If not (or even if so), maybe someone can help me out with some comments or discussion on the following points.

What is known about the hybridisation behaviour of samples with fewer transcripts present? Are there any studies on this (not sure how you would do this, though)?

Has anyone tried to use the B2 oligo intensities in any way? Is it possible to access them, and is the B2 oligo used consistently enough to be useful as a control for hybridisation efficiency?

Has anyone normalised (even within MAS5) to a small number of control genes on Affy arrays? If so, how were they selected and how did it perform?

How much of the difference between the intensity distributions on different arrays is technical or interfering biological variation (RNA quality and quantity), versus meaningful differences in expression levels? Considering this, would distorting (modifying?) the distribution using quantile normalisation be worse than a simple scaling normalisation? (Speculation is welcome, as I guess this cannot actually be answered.)

What (off-the-shelf) normalisation would you recommend or think to be best?

Finally, I guess it needs to be asked whether this is really a data analysis problem or a case of expecting data magic. With so many unknown factors (at present), will this kind of study ever produce really useful data?

Cheers,
Matt
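To make the quantile versus scaling question above concrete, here is a minimal Python sketch on two made-up four-probeset "arrays". It is a toy under stated assumptions, not the MAS5 or (GC)RMA implementation: the scaling step uses a plain mean (MAS5 uses a trimmed mean), the quantile step ignores ties, and the names `scale_normalise`, `quantile_normalise`, `leaf` and `root` are invented for illustration.

```python
from statistics import mean


def scale_normalise(arrays, target=None):
    """Multiply each array by one constant so that every array mean equals `target`.

    Assumption: plain mean, used here only as a simplified stand-in for a
    MAS5-style scaling step.
    """
    means = [mean(a) for a in arrays]
    if target is None:
        target = mean(means)  # default: average of the array means
    return [[x * target / m for x in a] for a, m in zip(arrays, means)]


def quantile_normalise(arrays):
    """Give every array the same distribution: the rank-wise mean across arrays.

    Assumption: all arrays have equal length and no tied values.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the i-th smallest value over all arrays.
    reference = [mean(s[i] for s in sorted_arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probeset indices by rank
        normalised = [0.0] * n
        for rank, idx in enumerate(order):
            normalised[idx] = reference[rank]
        result.append(normalised)
    return result


# Hypothetical intensities for two divergent tissues (not real data).
leaf = [2.0, 4.0, 6.0, 8.0]
root = [1.0, 2.0, 3.0, 34.0]

scaled = scale_normalise([leaf, root])
quantiled = quantile_normalise([leaf, root])

print(scaled)     # ratios within each array are unchanged
print(quantiled)  # both arrays now contain exactly the same set of values
```

In this toy, scaling keeps the within-array ratios (the top root probeset is still 34 times the lowest), whereas quantile normalisation maps both tissues onto one shared distribution, so a genuine difference in distribution shape between tissues is removed along with any technical one.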

oligo oligo • 823 views

20.8 years ago · Matthew Hannah ▴ 940

score 0 · Answer 1 · 2004-07-21

Again this has met a blank; I can't believe that nobody has any comment on this. If you are working on this, or want to, please let me know. I'm a biologist and have no serious hope of developing a new method, especially not alone. My interest is more in knowing whether something may be possible or in the pipeline, as I have some potentially useful applications.

Even if you are not working on this, I'd be interested in gathering opinions on the following, even if it's just speculation.

What is known about the hybridisation behaviour of samples with fewer transcripts present? Are there any studies on this (not sure how you would do this, though)? Specifically, how would one sample containing 5000 mRNAs compare to one with 10000 mRNAs on a 20k chip? Would you expect the overall intensity to be changed or shifted? What would happen with the background? How would having fewer transcripts affect different normalisations?

Has anyone tried to use the B2 oligo intensities in any way? Is it possible to access them, and is the B2 oligo used consistently enough (in terms of the way it is spiked in) to be useful as a control for hybridisation efficiency? Or does anyone know anything more about them than the mention in the Affy manuals?

Has anyone normalised (even within MAS5) to a small number of control genes on Affy arrays? If so, how were they selected and how did it perform?

How much of the difference between the intensity distributions on different arrays is technical or interfering biological variation (RNA quality and quantity, e.g. see the first question), versus meaningful differences in expression levels? Considering this, would distorting (modifying?) the distribution using quantile normalisation be worse than a simple scaling normalisation? (Speculation is welcome, as I guess this cannot actually be answered.)

What (off-the-shelf) normalisation would you recommend or think to be best?

Finally, I guess it needs to be asked whether this is really a data analysis problem or a case of expecting data magic. With so many unknown factors (at present), will this kind of study ever produce really useful data?

Thanks (in hope).
Matt
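As a purely arithmetic illustration of the "fewer transcripts" question above, the following Python sketch scales two made-up arrays to a common mean. Everything in it is an assumption chosen for illustration: the signal levels `EXPRESSED` and `BACKGROUND`, the 20-probeset size, and the helper names. It deliberately ignores any real effect of transcript number on hybridisation or background, which is exactly the unknown being asked about.

```python
from statistics import mean

BACKGROUND = 10.0  # assumed signal for an absent transcript (hypothetical)
EXPRESSED = 100.0  # assumed signal for a present transcript (hypothetical)


def toy_array(n_probesets, n_expressed):
    """Build a toy array: the first `n_expressed` probesets are present, the rest absent."""
    return [EXPRESSED] * n_expressed + [BACKGROUND] * (n_probesets - n_expressed)


def scale_to_target(array, target):
    """Multiply the whole array by one constant so that its mean equals `target`."""
    factor = target / mean(array)
    return [x * factor for x in array]


rich = toy_array(20, 10)  # half of the probesets expressed
poor = toy_array(20, 5)   # a quarter of the probesets expressed

target = mean(rich)
rich_scaled = scale_to_target(rich, target)
poor_scaled = scale_to_target(poor, target)

# Probeset 0 is expressed at the same raw level (100) in both samples,
# so its true fold change is 1.
apparent_fold_change = poor_scaled[0] / rich_scaled[0]
print(round(apparent_fold_change, 2))  # 1.69
```

Under these assumptions, a transcript with identical raw signal in both samples appears about 1.7-fold higher in the sample with fewer transcripts after mean scaling, simply because the scaling factor compensates for the lower overall mean. Whether real arrays behave this way depends on the hybridisation behaviour the question is about.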