VSN, RMA, dChip, etc.


Stefan Thomsen ▴ 50


Dear all,

I am currently evaluating the performance of different normalization strategies on an Affymetrix data set, and I have some semi-technical, semi-philosophical questions.

Given (i) the jungle of possible normalization strategies implemented in R and other platforms, (ii) the fact that most authors describe which normalization strategy they used but not why they chose it over the alternatives, and (iii) the sparse literature on how to find the strategy most suitable for a given design/experiment/data set, I would be very grateful for any comments on the following questions:

1) Are there written or silently accepted guidelines to evaluate, choose, and justify the choice of normalization strategies?

2) What could be sensible "readouts" for the performance of a given normalization strategy? (Personally, I am looking at the performance on spike-in controls and a handful of known gene profiles. I am very interested in complementary approaches.)

3) Is there some literature on this issue that may have escaped my notice?

Any comments on this issue would be highly appreciated.

Kind regards,
Stefan

--
Dr. Stefan Thomsen
Research Associate
Department of Zoology
University of Cambridge
Downing Street
Cambridge CB2 3EJ
Tel.: +44 1223 336623
Fax: +44 1223 336679
stt26 at cam.ac.uk




Tobias Straub ▴ 430


Trying to add my semi-philosophical, semi-biological cents:

- I agree with your concerns: it is a jungle, and it is very difficult to decide where to go.
- If you are in doubt about your strategy, you might want to apply it to a gold-standard data set with maximum prior knowledge (if there is one; that depends on the application).
- Your results should be fairly similar when applying different data analysis strategies. (This basically means that if you have 'good' input, the output is usually not severely compromised by different data processing strategies.) If you get different results with different strategies, then maybe your primary data are not good enough, you do not have enough data points, you do not have enough replicates, etc.
- If your results are plausible, you might be on the right track! Try to confirm your results with different experiments/technologies.
- I think that in general one can assume that less data manipulation (normalization etc.) tends not to be harmful, and vice versa.
- As far as I am concerned, normalization is usually not the problem, but whatever comes thereafter: things like filtering and significance testing.

Best regards
T.

On Oct 16, 2007, at 2:35 PM, Stefan Thomsen wrote:
> Dear all,
>
> currently evaluating the performance of different normalization strategies
> to an Affymetrix data set, I have some semi-technical, semi-philosophical
> questions.
> [...]

======================================================================
Dr. Tobias Straub
Adolf-Butenandt-Institute, Molecular Biology
tel: +49-89-2180 75 439
Schillerstr. 44, 80336 Munich, Germany
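The point about different strategies giving similar results on 'good' data can be checked directly. The following is only a rough sketch on simulated log2 intensities, not a recipe from this thread: the two normalizations (median scaling and a plain quantile normalization without tie handling), the simulated data, and all function names are assumptions made for illustration.

```python
import math
import random
from statistics import mean, median


def median_scale(arrays):
    """Shift each array of log2 intensities so that all medians coincide."""
    target = median([median(a) for a in arrays])
    return [[x - median(a) + target for x in a] for a in arrays]


def quantile_normalize(arrays):
    """Give every array the same empirical distribution (ties are ignored)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the i-th smallest value across arrays.
    ref = [mean(s[i] for s in sorted_arrays) for i in range(n)]
    out = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normed = [0.0] * n
        for rank, idx in enumerate(order):
            normed[idx] = ref[rank]
        out.append(normed)
    return out


def log_fold_changes(arrays, group_a, group_b):
    """Per-gene difference of group means (group_b minus group_a)."""
    n = len(arrays[0])
    return [mean(arrays[j][i] for j in group_b)
            - mean(arrays[j][i] for j in group_a) for i in range(n)]


def ranks(values):
    """Zero-based ranks; adequate for continuous values without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Simulated data (purely illustrative): 4 arrays, 200 genes, array-wise
# offsets, and 20 genes shifted by +2 on the log2 scale in arrays 2 and 3.
random.seed(0)
n_genes = 200
base = [random.gauss(8, 2) for _ in range(n_genes)]
offsets = [0.0, 0.5, -0.3, 0.8]
arrays = []
for j, off in enumerate(offsets):
    arrays.append([
        base[i] + (2.0 if j >= 2 and i < 20 else 0.0) + off + random.gauss(0, 0.2)
        for i in range(n_genes)
    ])

lfc_median = log_fold_changes(median_scale(arrays), [0, 1], [2, 3])
lfc_quantile = log_fold_changes(quantile_normalize(arrays), [0, 1], [2, 3])
rho = spearman(lfc_median, lfc_quantile)
print(f"Spearman correlation of log fold changes: {rho:.3f}")
```

A high rank correlation between the two sets of fold changes suggests that the choice of normalization matters little for this data set; a low one is a hint to look at data quality or replicate numbers before blaming the method.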



Wolfgang Huber ★ 13k


EMBL European Molecular Biology Laborat…

Dear Stefan,

have you seen Rafa's work on Affycomp? See http://affycomp.biostat.jhsph.edu and the two Bioinformatics papers cited there. They provide some maps for the jungle.

In principle I second Tobias' point that the choices shouldn't make a big difference to the bottom-line result if you have "good" data (and that could almost be seen as a definition of data quality). However, there is a reason for the variety of methods: the following questions are hard to answer, and the best answers may be application specific:

- whether and how you use the MM (mismatch) values
- whether and how you do probe-sequence-specific background correction
- whether and how you weight probe signal in a sequence-specific way
- where you want to be on the variance-bias trade-off

Finally, a bigger issue than some of the gory variations in preprocessing methods may be the mapping of probes to target genes. The one you get from the manufacturer (and by extension, through our default CDF packages) is often not the best, and cross- or off-target hybridisation can be a problem.

Best wishes
Wolfgang

------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
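To make the variance-bias trade-off concrete, one possible pair of spike-in readouts is sketched below: the slope of observed log2 expression against log2 nominal concentration (a slope well below 1 indicates compressed, i.e. biased, fold changes) and the average standard deviation across replicates (a proxy for variance). This is an illustrative sketch only. It does not reproduce Affycomp's actual assessments, and the function names and all numbers for "method A" and "method B" are made up.

```python
import math
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def spikein_readouts(nominal_conc, observed_log2):
    """Return (slope, mean replicate SD) for one preprocessing method.

    nominal_conc:  spike-in concentrations, one per concentration level
    observed_log2: per level, a list of replicate log2 expression values
    """
    xs, ys = [], []
    for conc, replicates in zip(nominal_conc, observed_log2):
        for value in replicates:
            xs.append(math.log2(conc))
            ys.append(value)
    slope = ols_slope(xs, ys)                 # bias readout: ideal is 1
    rep_sd = mean(stdev(r) for r in observed_log2)  # variance readout
    return slope, rep_sd


# Hypothetical spike-in series (concentrations in arbitrary units).
nominal = [1, 2, 4, 8, 16]

# Made-up values: method A is precise but compresses fold changes,
# method B tracks the nominal concentrations but is noisier.
method_a = [[6.0, 6.1], [6.6, 6.7], [7.2, 7.3], [7.8, 7.9], [8.4, 8.5]]
method_b = [[5.7, 6.3], [6.8, 7.2], [7.6, 8.4], [8.9, 9.1], [9.7, 10.3]]

slope_a, sd_a = spikein_readouts(nominal, method_a)
slope_b, sd_b = spikein_readouts(nominal, method_b)
print(f"method A: slope={slope_a:.2f}, replicate SD={sd_a:.2f}")
print(f"method B: slope={slope_b:.2f}, replicate SD={sd_b:.2f}")
```

Neither readout alone ranks the methods; which point on the trade-off is preferable depends on the application, as noted in the list above.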
