Huber et al: Variance stabilization ...

"Hüsing, Johannes"

Hi all (and especially the first author of the paper Huber et al., "Variance stabilization applied to microarray data calibration and to the quantification of differential expression", Bioinformatics 2002),

while shoehorning the data into a symmetric distribution is a popular motivation for transforming them, I agree that variance stabilization is paramount when contrasts are to be interpreted meaningfully.

The practical difficulty I am facing is as follows. I am investigating Affymetrix chips processed by MAS 5 (the original program, not the Bioconductor version of the algorithm, which gives roughly proportional results). Since all expression values here are positive, for very small values the variance is bounded in terms of the mean (for a given mean it is largest when all but one chip have zero expression for the gene in question). When I fit a low-parameter function to the variance-mean dependence, I obtain a negative intercept or other parameter constellations that rule out an arsinh transformation.

In this situation it seems to me more a curse than a blessing that negative values are verboten in the MAS 5.0 algorithm. Allowing negative values would remove the boundedness.

Right now I am experimenting with loess fits to subsamples of the genes, and the general appearance is that of a parabola, but with a broader base. I must admit that my observations are based on only two chips, which gives a poor estimate of each gene's variance, albeit for a large number of genes. The general problem of boundedness with positive values only remains, though, and I would expect negative intercepts to occur more often with more chips, because (mean, variance) pairs close to 0 would appear less often and would therefore exert less influence on the regression curve.

Has anyone had a similar experience, and what are your suggestions?

Greetings,
Johannes
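The boundedness described above can be made explicit with a short derivation (a worked sketch added for illustration, not part of the original thread). Assume x_1, ..., x_n >= 0 are one gene's expression values on n chips, m is their mean and s^2 the usual sample variance with denominator n - 1:

```latex
% x_1, ..., x_n >= 0: expression values of one gene on n chips
m = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\Bigl(\sum_{i=1}^{n} x_i^2 - n\,m^2\Bigr)

% nonnegativity makes every cross term x_i x_j >= 0, hence
\sum_{i=1}^{n} x_i^2 \le \Bigl(\sum_{i=1}^{n} x_i\Bigr)^2 = n^2 m^2

% and therefore
s^2 \le \frac{n^2 m^2 - n\,m^2}{n-1} = n\,m^2
```

Equality holds when one chip carries all the signal (x_i = n m) and the others are zero; with two chips the bound is s^2 <= 2 m^2. So the variance-mean curve is forced to zero at m = 0, and a fitted quadratic a + b m + c m^2 with c > 0 and a <= 0 has real roots (its discriminant b^2 - 4ac is then nonnegative), which is exactly the situation in which the arsinh form is not available.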

Microarray, Regression

Wolfgang Huber

Hi, one possible answer to your question is that MAS 5.0 is evil and that it's better to use almost any other preprocessing strategy, such as RMA, dChip or MAS 4.0.

The variance-stabilizing transformation that we have proposed, as you rightly remarked, only works with data that have a strictly positive, roughly quadratic variance-mean dependence. That seems to exclude data produced with MAS 5.0.

There also seem to be other unpleasant effects associated with MAS 5.0. For example, its use of abruptly different rules for processing the probe intensities, depending on the continuous values of PM and MM, seems to really mess up the distribution of the data across different chips. (I.e., even if the intensities of a probe across different chips were normally distributed, the resulting expression values could have a rather ugly distribution.)

Best regards,
Wolfgang

Division of Molecular Genome Analysis
German Cancer Research Center (DKFZ)
Im Neuenheimer Feld 580
69120 Heidelberg, Germany
w.huber@dkfz.de
http://www.dkfz.de/mga/whuber
Tel +49-6221-424709
Fax +49-6221-42524709

> -----Original Message-----
> From: bioconductor-admin@stat.math.ethz.ch
> [mailto:bioconductor-admin@stat.math.ethz.ch]On Behalf Of Hüsing, Johannes
> Sent: Wednesday, February 12, 2003 4:07 PM
> To: bioconductor@stat.math.ethz.ch
> Subject: [BioC] Huber et al: Variance stabilization ...
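Wolfgang's condition (a strictly positive, roughly quadratic variance-mean dependence) can be checked mechanically. The sketch below is an illustration under stated assumptions, not the estimation procedure of the paper or of any package: it assumes a quadratic model v(u) = a + b*u + c*u^2 fitted by ordinary least squares to (mean, variance) pairs, and uses the closed form of the variance-stabilizing transformation h(y) = integral of dy / sqrt(v(y)), which is an arsinh only when c > 0 and 4ac - b^2 > 0. The function names `fit_quadratic` and `arsinh_transform` and the example numbers are hypothetical.

```python
import math
from typing import Callable, Sequence


def fit_quadratic(means: Sequence[float],
                  variances: Sequence[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of v(u) = a + b*u + c*u**2; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T v with design columns [1, u, u^2].
    s = [sum(u ** k for u in means) for k in range(5)]            # sums of u^0..u^4
    t = [sum(v * u ** k for u, v in zip(means, variances)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]  # augmented matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def arsinh_transform(a: float, b: float, c: float) -> Callable[[float], float]:
    """Return h(y) = integral of dy / sqrt(v(y)) for v(y) = a + b*y + c*y**2.

    Requires c > 0 and 4ac - b^2 > 0 (v strictly positive everywhere); then,
    up to an additive constant,
        h(y) = asinh((2*c*y + b) / sqrt(4*a*c - b**2)) / sqrt(c).
    """
    disc = 4 * a * c - b * b
    if c <= 0 or disc <= 0:
        raise ValueError(
            f"c <= 0 or v(u) has real roots (a={a:.3g}, b={b:.3g}, c={c:.3g}); "
            "the arsinh form does not apply")
    root_c, root_d = math.sqrt(c), math.sqrt(disc)
    return lambda y: math.asinh((2 * c * y + b) / root_d) / root_c


if __name__ == "__main__":
    us = [0.0, 10.0, 20.0, 40.0, 80.0]                  # made-up mean values
    # Strictly positive quadratic variance: the arsinh form applies.
    h = arsinh_transform(*fit_quadratic(us, [4 + 0.1 * u * u for u in us]))
    print([round(h(u), 3) for u in us])
    # Variance vanishing at u = 0, as the nonnegativity bound forces: rejected.
    try:
        arsinh_transform(*fit_quadratic(us, [0.5 * u + 0.1 * u * u for u in us]))
    except ValueError as err:
        print("rejected:", err)
```

Ordinary least squares is used here only for brevity; with per-gene variances from as few as two chips, a robust fit or a fit to binned (mean, variance) pairs would probably behave better.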
