median polish vs mas
Naomi Altman ★ 6.0k
@naomi-altman-380
Last seen 3.6 years ago
United States
I have been wondering why the default in justRMA is summary.method="medianpolish" instead of "mas", which is Tukey's biweight. Since we are already doing quantile normalization, doesn't the extra between-array step imposed by median polish risk masking differential expression?

Naomi S. Altman
Associate Professor, Dept. of Statistics
Bioinformatics Consulting Center
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)
Normalization • 1.4k views
@james-w-macdonald-5106
Last seen 12 minutes ago
United States
The default (and only) option for justRMA is medianpolish because justRMA is designed to *just* do *RMA*, which is a quantile normalization followed by medianpolish. The only reason justRMA exists is to allow people with less RAM to be able to do rma.

If you think a quantile normalization followed by Tukey's biweight will do better than rma, you can certainly do that using the expresso() function.

Best,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
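For readers who want to see the two steps named above side by side, here is a minimal Python sketch of quantile normalization followed by median polish on log2 intensities. It is an illustration only, not the affy code: it skips background correction, breaks ties by position, and the names `quantile_normalize` and `median_polish` are made up for this example.

```python
from statistics import mean, median


def quantile_normalize(arrays):
    """Force every array to share one empirical distribution.

    `arrays` is a list of equal-length lists of log2 intensities, one list
    per array. Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(matrix, n_iter=10):
    """Fit matrix[i][j] ~ overall + row[i] + col[j] by alternating median sweeps.

    Rows are the probes of one probeset, columns are arrays.
    Returns (overall, row_effects, col_effects, residuals).
    """
    resid = [list(r) for r in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row, col = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep the row (probe) medians out of the residuals.
        for i in range(n_rows):
            d = median(resid[i])
            resid[i] = [v - d for v in resid[i]]
            row[i] += d
        d = median(col)
        col = [c - d for c in col]
        overall += d
        # Sweep the column (array) medians out of the residuals.
        for j in range(n_cols):
            d = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= d
            col[j] += d
        d = median(row)
        row = [r - d for r in row]
        overall += d
    return overall, row, col, resid


# Three toy arrays of six probes each; the first three probes form the
# probeset being summarized, the rest stand in for the rest of the chip.
arrays = [
    [8.0, 10.0, 7.0, 6.0, 6.5, 5.5],
    [8.5, 10.5, 7.5, 6.5, 7.0, 6.0],
    [9.5, 11.5, 8.5, 6.0, 6.5, 5.5],
]
normalized = quantile_normalize(arrays)
# Build the probes x arrays matrix for the probeset (probes 0..2).
probeset = [[a[i] for a in normalized] for i in range(3)]
overall, probe_effect, array_effect, resid = median_polish(probeset)
expression = [overall + a for a in array_effect]  # one value per array
print(expression)
```

The per-array expression value is the overall term plus that array's column effect, so the summary uses every array at once. That is the between-array step the original question asks about.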
@james-w-macdonald-5106
Last seen 12 minutes ago
United States
Dear Naomi,

I think we are talking about two different things here. Your question appears to be whether or not rma is a reasonable method for computing expression values, and you appear not to distinguish between justRMA and rma. My statement is directed towards the purpose of justRMA.

To answer your question, I personally like rma, and I am not convinced that there is any over-normalization occurring by doing a quantile normalization followed by medianpolish. I have tried pretty much everything out there, and I have yet to find a method for computing expression values that I think does a better job in general use. This is based primarily on how well a given method works with the affy spike-in and GeneLogic dilution data sets. (I have had arguments with other statisticians who think that rma only works as well as it does with these data sets because it has been specifically 'tuned' for them. If so, my hat is off to Rafael and Ben for their ability to come up with an algorithm that can magically pick the 16 spiked-in genes out of the 18,000 or so other genes...)

For a variety of reasons, not the least of which is the fact that rma 'beats' most other methods, rma has sort of become the canonical method for computing expression values for Affy data. It has been implemented in other non-BioC packages such as GeneSpring, etc., and although I haven't seen anything concrete, I would bet dollars to donuts that the Affy PLIER algorithm is simply rma by another name. I think this is why your reviewer wants to know why you are doing quantile normalization followed by Tukey's biweight instead of what he/she would consider to be the 'usual' method.

Now to the point I was originally trying to make. One of the problems that people encounter with rma is the fact that you first have to create an AffyBatch with all of your chips, and then compute expression values which are stored in an exprSet. This can take a huge amount of RAM, and people with maybe 512 MB of RAM (which is plenty for the vast majority of things you will ever do on a computer) were running out of memory with a relatively small number of chips. Rafael noted that a modification could be made to rma that would use much less memory, and with his help I wrote the original justRMA. This function was designed for one purpose only: to allow people with less RAM to be able to do rma.

The decision to use medianpolish wasn't arbitrary at all; justRMA is designed to give the exact same results as rma (which of course uses medianpolish to compute expression values), so by default I had to use medianpolish.

Best,

Jim

>>> Naomi Altman <naomi@stat.psu.edu> 05/17/04 11:33AM >>>

Dear Jim,

The reason I ask is that I have been using expresso with "mas". But I recently had a paper returned with the comment that median polish was "known to be better". If so, I should probably use it. The reviewer appears to have based his/her remarks on the fact (mentioned in the review) that median polish is the "default".

If the decision to use median polish in justRMA was arbitrary, I would like to know this, since I am currently in the process of redoing all of the statistical analyses and tables in the paper (which is pretty time-consuming). The main reason we are redoing everything, rather than defending our decision to use "mas", is that I certainly have no evidence that Tukey's biweight is "better" except for the heuristic about over-normalization, and I figured in the long run we will have fewer arguments with reviewers if we use the default.

I should not have said that median polish is the "default" in justRMA, since it is the only method available, but I do think that its use in justRMA is an endorsement, meaning that anyone doing anything besides Affy-type MAS5 or justRMA or justGCRMA (if this is available) is going to be asked to justify what they are doing with more stringency.

--Naomi
A couple of points to add to Jim's response:

1. Robust fits of linear models on the log scale offer something quantile normalization does not: the removal of outliers that appear to be outliers only when one looks across chips (see Li and Wong's PNAS 2001 paper). Median polish is a quick and dirty way of doing this. If you want something fancier, you can use the affyPLM package to perform formal robust procedures. I haven't found a procedure that clearly beats median polish as judged by affycomp. The key is that one fits multiple arrays and takes advantage of the probe effect to find outliers.

2. rma was tuned to the GeneLogic spike-in. To avoid the effect of overtraining, we assessed rma on the GeneLogic dilution and Affymetrix spike-in experiments.
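The point about outliers that only show up across chips can be made concrete with a toy example. The Python sketch below is an illustration under stated assumptions, not the affy or MAS 5.0 code: the data are invented, the function names are hypothetical, and the biweight constants `c=5` and `epsilon=1e-4` are assumed to match the usual MAS5-style defaults. One cell is aberrant for its probe but sits well inside the range of values on its own array.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight location estimate for a single array."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    weights = []
    for v in values:
        u = (v - m) / (c * mad + epsilon)
        # Points more than c MADs from the median get zero weight.
        weights.append((1 - u * u) ** 2 if abs(u) <= 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def median_polish(matrix, n_iter=10):
    """Fit matrix[i][j] ~ overall + row[i] + col[j] by alternating median sweeps."""
    resid = [list(r) for r in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row, col = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):  # remove probe (row) medians
            d = median(resid[i])
            resid[i] = [v - d for v in resid[i]]
            row[i] += d
        d = median(col)
        col = [x - d for x in col]
        overall += d
        for j in range(n_cols):  # remove array (column) medians
            d = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= d
            col[j] += d
        d = median(row)
        row = [x - d for x in row]
        overall += d
    return overall, row, col, resid


# Five probes with strong probe effects, four arrays, no true differences.
probe_effects = [-2.0, -1.0, 0.0, 1.0, 2.0]
matrix = [[8.0 + p] * 4 for p in probe_effects]
matrix[0][2] = 9.0  # aberrant cell: unusual for probe 0, ordinary for array 2

overall, row, col, resid = median_polish(matrix)
polish_expr = [overall + a for a in col]
biweight_expr = [tukey_biweight([r[j] for r in matrix]) for j in range(4)]

print("median polish:", polish_expr)
print("residual of the aberrant cell:", resid[0][2])
print("per-array biweight:", [round(b, 2) for b in biweight_expr])
```

In this constructed case the additive fit leaves the aberrant cell with a large residual and reports the same expression value for all four arrays, while the per-array biweight, which cannot see probe effects, shifts the estimate for the affected array. Real data will not be this clean.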