
Westfall and Young "maxT"

Douglas Grove

Hi,

I've got a question regarding the Westfall and Young "maxT" procedure (implemented in the Bioconductor package multtest, function mt.maxT()).

If one calculates a two-sample t-statistic assuming unequal variances for the groups, then the resulting statistic is only approximately t-distributed, and the degrees of freedom are a function of the sample sizes and variances. So the distributions of the t-statistics calculated for different "genes" are, in general, *not* identical. Obviously, with moderately large sample sizes the reference distributions for the different genes are all approximately normal, and the differences between them are nothing to worry about. With smallish samples, however, this could be a problem, correct?

So my questions are:

(1) Is there anything that can be done to adjust for the differences between the distributions of the genes (I'm guessing there isn't)?
(2) If there is, does the function mt.maxT() in package multtest implement such an adjustment?
(3) If there is not such an adjustment, is it still reasonable to apply this procedure to smallish samples and, if yes, is there any *real* justification for doing so?

Any help is much appreciated.

Thanks,
Doug Grove
Statistical Research Associate
Fred Hutchinson Cancer Research Center
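To make the point about degrees of freedom concrete, here is a minimal sketch. It is written in Python for illustration (multtest itself is R, and this is not its code) and computes the Welch t-statistic together with its Welch-Satterthwaite degrees of freedom for two hypothetical genes with four samples per group. The gene values are made up.

```python
import statistics


def welch_t_and_df(x, y):
    """Welch two-sample t-statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    sx = statistics.variance(x) / nx  # squared standard error of mean(x)
    sy = statistics.variance(y) / ny  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / (sx + sy) ** 0.5
    # The degrees of freedom depend on the sample variances, so they differ by gene.
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df


# Two hypothetical genes, 4 samples per group.
t_a, df_a = welch_t_and_df([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])    # equal spread
t_b, df_b = welch_t_and_df([1.0, 2.0, 3.0, 4.0], [0.0, 5.0, 10.0, 15.0])  # unequal spread
print(f"gene A: t = {t_a:.3f}, df = {df_a:.2f}")
print(f"gene B: t = {t_b:.3f}, df = {df_b:.2f}")
```

With these made-up values, gene A gets 6 degrees of freedom (the pooled value nx + ny - 2), while gene B gets only about 3.24, so the two statistics would be referred to noticeably different t distributions.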


Answer by Sandrine Dudoit · 2003-07-09

Hello,

> I've got a question regarding the Westfall and Young "maxT" procedure
> (implemented in Bioconductor package multtest, function mt.maxT).
>
> If one calculates a two sample T-statistic assuming unequal variances
> for the groups, then the resultant statistic is only approximately T
> and the degrees of freedom are a function of the sample sizes and
> variances. So the situation is that the distributions of the T
> statistics calculated for different "genes" are in general *not*
> identical. Obviously, if one has a moderately large sample size
> the reference distributions for the different "genes" are all
> approximately normal and the difference between distributions
> is not anything to worry about. However, if one's sample sizes are
> smallish, then this could be a problem, correct?

You are right: the test statistics often have different distributions for different genes.

> So my questions are:
>
> (1) is there anything that can be done to adjust for the differences
> between the distributions of the genes (I'm guessing there isn't)?

You could use the step-down minP procedure, which first calculates unadjusted p-values for each gene and then computes adjusted p-values based on successive minima of these unadjusted p-values.

> (2) if there is, does the function mt.maxT() in package multtest implement
> such a adjustment

No; the step-down minP procedure is implemented in the mt.minP function.

> (3) if there is not such an adjustment, is it still reasonable to apply this
> procedure to smallish samples and, if yes, is there any *real* justification
> for doing so.

The maxT procedure still provides control of the family-wise error rate (FWER) when the test statistics have different distributions. The main issues in choosing between the maxT and minP procedures are balance, power, and computational feasibility. By balance I mean that the maxT procedure may give different weights to different hypotheses, while the minP procedure puts the different hypotheses on the same footing through the p-value transformation.
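The difference between the two procedures can be seen in a small permutation sketch. It is written in Python for illustration and is not taken from multtest. It assumes a tiny hypothetical data set (3 genes, 4 + 4 samples, no ties), uses Welch t-statistics, and enumerates all 70 ways of splitting the 8 samples into two groups of 4 instead of drawing random permutations. maxT applies successive maxima to the |t| values directly; minP first turns every |t| into a raw permutation p-value against that gene's own null distribution and then applies successive minima, which is the p-value transformation that puts the genes on the same footing. The helper names are hypothetical.

```python
import itertools
import statistics


def welch_t(x, y):
    """Two-sample t-statistic with unequal variances (Welch)."""
    se2 = statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    return (statistics.mean(x) - statistics.mean(y)) / se2 ** 0.5


def abs_t_all_splits(data, n1):
    """rows[b][j] = |t| of gene j under split b of the columns into n1 + (n - n1).

    Complete enumeration; the first split from itertools.combinations,
    (0, ..., n1 - 1), is taken to be the observed labelling.
    """
    n = len(data[0])
    rows = []
    for group1 in itertools.combinations(range(n), n1):
        in1 = set(group1)
        row = []
        for gene in data:
            x = [v for k, v in enumerate(gene) if k in in1]
            y = [v for k, v in enumerate(gene) if k not in in1]
            row.append(abs(welch_t(x, y)))
        rows.append(row)
    return rows


def step_down_max(rows):
    """Step-down adjusted p-values from successive maxima (larger = more significant).

    rows[b][j] is the statistic of gene j under split b; rows[0] is the observed data.
    """
    obs = rows[0]
    m, B = len(obs), len(rows)
    order = sorted(range(m), key=lambda j: obs[j], reverse=True)  # most significant first
    adjusted = [0.0] * m
    running = 0.0
    for i, j in enumerate(order):
        rest = order[i:]  # gene j and all less significant genes
        hits = sum(1 for row in rows if max(row[l] for l in rest) >= obs[j])
        running = max(running, hits / B)  # enforce monotone adjusted p-values
        adjusted[j] = running
    return adjusted


def maxT(abs_t):
    return step_down_max(abs_t)


def minP(abs_t):
    B, m = len(abs_t), len(abs_t[0])
    # Raw p-value of every statistic, each against its own gene's permutation distribution.
    raw = [[sum(1 for r in abs_t if r[j] >= row[j]) / B for j in range(m)] for row in abs_t]
    # Successive minima of p-values are successive maxima of -p.
    return step_down_max([[-p for p in row] for row in raw])


data = [
    [10.1, 10.4, 9.8, 10.2, 1.0, 1.3, 0.9, 1.2],  # hypothetical gene 0: large shift
    [1.1, 2.3, 0.7, 1.9, 1.5, 0.4, 2.1, 1.8],     # hypothetical gene 1: no shift
    [5.2, 5.1, 5.3, 5.0, 1.0, 9.0, 2.0, 0.5],     # hypothetical gene 2: unequal variances
]
abs_t = abs_t_all_splits(data, 4)  # 70 splits of 8 samples into 4 + 4
print("maxT adjusted p:", maxT(abs_t))
print("minP adjusted p:", minP(abs_t))
```

The extra step in minP() is visible here: computing raw p-values for every permutation compares each permuted statistic with all B others for the same gene, so minP costs roughly a factor of B more than maxT when the number of permutations is large.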
In terms of computation, the maxT procedure is simpler, since it does not require unadjusted p-values for every permutation. Some of these issues are discussed in greater detail in two recent papers, which you can download from my website:

Y. Ge, S. Dudoit, and T. P. Speed (2003).
S. Dudoit, J. P. Shaffer, and J. C. Boldrick (2003).

I hope this helps.

Best regards,
Sandrine

--
Sandrine Dudoit, Ph.D.
Assistant Professor
Division of Biostatistics
School of Public Health
University of California, Berkeley
140 Earl Warren Hall, #7360
Berkeley, CA 94720-7360
E-mail: sandrine@stat.berkeley.edu
Tel: (510) 643-1108
Fax: (510) 643-5163
http://www.stat.berkeley.edu/~sandrine