Question

multtest Paired T-test

0

Entering edit mode

kpollard ▴ 110

@kpollard-7578

Last seen 9.5 years ago

United States

Ken recently posted a question to the list regarding computation of t- and F-statistics (and corresponding unadjusted p-values) in multtest. Here are some suggestions. Ken's data set consists of the following: m: a 22277x6 matrix of probeset expressions m.cl=c(0,0,0,1,1,1): the corresponding class label m_pt: the 22277x6 matrix m with columns rearranged for paired tests m_pt.cl=c(0,1,0,1,0,1): the corresponding class label for paired tests (presumably m_pt=m[,c(1,4,2,5,3,6)]) He is doing two sample tests, but wants to run the F tests (typically used to compare k>2 groups) for practice. Ken computes the test statistics for each possible test available using the function mt.teststat: regular and rank-based (nonpara argument) for each value of the test argument. teststat_t_para <- mt.teststat(m,m.cl,test="t",nonpara="n") teststat_t_nonpara <- mt.teststat(m,m.cl,test="t",nonpara="y") teststat_te_para <- mt.teststat(m,m.cl,test="t.equalvar",nonpara="n") teststat_te_nonpara <- mt.teststat(m,m.cl,test="t.equalvar",nonpara="y") teststat_w_para <- mt.teststat(m,m.cl,test="wilcoxon",nonpara="n") teststat_w_nonpara <- mt.teststat(m,m.cl,test="wilcoxon",nonpara="y") teststat_f_para <- mt.teststat(m,m.cl,test="f",nonpara="n") teststat_f_nonpara <- mt.teststat(m,m.cl,test="f",nonpara="y") teststat_pt_para <- mt.teststat(m_pt,m_pt.cl,test="pairt",nonpara="n") teststat_pt_nonpara <- mt.teststat(m_pt,m_pt.cl,test="pairt",nonpara="y") teststat_bf_para <- mt.teststat(m_pt,m_pt.cl,test="blockf",nonpara="n") teststat_bf_nonpara <- mt.teststat(m_pt,m_pt.cl,test="blockf",nonpara="y") These look OK, except that for the "blockf" tests, Ken might want to use m and m_bf.cl=c(0,1,2,0,1,2), since presumably he is thinking about the two groups as two blocks (each with 3 observations). As written, with m_pt, the "blockf" test corresponds to looking at three blocks each with two observations. Note that in the way Ken has this now, the square of the paired t-statistic is equal to the block F-statistic (for three blocks of two observations). Next, Ken wants to use tabled distributions (with the functions pt, pwilcox, and pf) to compute marginal, unadjusted p-values for each gene based on the above test statistics. Assuming two-tailed tests, he proposes the following (which isn't quite right): rawp0_teststat_t_para <- 2 * (1 - pt(abs(teststat_t_para), df =5)) rawp0_teststat_t_nonpara <- 2 * (1 - pt(abs(teststat_t_nonpara)), df=5) rawp0_teststat_te_para <- 2 * (1 - pt(abs(teststat_te_para)), df=5) rawp0_teststat_te_nonpara <- 2 * (1 - pt(abs(teststat_te_nonpara)), df=5) rawp0_teststat_w_para <- 2 * (1 - pwilcox(abs(teststat_w_para)), df=5) rawp0_teststat_w_nonpara <- 2 * (1 - pwilcox(abs(teststat_w_nonpara)), df=5) rawp0_teststat_f_para <- 2 * (1 - pf(abs(teststat_f_para)), df=5) rawp0_teststat_f_nonpara <- 2 * (1 - pf(abs(teststat_f_nonpara)), df=5) rawp0_teststat_pt_para <- 2 * (1 - pt(abs(teststat_pt_para)), df=5) rawp0_teststat_pt_nonpara <- 2 * (1 - pt(abs(teststat_pt_nonpara)), df=5) rawp0_teststat_bf_para <- 2 * (1 - pf(abs(teststat_bf_para)), df=5) rawp0_teststat_bf_nonpara <- 2 * (1 - pf(abs(teststat_bf_nonpara)), df=5) Here are some suggetions: 1. Typo: For the nonpara ones, the parens are in the wrong place. They should be the same as for the para ones, eg: rawp0_teststat_t_nonpara <- 2 * (1 - pt(abs(teststat_t_nonpara), df=5)) 2. Degrees of freedom: For t-tests with equal variance (t.equalvar), df=n1+n2-2=4. For unequal variances, the df formula is more complicated (Welch - or Satterthwaite - approximation). For the paired t-test, df=2. For F-tests, you need two dfs (see ? pf). For pwilcox, you just give sample sizes in each group, not df (see ? pwilcox). 3. You don't need abs() for F-tests: the statistics are always positive since they are sums of squares. (It doesn't hurt though). With the correct rawp0 values in hand, Ken could use the function mt.rawp2adjp to adjust for multiple comparisons. Let me add that while this is a good exercise in understanding marginal p-values, the multtest package is designed to compute these without using these tabled distributions (and making the corresponding assumptions). The functions MTP and mt.minP (or mt.maxT) provide p-values (raw and adjusted) through bootstrap and permutation resampling. These p-values account for correlations between genes and do not make assumptions about the parametric forms of the distributions for each test statistic. Hence, one does not have to guess degrees of freedom or distributions. The imput is simply the data and class labels. Best, Katie

multtest multtest • 1.3k views

ADD COMMENT • link 19.7 years ago kpollard ▴ 110