Question

multtest MTP raw P values = 0

0

Entering edit mode

Timur Shtatland ▴ 30

@timur-shtatland-2685

Last seen 10.6 years ago

Houston, Thank you very much for your advice! 1) Yes, I could reproduce both the problem and your solution with a much smaller data set (see code below). I made a toy dataset with just 1 (one) gene and 2 groups of 5 samples each, with all values in group 1 lower than values in group 2. Bootstrap with 100 and 1000 iterations gave P=0, and with 10,000 and 100,000 gave P>0, the latter values also consistent with both the exact permutation test and t-test. It would take a lot of cpu time to do B >= 10000 iterations with my real data set (and zeros would still be possible), but I think there is no other way around it. 2) I used ssmaxT procedure with augmentation to get FDR, reported as adjp. I used this method instead of Benjamini-Hochberg ("BH" in mt.rawp2adjp) because it requires less assumptions, although it is "conservative". When "BH" in mt.rawp2adjp is used, the adjp become much lower, as expected (see below). BTW, with BH, the problem I reported (same rawp values correspond to different adjp) is gone. I am not sure which of these 2 methods give more realistic FDR estimates. I could not find a consensus on multiple testing correction procedures. There are reasons not to use Benjamini-Hochberg FDR method, see for example: http://view.ncbi.nlm.nih.gov/pubmed/16644791 Jung SH, Jang W. Bioinformatics. 2006 Jul 15;22(14):1730-6. http://www.urmc.rochester.edu/smd/biostat/people/faculty/AOAS102.pdf Gordon, A., Glazko, G., Qiu, X., and Yakovlev, A. (2007). The Annals of Applied Statistics 1:179-190. Thanks again. Best regards, Timur -- Timur Shtatland, PhD Center for Molecular Imaging Research Massachusetts General Hospital 149 13th Street, Room 5408 Charlestown, MA 02129 tshtatland at mgh dot harvard dot edu ############################################################ > library(multtest) > nSam <- 5 > X <- matrix(c(1:nSam, 1:nSam+10), nrow=1) > print(X) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 1 2 3 4 5 11 12 13 14 15 > Y <- c(rep(0,nSam), rep(1,nSam)) > print(Y) [1] 0 0 0 0 0 1 1 1 1 1 > > seed <- 99 > for (B in c(100, 1000, 10000, 100000)) { + TTBoot <- MTP(X=X, Y=Y, test = "t.twosamp.unequalvar", alternative = "two.sided", typeone="fdr", method="ss.maxT", fdr.method="conservative", B=B, seed=seed) + MTPRes <- data.frame(PRaw=TTBoot at rawp, PAdj=TTBoot at adjp) + cat(sprintf("%d iterations:\n", B)) + print(format(MTPRes, digits=10, scientific=TRUE)) + } running bootstrap... iteration = 100 100 iterations: PRaw PAdj 1 0e+00 0e+00 running bootstrap... iteration = 100 200 300 400 500 600 700 800 900 1000 1000 iterations: PRaw PAdj 1 0e+00 0e+00 running bootstrap... iteration = 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000 4100 4200 4300 4400 4500 4600 4700 4800 4900 5000 5100 5200 5300 5400 5500 5600 5700 5800 5900 6000 6100 6200 6300 6400 6500 6600 6700 6800 6900 7000 7100 7200 7300 7400 7500 7600 7700 7800 7900 8000 8100 8200 8300 8400 8500 8600 8700 8800 8900 9000 9100 9200 9300 9400 9500 9600 9700 9800 9900 10000 10000 iterations: PRaw PAdj 1 3e-04 6e-04 running bootstrap... iteration = 100 200 300... 99900 100000 100000 iterations: PRaw PAdj 1 1e-04 2e-04 > > library(exactRankTests) > perm.test(X[Y==1],X[Y==0]) 2-sample Permutation Test data: X[Y == 1] and X[Y == 0] T = 65, p-value = 0.007937 alternative hypothesis: true mu is not equal to 0 > t.test(X[Y==1],X[Y==0]) Welch Two Sample t-test data: X[Y == 1] and X[Y == 0] t = 10, df = 8, p-value = 8.488e-06 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 7.693996 12.306004 sample estimates: mean of x mean of y 13 3 ############################################################ ## same rawp, added BH: > TTMargFDR <- mt.rawp2adjp(rawp=TTBoot at rawp, proc="BH") > MTPRes <- data.frame(PRaw=TTBoot at rawp, PAdj=TTBoot at adjp, PAdjBH= TTMargFDR$adjp[order(TTMargFDR$index),-1]) > print(format(subset(MTPRes[order(MTPRes$PRaw, MTPRes$PAdj), ], PRaw < 2e-3), digits=10, scientific=TRUE)) PRaw PAdj PAdjBH 202508_s_at 0e+00 0.000000000e+00 0.000000000e+00 203130_s_at 0e+00 0.000000000e+00 0.000000000e+00 206780_at 0e+00 0.000000000e+00 0.000000000e+00 218820_at 0e+00 0.000000000e+00 0.000000000e+00 205825_at 0e+00 8.000000000e-03 0.000000000e+00 203000_at 0e+00 1.400000000e-02 0.000000000e+00 204465_s_at 0e+00 1.400000000e-02 0.000000000e+00 206282_at 0e+00 1.400000000e-02 0.000000000e+00 206502_s_at 0e+00 1.400000000e-02 0.000000000e+00 208399_s_at 0e+00 1.400000000e-02 0.000000000e+00 209598_at 0e+00 1.400000000e-02 0.000000000e+00 209755_at 0e+00 1.400000000e-02 0.000000000e+00 212805_at 0e+00 1.400000000e-02 0.000000000e+00 203001_s_at 0e+00 2.400000000e-02 0.000000000e+00 204811_s_at 0e+00 2.400000000e-02 0.000000000e+00 205174_s_at 0e+00 2.400000000e-02 0.000000000e+00 ... 218816_at 1e-03 1.000000000e+00 1.051067961e-02 219939_s_at 1e-03 1.000000000e+00 1.051067961e-02 221020_s_at 1e-03 1.000000000e+00 1.051067961e-02 ############################################################ ############################################################ On Feb 29, 2008, at 7:26 PM, Houston Gilbert wrote: Two things: 1.) The rawp digits are exactly zero because they are calculated as follows (or similarly): Here - Tn are the observed test statistics, Z is the matrix of bootstrap test statistics. rawp <- apply(Tn < Z, 1, mean) This means that if no observed test statistics are greater than the sample of resampled null test statistics (marginally, within rows), the raw p-values will be zero, not 1/B. The rawp here are very coarse estimates and really reflect the probability of seeing a value of test statistic that extreme given each marginal null distibution. This is a matter of taste I guess, and might be something for to look into. If you want non-zero p-values, I imagine the best way to increase B, since you will get more accurate resolution in the tails of your test statistics null distribution estimate. They may still just be zero, though. 2.) The procedure you are actually doing is an augmentation procedure which takes the _adjp_ calculated from the ssmaxT FWER-controlling procedure and augments them to control the FDR in one of two ways (here, "conservative"-ly"). Thus, your final FDR adjp are the FWER ssmaxT adjusted p-values calculated at the observed values of the test statistics (based off of an ecdf() call performed over column maxima of the null matrix) which then get augmented to FDR-controlling p-values. So I imagine the ssmaxT adjp to be way more smooth. The equality of adjps in the output is also often the byproduct of enforcing some kind of monotonicity constraint on the p-values themselves when going over subsets of rows. As an aside: The augmentation procedures can be conservative in practice. You may just want to try "BH" in mt.rawp2adjp and see what you get. Houston Houston Gilbert Doctoral Student in Biostatistics University of California, Berkeley houston at stat.berkeley.edu www.stat.berkeley.edu/~houston On 2/29/08, James Bullard <bullard at="" berkeley.edu=""> wrote: a post on the bioconductor mailing list - if you are not already on it Begin forwarded message: From: Timur Shtatland <tshtatland@mgh.harvard.edu> Date: February 29, 2008 10:37:28 AM PST To: bioconductor at stat.math.ethz.ch Subject: [BioC] multtest MTP raw P values = 0 Dear all, I used MTP from the multtest package to compute bootstrap based raw and adjusted t-test P values (group 1 = 7 samples, group 2 = 3 samples). Why are some raw P values in the output exactly zero? I could not change this by using scientific notation, with varying number of significant digits. In the example below, 10 digits are used; increasing this does not make a difference. I expected all P values > 0, with the minimum P value determined by the number of bootstrap iterations, here 1000. Hence the raw P value minimum should be 1e-03, as indeed the lowest *positive* P values are. Another question is: why some raw P values that are *equal* correspond to adjusted P values that are *different*? For example, raw P = 1e-03 corresponds to adjusted P = 8.4e-02, 8.6e-02, etc. I could not find the answer in the MTP help page, its vignettes, FAQ, etc. A similar, but not identical, problem was reported on this mailing list before, without a solution: https://stat.ethz.ch/pipermail/bioconductor/2007-December/020492.html Thank you for your help. Best regards, Timur -- Timur Shtatland, PhD Center for Molecular Imaging Research Massachusetts General Hospital 149 13th Street, Room 5408 Charlestown, MA 02129 tshtatland at mgh dot harvard dot edu ############################################################ B <- 1000 seed <- 99 TTBoot <- MTP(X=esetGcrmaExprsFiltered, Y=TT, test = "t.twosamp.unequalvar", alternative = "two.sided", typeone="fdr", method="ss.maxT", fdr.method="conservative", B=B, seed=seed) running bootstrap... iteration = 100 200 300 400 500 600 700 800 900 1000 print(TTBoot) Multiple Testing Procedure Object of class: MTP sample size = 10 number of hypotheses = 5413 test statistics = t.twosamp.unequalvar type I error rate = fdr nominal level alpha = 0.05 multiple testing procedure = ss.maxT Call: MTP(X = esetGcrmaExprsFiltered, Y = TT, test = "t.twosamp.unequalvar", alternative = "two.sided", typeone = "fdr", fdr.method = "conservative", B = B, method = "ss.maxT", seed = seed) Slots: Class Mode Length Dimension statistic numeric numeric 5413 estimate numeric numeric 5413 sampsize integer numeric 1 rawp numeric numeric 5413 adjp numeric numeric 5413 conf.reg array logical 0 0,0,0 cutoff matrix logical 0 0,0 reject matrix logical 5413 5413,1 nulldist matrix numeric 5413000 5413,1000 call call call 10 seed integer numeric 1 ... MTPRes <- data.frame(PRaw=TTBoot at rawp, PAdj=TTBoot at adjp) print(format(subset(MTPRes[order(MTPRes$PRaw, MTPRes$PAdj), ], PAdj < 0.1), digits=10, scientific=TRUE)) PRaw PAdj 202508_s_at 0e+00 0.000000000e+00 203130_s_at 0e+00 0.000000000e+00 206780_at 0e+00 0.000000000e+00 218820_at 0e+00 0.000000000e+00 205825_at 0e+00 8.000000000e-03 203000_at 0e+00 1.400000000e-02 204465_s_at 0e+00 1.400000000e-02 206282_at 0e+00 1.400000000e-02 206502_s_at 0e+00 1.400000000e-02 208399_s_at 0e+00 1.400000000e-02 209598_at 0e+00 1.400000000e-02 209755_at 0e+00 1.400000000e-02 212805_at 0e+00 1.400000000e-02 203001_s_at 0e+00 2.400000000e-02 204811_s_at 0e+00 2.400000000e-02 205174_s_at 0e+00 2.400000000e-02 205646_s_at 0e+00 2.400000000e-02 206051_at 0e+00 2.400000000e-02 212624_s_at 0e+00 2.800000000e-02 204035_at 0e+00 3.000000000e-02 221261_x_at 0e+00 3.200000000e-02 206915_at 0e+00 3.400000000e-02 206104_at 0e+00 4.000000000e-02 210036_s_at 0e+00 5.000000000e-02 218380_at 0e+00 5.000000000e-02 213135_at 0e+00 6.200000000e-02 204870_s_at 0e+00 6.400000000e-02 218952_at 0e+00 6.400000000e-02 205120_s_at 0e+00 6.600000000e-02 211597_s_at 0e+00 6.666666667e-02 205348_s_at 0e+00 6.800000000e-02 212190_at 0e+00 7.000000000e-02 213186_at 0e+00 7.000000000e-02 203485_at 0e+00 7.200000000e-02 203572_s_at 0e+00 8.400000000e-02 209583_s_at 0e+00 8.400000000e-02 212895_s_at 0e+00 8.400000000e-02 206001_at 0e+00 8.600000000e-02 206135_at 0e+00 8.600000000e-02 212843_at 0e+00 8.600000000e-02 204059_s_at 0e+00 8.800000000e-02 210826_x_at 0e+00 9.000000000e-02 213438_at 0e+00 9.000000000e-02 218675_at 0e+00 9.000000000e-02 201116_s_at 0e+00 9.200000000e-02 203272_s_at 0e+00 9.600000000e-02 204793_at 0e+00 9.800000000e-02 204720_s_at 1e-03 8.400000000e-02 221207_s_at 1e-03 8.400000000e-02 204540_at 1e-03 8.600000000e-02 209992_at 1e-03 8.888888889e-02 210246_s_at 1e-03 9.000000000e-02 print(dim(esetGcrmaFiltered)) Features Samples 5413 10 sessionInfo() R version 2.6.0 (2007-10-03) powerpc-apple-darwin8.10.1 locale: en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines tools stats graphics grDevices utils datasets methods base other attached packages: [1] multtest_1.18.0 affydata_1.11.3 affyQCReport_1.16.0 geneplotter_1.16.0 lattice_0.16-5 [6] annotate_1.16.1 AnnotationDbi_1.0.6 RSQLite_0.6-4 DBI_0.2-4 RColorBrewer_1.0-2 [11] affyPLM_1.14.0 xtable_1.5-2 simpleaffy_2.14.05 gcrma_2.10.0 matchprobes_1.10.0 [16] genefilter_1.16.0 survival_2.32 affy_1.16.0 preprocessCore_1.0.0 affyio_1.6.1 [21] Biobase_1.16.1 loaded via a namespace (and not attached): [1] KernSmooth_2.22-21 grid_2.6.0 version _ platform powerpc-apple-darwin8.10.1 arch powerpc os darwin8.10.1 system powerpc, darwin8.10.1 status major 2 minor 6.0 year 2007 month 10 day 03 svn rev 43063 language R version.string R version 2.6.0 (2007-10-03) # platform: PowerPC G4, Mac OS 10.4.11. The information transmitted in this electronic communica...{{dropped:16}}

multtest multtest • 1.5k views

ADD COMMENT • link 17.1 years ago Timur Shtatland ▴ 30