Limma package: problem with p-value and t

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 11 months ago

United States

Dear Gordon, I noticed that there is inconsistency between the t and p.value on the topTable output using Limma package to analyze a dye swap two channel array experiment. logFC AveExpr t P.Value adj.P.Val B 1401 3.216254 11.74137 22.49546 1.041470e-05 0.07874703 3.435097 1537 3.344735 11.88007 18.07448 2.702061e-05 0.07874703 2.951114 The two-tailed p-value for t=22.49546 with df=1 would be 0.028 2*pt(-abs(22.49546),df=1), but the P.Value in the above output is 1.04e-05. Did I miss something? Could you please explain the inconsistency? Thanks! Here is the design matrix and relevant code snippets. Dye mutant 1 -1 1 -1 1 1 1 1 design = cbind(Dye=1, mutant = c(-1,-1,1,1)) fit = lmFit(MA, design) ### MA is a MAlist object fit3 = eBayes(fit) topTable(fit3, number=2, coef="mutant", adjust="fdr") Here is my R session information. R version 2.9.0 (2009-04-17) i386-apple-darwin8.11.1 locale: en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_2.18.0 Thank you very much for your help! Best regards, Julie

limma limma • 663 views

ADD COMMENT • link updated 15.1 years ago by James W. MacDonald 67k • written 15.1 years ago by Julie Zhu ★ 4.3k

0

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 1 day ago

United States

Julie Zhu wrote: > Dear Gordon, > > I noticed that there is inconsistency between the t and p.value on the > topTable output using Limma package to analyze a dye swap two channel array > experiment. > logFC AveExpr t P.Value adj.P.Val B > 1401 3.216254 11.74137 22.49546 1.041470e-05 0.07874703 3.435097 > 1537 3.344735 11.88007 18.07448 2.702061e-05 0.07874703 2.951114 > > The two-tailed p-value for t=22.49546 with df=1 would be 0.028 > 2*pt(-abs(22.49546),df=1), but the P.Value in the above output is 1.04e-05. > Did I miss something? Could you please explain the inconsistency? Thanks! The degrees of freedom will be somewhere around 3, but a bit larger because of the ebayes fit. You can get the actual value from your MArrayLM object, so what you really want is 2*pt(-abs(22.49546), fit$df.prior + fit$df.residual[1]) assuming you did something like fit <- lmFit(<stuff>) fit <- eBayes(fit) Best, Jim > > Here is the design matrix and relevant code snippets. > > Dye mutant > 1 -1 > 1 -1 > 1 1 > 1 1 > > design = cbind(Dye=1, mutant = c(-1,-1,1,1)) > fit = lmFit(MA, design) > ### MA is a MAlist object > fit3 = eBayes(fit) > topTable(fit3, number=2, coef="mutant", adjust="fdr") > > Here is my R session information. > > R version 2.9.0 (2009-04-17) > i386-apple-darwin8.11.1 > > locale: > en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_2.18.0 > > Thank you very much for your help! > > Best regards, > > Julie > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826

ADD COMMENT • link 15.1 years ago James W. MacDonald 67k

0

Entering edit mode

Hi James, Thanks you so much for helping me solving the mystery! Could you please explain why the degree freedom is so large? I only have 4 arrays, so total 3 df (n-1). One is used for estimating dye effect, 1 is used for estimating mutant effect, so the residual is 1 df. Why is there an additional of 2.397083 df added? Is it because the information borrowing from the neighboring genes? How is the df.prior calculated? Thanks so much for sharing your knowledge! Best regards, Julie On 9/17/09 4:50 PM, "James W. MacDonald" <jmacdon at="" med.umich.edu=""> wrote: > > > Julie Zhu wrote: >> Dear Gordon, >> >> I noticed that there is inconsistency between the t and p.value on the >> topTable output using Limma package to analyze a dye swap two channel array >> experiment. >> logFC AveExpr t P.Value adj.P.Val B >> 1401 3.216254 11.74137 22.49546 1.041470e-05 0.07874703 3.435097 >> 1537 3.344735 11.88007 18.07448 2.702061e-05 0.07874703 2.951114 >> >> The two-tailed p-value for t=22.49546 with df=1 would be 0.028 >> 2*pt(-abs(22.49546),df=1), but the P.Value in the above output is 1.04e-05. >> Did I miss something? Could you please explain the inconsistency? Thanks! > > The degrees of freedom will be somewhere around 3, but a bit larger > because of the ebayes fit. You can get the actual value from your > MArrayLM object, so what you really want is > > 2*pt(-abs(22.49546), fit$df.prior + fit$df.residual[1]) > > assuming you did something like > > fit <- lmFit(<stuff>) > fit <- eBayes(fit) > > Best, > > Jim > > >> >> Here is the design matrix and relevant code snippets. >> >> Dye mutant >> 1 -1 >> 1 -1 >> 1 1 >> 1 1 >> >> design = cbind(Dye=1, mutant = c(-1,-1,1,1)) >> fit = lmFit(MA, design) >> ### MA is a MAlist object >> fit3 = eBayes(fit) >> topTable(fit3, number=2, coef="mutant", adjust="fdr") >> >> Here is my R session information. >> >> R version 2.9.0 (2009-04-17) >> i386-apple-darwin8.11.1 >> >> locale: >> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] limma_2.18.0 >> >> Thank you very much for your help! >> >> Best regards, >> >> Julie >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 15.1 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Hi Julie, Your calculation of df is off. You lose one df for each column of your design matrix, so you have 2 residual df. The computation of the prior df is outlined (I believe) in the first paper cited in the limma User's Guide: http://www.bepress.com/sagmb/vol3/iss1/art3 Best, Jim Julie Zhu wrote: > Hi James, > > Thanks you so much for helping me solving the mystery! Could you please > explain why the degree freedom is so large? I only have 4 arrays, so total 3 > df (n-1). One is used for estimating dye effect, 1 is used for estimating > mutant effect, so the residual is 1 df. Why is there an additional of > 2.397083 df added? Is it because the information borrowing from the > neighboring genes? How is the df.prior calculated? Thanks so much for > sharing your knowledge! > > Best regards, > > Julie > > > On 9/17/09 4:50 PM, "James W. MacDonald" <jmacdon at="" med.umich.edu=""> wrote: > >> >> Julie Zhu wrote: >>> Dear Gordon, >>> >>> I noticed that there is inconsistency between the t and p.value on the >>> topTable output using Limma package to analyze a dye swap two channel array >>> experiment. >>> logFC AveExpr t P.Value adj.P.Val B >>> 1401 3.216254 11.74137 22.49546 1.041470e-05 0.07874703 3.435097 >>> 1537 3.344735 11.88007 18.07448 2.702061e-05 0.07874703 2.951114 >>> >>> The two-tailed p-value for t=22.49546 with df=1 would be 0.028 >>> 2*pt(-abs(22.49546),df=1), but the P.Value in the above output is 1.04e-05. >>> Did I miss something? Could you please explain the inconsistency? Thanks! >> The degrees of freedom will be somewhere around 3, but a bit larger >> because of the ebayes fit. You can get the actual value from your >> MArrayLM object, so what you really want is >> >> 2*pt(-abs(22.49546), fit$df.prior + fit$df.residual[1]) >> >> assuming you did something like >> >> fit <- lmFit(<stuff>) >> fit <- eBayes(fit) >> >> Best, >> >> Jim >> >> >>> Here is the design matrix and relevant code snippets. >>> >>> Dye mutant >>> 1 -1 >>> 1 -1 >>> 1 1 >>> 1 1 >>> >>> design = cbind(Dye=1, mutant = c(-1,-1,1,1)) >>> fit = lmFit(MA, design) >>> ### MA is a MAlist object >>> fit3 = eBayes(fit) >>> topTable(fit3, number=2, coef="mutant", adjust="fdr") >>> >>> Here is my R session information. >>> >>> R version 2.9.0 (2009-04-17) >>> i386-apple-darwin8.11.1 >>> >>> locale: >>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] limma_2.18.0 >>> >>> Thank you very much for your help! >>> >>> Best regards, >>> >>> Julie >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826

ADD REPLY • link 15.1 years ago James W. MacDonald 67k

Login before adding your answer.