limma - FDR adjusted "p-values"


Gordon Smyth 51k

@gordon-smyth


WEHI, Melbourne, Australia

> Date: Mon, 31 Jan 2005 09:56:09 -0500
> From: Naomi Altman <naomi@stat.psu.edu>
> Subject: [BioC] limma - FDR adjusted "p-values"
> To: bioconductor@stat.math.ethz.ch
>
> Just a suggestion:
>
> The FDR adjusted "p-values" are called "q-values" in much of the
> literature. I suggest that limma follow suit,

It's certainly true that a lot of users have trouble with FDR and with adjusted p-values in general. Perhaps you're right that limma should use the term "q-values". This would associate p-values with control/estimation of FWER and q-values with control/estimation of FDR.

The reason I haven't done this so far is that the term "q-value", coined by John Storey, seems to me to measure something slightly different to Benjamini and Hochberg adjusted p-values. I think that John Storey's q-value uses a slightly different definition of false discovery rate, namely pFDR, the positive false discovery rate. Also I think it usually estimates pFDR rather than formally controlling it. Although there is a value "Q" which appears in Benjamini and Hochberg's formulations, and it is closely related to q-values, it is not exactly the same. So I have been reluctant to use the term "q-value" for things which were not quite the same, as this would cloud the fine meaning of the term. Perhaps I am splitting hairs here and should just accept the broad definition of q-value for FDR or pFDR and p-value for FWER. Any other opinions?

I have also thought that perhaps topTable() should label the p-value/q-value column in the output to indicate which adjustment method was used to generate the table.

> and also add a line to the
> documentation (it might already be there and I missed it)
>
> "If the number of significant results at level alpha is less than
> alpha*(number of genes), then the q-value will be 1.0."
>
> It seems like I have to explain this to just about every investigator who
> runs into this.

I get a lot of questions about this as well. Actually, the statement you've made isn't always true, although it usually is. Even if the smallest p-value out of n genes is only as small as 1/n, the "fdr" adjusted p-value is not always 1. It can be as small as 1/n depending on the other n-1 p-values.

Perhaps the way to go would be for topTable() to output the raw p-values as well as the adjusted p-values/q-values. I haven't done this so as to keep the table as small as possible, but it would prevent users from being presented with just a list of p-values all equal to 1. What do you think?

Gordon

> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
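The point about the smallest p-value being 1/n can be checked numerically. The following is a minimal sketch in plain Python, not limma code: `bh_adjust` is a hypothetical helper written for this illustration, assuming the usual Benjamini and Hochberg step-up adjustment (each sorted p-value is multiplied by n/rank, then a running minimum is taken from the largest p-value downwards and capped at 1). The two input vectors are made up for the example.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper, illustration only)."""
    n = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that the adjusted values stay monotone in the raw p-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


n = 10

# Case 1: every p-value equals 1/n. The smallest adjusted value is also 1/n.
all_equal = bh_adjust([1.0 / n] * n)

# Case 2: the smallest p-value is still 1/n, but the other n-1 are all 1.
# Every adjusted value is now 1.
one_small = bh_adjust([1.0 / n] + [1.0] * (n - 1))

print(min(all_equal), min(one_small))
```

In both cases the smallest raw p-value is 1/n, yet the adjusted value depends entirely on the other n-1 p-values, which is why a table showing only the adjusted column can look uninformative.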


updated 19.7 years ago by Naomi Altman • written 19.7 years ago by Gordon Smyth


Sean Davis 21k

@sean-davis-490


United States

On Feb 1, 2005, at 7:30 AM, Gordon K Smyth wrote:

> It's certainly true that a lot of users have trouble with FDR and with
> adjusted p-values in general. Perhaps you're right that limma should
> use the term "q-values". This would associate p-values with
> control/estimation of FWER and q-values with control/estimation of FDR.
>
> I have also thought that perhaps topTable() should label the
> p-value/q-value column in the output to indicate which adjustment
> method was used to generate the table.

I think the latter (label the p-value/q-value column) would suffice and be the most general solution. Unfortunately, FDR is foreign to many researchers, so it demands an explanation by someone in the know, no matter what. I'm not sure that calling a p-value by a different name will satisfy the need for researchers to know the quantity that summarizes their data. In short, I see the labeling issue as separate from the FDR understanding issue. Is that fair?

Sean



michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378


We use limma a lot, and from our point of view having both adjusted and unadjusted p-values in the topTable() output would be beneficial.

Thanks
Mick

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Gordon K Smyth
Sent: 01 February 2005 12:31
To: Naomi Altman
Cc: jstorey@u.washington.edu; bioconductor@stat.math.ethz.ch
Subject: [BioC] limma - FDR adjusted "p-values"

Perhaps the way to go would be for topTable() to output the raw p-values as well as the adjusted p-values/q-values. I haven't done this so as to keep the table as small as possible, but it would prevent users from being presented with just a list of p-values all equal to 1. What do you think?

Gordon



Naomi Altman ★ 6.0k

@naomi-altman-380


United States

I think it would be useful to have both the p-values and the "q-values". The "q-values" should not be called "adjusted p-values" because they are not probabilities. They are the estimated FDR at the largest p-value for which the gene would be statistically significant. Perhaps they should be called "fdr-values".

My vote is for Gordon to invent a name and then use it. As LIMMA becomes more popular, the terminology will migrate to popular usage.

Cheers,
Naomi

At 07:30 AM 2/1/2005, Gordon K Smyth wrote:

> It's certainly true that a lot of users have trouble with FDR and with
> adjusted p-values in general. Perhaps you're right that limma should
> use the term "q-values". This would associate p-values with
> control/estimation of FWER and q-values with control/estimation of FDR.
>
> Perhaps the way to go would be for topTable() to output the raw
> p-values as well as the adjusted p-values/q-values. I haven't done this
> so as to keep the table as small as possible, but it would prevent
> users from being presented with just a list of p-values all equal
> to 1. What do you think?
>
> Gordon

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
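One way to make the "estimated FDR" reading concrete is the sketch below. It is an illustration under stated assumptions, not limma or qvalue code: `estimated_fdr_at` and `fdr_value` are hypothetical helper names, the p-values are made up, and `pi0` (the assumed proportion of true null hypotheses) is supplied by hand rather than estimated. With `pi0 = 1` the result should agree with the Benjamini and Hochberg adjusted p-value; a value below 1 gives a Storey-style rescaling, which is one reason the two quantities are closely related without being identical.

```python
def estimated_fdr_at(threshold, pvalues, pi0=1.0):
    """Estimated FDR if every gene with p <= threshold is called significant.

    pi0 is an assumed proportion of true nulls (hypothetical input, not estimated here).
    """
    n = len(pvalues)
    n_called = sum(1 for p in pvalues if p <= threshold)
    if n_called == 0:
        return 0.0  # no discoveries: the FDR is conventionally taken to be 0
    # Expected false positives (pi0 * n * threshold) over the number of calls.
    return min(1.0, pi0 * n * threshold / n_called)


def fdr_value(p_gene, pvalues, pi0=1.0):
    """Smallest estimated FDR over all observed thresholds at which the gene is called.

    Assumes p_gene is one of the entries of pvalues.
    """
    thresholds = [t for t in pvalues if t >= p_gene]
    return min(estimated_fdr_at(t, pvalues, pi0) for t in thresholds)


pvals = [0.001, 0.008, 0.039, 0.041, 0.6]  # made-up raw p-values

# Cutting exactly at 0.039 gives a higher estimated FDR than cutting at 0.041,
# so the reported value for the third gene comes from the larger threshold.
at_own_threshold = estimated_fdr_at(0.039, pvals)
bh_style = fdr_value(0.039, pvals)              # pi0 = 1
storey_style = fdr_value(0.039, pvals, pi0=0.8)  # assumed pi0, for illustration

print(at_own_threshold, bh_style, storey_style)
```

The value attached to a gene is therefore a property of a whole rejection region rather than a tail probability for that gene alone, which supports reporting it in its own column next to the raw p-value.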

