> Date: Mon, 31 Jan 2005 09:56:09 -0500
> From: Naomi Altman <naomi@stat.psu.edu>
> Subject: [BioC] limma - FDR adjusted "p-values"
> To: bioconductor@stat.math.ethz.ch
>
> Just a suggestion:
>
> The FDR adjusted "p-values" are called "q-values" in much of the
> literature. I suggest that limma follow suit,

It's certainly true that a lot of users have trouble with FDR and with
adjusted p-values in general. Perhaps you're right that limma should
use the term "q-values". This would associate p-values with
control/estimation of FWER and q-values with control/estimation of FDR.

The reason I haven't done this so far is because the term "q-value"
coined by John Storey seems to me to measure something slightly
different to Benjamini and Hochberg adjusted p-values. I think that
John Storey's q-value uses a slightly different definition of false
discovery rate, namely pFDR, the positive false discovery rate. Also I
think it usually estimates pFDR rather than formally controlling it.
Although there is a value "Q" which appears in Benjamini and Hochberg's
formulations, and it is closely related to q-values, it is not exactly
the same. So I have been reluctant to use the term "q-value" for things
which were not quite the same, as this would cloud the fine meaning of
the term. Perhaps I am splitting hairs here and should just accept the
broad definition of q-value for FDR or pFDR and p-value for FWER. Any
other opinions?
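
For reference, the distinction drawn above can be written out. This is a
sketch using the standard textbook definitions rather than anything
quoted in this thread; the symbols V (number of false discoveries), R
(total number of rejections), n (number of genes) and p_(i) (the i-th
smallest p-value) are notation assumed here.

```latex
% Q is the proportion of false discoveries, defined as 0 when nothing is rejected.
Q = \begin{cases} V/R, & R > 0 \\ 0, & R = 0 \end{cases}

% Benjamini-Hochberg false discovery rate versus the positive FDR:
\mathrm{FDR}  = E[Q] = E\!\left[\tfrac{V}{R} \,\middle|\, R > 0\right] \Pr(R > 0),
\qquad
\mathrm{pFDR} = E\!\left[\tfrac{V}{R} \,\middle|\, R > 0\right]

% Benjamini-Hochberg adjusted p-value for the i-th smallest of n p-values:
\tilde{p}_{(i)} = \min_{j \ge i} \, \min\!\left(1, \frac{n \, p_{(j)}}{j}\right)
```

The two rates differ only by the factor Pr(R > 0), which is why the two
quantities are close in practice but not identical.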

I have also thought that perhaps topTable() should label the
p-value/q-value column in the output to indicate which adjustment
method was used to generate the table.

> and also add a line to the
> documentation (it might already be there and I missed it)
>
> "If the number of significant results at level alpha is less than
> alpha*(number of genes), then the q-value will be 1.0."
>
> It seems like I have to explain this to just about every investigator
> who runs into this.

I get a lot of questions about this as well. Actually, the statement
you've made isn't always true, although it usually is. Even if the
smallest p-value out of n genes is only as small as 1/n, the "fdr"
adjusted p-value is not always 1. It can be as small as 1/n depending
on the other n-1 p-values.
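
The point above can be checked numerically. What follows is a minimal
sketch in Python of the Benjamini-Hochberg step-up adjustment, not
limma's own code; the function name bh_adjust and the two example
vectors are assumptions made for illustration. As far as I know this is
the same calculation that R's p.adjust() performs for method "BH" (also
called "fdr").

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of n * p_(j) / j over all ranks j >= the current rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


n = 10

# Smallest p-value is 1/n and the rest are spread evenly up to 1:
# every adjusted value comes out at 1 (up to floating-point rounding).
evenly_spread = [k / n for k in range(1, n + 1)]
print([round(a, 6) for a in bh_adjust(evenly_spread)])

# Smallest p-value is still 1/n, but so are all the others:
# every adjusted value comes out at 1/n.
all_small = [1 / n] * n
print([round(a, 6) for a in bh_adjust(all_small)])
```

In both vectors the smallest raw p-value is 1/n, yet the adjusted values
differ by a factor of n, because the adjustment for each gene depends on
where the remaining p-values fall.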

Perhaps the way to go would be for topTable() to output the raw
p-values as well as the adjusted p-values/q-values. I haven't done this
so as to keep the table as small as possible, but it would prevent
users from being presented with just a list of p-values all equal to 1.
What do you think?
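
To make the two proposals concrete (raw p-values alongside adjusted
ones, and a column label naming the adjustment method), here is a small
Python sketch. It is only an illustration of the idea, not topTable()'s
actual output format: the function summary_table, the column names
"P.Value" and "adj.P (<method>)", and the gene names are all
hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values (FWER control), capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


# Hypothetical registry mapping a method name to its adjustment function.
ADJUSTERS = {"BH": bh_adjust, "bonferroni": bonferroni_adjust}


def summary_table(genes, pvalues, method="BH"):
    """Rows sorted by raw p-value, with the adjusted column labelled by method."""
    adjusted = ADJUSTERS[method](pvalues)
    adj_label = f"adj.P ({method})"
    rows = [
        {"gene": g, "P.Value": p, adj_label: a}
        for g, p, a in zip(genes, pvalues, adjusted)
    ]
    rows.sort(key=lambda row: row["P.Value"])
    return rows


genes = ["g1", "g2", "g3", "g4", "g5"]
pvals = [0.2, 0.4, 0.6, 0.8, 1.0]
for row in summary_table(genes, pvals, method="BH"):
    print(row)
```

With these inputs the adjusted column is essentially 1 for every gene,
but the raw column still shows how the genes rank, which is the
situation the extra column is meant to help with.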
Gordon
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111
On Feb 1, 2005, at 7:30 AM, Gordon K Smyth wrote:
> I have also thought that perhaps topTable() should label the
> p-value/q-value column in the output
> to indicate which adjustment method was used to generate the table.
>
I think the latter (label the p-value/q-value column) would suffice and
be the most general solution. Unfortunately, FDR is foreign to many
researchers, so it demands an explanation by someone in-the-know, no
matter what. I'm not sure that calling a p-value a different name will
satisfy the need for researchers to know the quantity that summarizes
their data. In short, I see the labeling issue as separate from the
FDR understanding issue. Is that fair?

Sean
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
We use limma a lot, and from our point of view having both adjusted and
unadjusted p-values in the topTable() output would be beneficial.

Thanks
Mick
I think it would be useful to have both the p-values and the
"q-values". The "q-values" should not be called "adjusted p-values"
because they are not probabilities. They are the estimated FDR at the
largest p-value for which the gene would be statistically
significant. Perhaps they should be called "fdr-values".

My vote is for Gordon to invent a name and then use it. As LIMMA
becomes more popular, the terminology will migrate to popular usage.

Cheers,
Naomi
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111