I just did a (very) small simulation study comparing one-way ANOVA with limma
and SAM for various values of pi_0, with normal and t-distributed errors,
2 replicates per treatment, and 22700 genes/array. I did not replicate my
simulations, so what I have to say here is necessarily heuristic, but there
were some lessons.
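For concreteness, here is a minimal R sketch of how such a simulation could be
set up (my own reconstruction, not Naomi's code); the number of treatments,
the effect sizes, and the error scale are all assumptions:

set.seed(1)
n.genes <- 22700
n.trt   <- 3          # assumed number of treatment groups
n.rep   <- 2          # replicates per treatment
pi0     <- 0.9        # proportion of non-differential genes (varied in the study)

trt   <- factor(rep(seq_len(n.trt), each = n.rep))
is.de <- runif(n.genes) > pi0     # TRUE for the truly differential genes

# treatment means: zero for null genes, random shifts for the DE genes
effects <- matrix(0, n.genes, n.trt)
effects[is.de, ] <- matrix(rnorm(sum(is.de) * n.trt), ncol = n.trt)

# errors: swap rnorm() for a scaled rt() to get the heavy-tailed (t) case
errs <- matrix(rnorm(n.genes * n.trt * n.rep, sd = 0.5), n.genes)
# errs <- matrix(0.5 * rt(n.genes * n.trt * n.rep, df = 4), n.genes)

y <- effects[, as.integer(trt)] + errs   # genes x arrays expression matrix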
1. Gene-by-gene ANOVA is not as good as limma and SAM.
2. p-values are not as good as q-values. (I used the "qvalue" package with
limma; see the sketch after this list.)
3. 2 replicates does not give you a whole lot of power, even when you
"borrow strength" by using all the genes. Most of the differentially
expressing genes were not "discovered".
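A hedged sketch of lesson 2, using the objects from the simulation sketch
above: fit the one-way model in limma, take the moderated-F p-values, and
convert them to q-values with the qvalue package. The design and the 1%
cutoff are illustrative, not Naomi's exact settings.

library(limma)
library(qvalue)

design <- model.matrix(~ trt)
fit    <- eBayes(lmFit(y, design))

# moderated F-test for any difference among treatments
tab <- topTable(fit, coef = 2:ncol(design), number = Inf, sort.by = "none")
p   <- tab$P.Value

qobj <- qvalue(p)
qobj$pi0                      # estimated proportion of non-differential genes
sum(p < 0.01)                 # genes selected on the raw p-value
sum(qobj$qvalues < 0.01)      # genes selected at q < 0.01 (an FDR-type cutoff)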
The SAM d-value and limma F-value had rank correlation 99.7% for the 1 data
set where I checked this. SAM's q-value estimate is more conservative, but
both are somewhat conservative. Most of the differences in results appear to
be differences in the estimated q-values, which were computed from the
p-values in limma and directly from the permutations in SAM. I cannot
conclude from this which method is "better", but limma certainly uses a lot
less memory and is much more convenient if you need specific contrasts. On
the other hand, SAM in Excel is very easy to use and seems to work just fine
for ANOVA-like analysis.
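For what it is worth, the rank-correlation check is a one-liner once both
statistics are in hand; here sam.d stands for a vector of d-values exported
from SAM (a hypothetical name), in the same gene order as the limma fit above:

limma.F <- tab$F                              # moderated F from the topTable call above
cor(limma.F, sam.d, method = "spearman")      # rank correlation of the two statistics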
Naomi S. Altman                  814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics              814-863-7114 (fax)
Penn State University            814-865-1348 (Statistics)
University Park, PA 16802-2111
Naomi,
I'm a little confused by your posting. Let me quote parts of your email and
then ask for clarification:
>... I did not replicate my simulations,...
Does this mean you had only one simulation?
>1. Gene-by-gene ANOVA is not as good as limma and SAM.
What is meant by "good"?
I thought what you did in limma was gene-by-gene ANOVA?
>3. 2 replicates does not give you a whole lot of power, even when you
>"borrow strength" by using all the genes. Most of the differentially
>expressing genes were not "discovered".
Is this meant for all methods?
>SAM's q-value estimate is more conservative,
>but both are somewhat conservative. Most of the differences in results
>appear to be differences in the estimated q-values, which were computed
>from the p-values in limma and directly from the permutations in SAM.
Aren't q-values a form of FDR, and hence a function of the prevalence of true
results?
Aren't the p-values from limma ANOVA p-values, which are "uniformly most
powerful" if the assumptions hold? Since q-values are based on p-values, your
result would be consistent with theory.
One thing I find confusing is when a program/package name is cited instead of
the specific statistical method applied. This may seem a minor point, but
naming only the program is insufficient when programs or packages have
multiple options that could be used to do the same analysis.
-.- -.. .---- .--. ..-.
Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625
Sr. Biostatistician- IS Bioinformatics Unit
Lecturer in Biostatistics (775) 254-4885 fax
Graduate School of Biomedical Sciences
University of Massachusetts Medical School, Worcester
55 Lake Avenue North stephen.baker@umassmed.edu
Worcester, MA 01655 USA
Dear Stephen,
1. I did one simulation for each of 6 conditions, which were 3 levels of
differential expression and 2 error distributions. That is why I say this is
"heuristic".
2. Limma is gene-by-gene ANOVA with an adjusted denominator. Ordinary ANOVA
had a higher false positive and false negative rate (as determined from the
simulation) than limma or SAM, even after using the FDR adjustment.
3. The ordinary ANOVA was poor. Limma and SAM "use all the genes" in the
shrinkage estimate. They were more powerful in my small study than ordinary
ANOVA, but they missed most of the differentially expressing genes.
4. I am not sure I understand your comment about q-values. The estimate of
pi_0 was pretty good in all cases, including when using the p-values from the
ANOVA F-test. I then selected q < .01 and looked at the false positive and
false negative rates for genes with q < .01. When SAM came up with a smaller
list of genes than limma, I compared the q-values and found that SAM with
q < .01 was comparable to limma with a smaller value of q. I then looked at
the number of false positives and false negatives.
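A sketch of the kind of bookkeeping described in point 4, reusing qobj and the
truth vector is.de from the earlier sketches (the cutoff and the rate
definitions are mine):

called <- qobj$qvalues < 0.01
table(called = called, truly.de = is.de)

fp.rate <- sum(called & !is.de) / max(sum(called), 1)   # false discoveries among the calls
fn.rate <- sum(!called & is.de) / max(sum(is.de), 1)    # truly DE genes that were missed
c(false.positive = fp.rate, false.negative = fn.rate)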
Lastly, I hope that I was clear that I was analyzing a completely randomized
one-way design. I used the default settings for one-way ANOVA in all of the
software. For limma, this means that I used the Helmert contrasts to obtain
the ordinary and eBayes ANOVA F-tests.
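As a rough illustration of that setup (my reconstruction, not Naomi's code):
request Helmert coding, fit the one-way design, and compare the gene-by-gene
(ordinary) ANOVA F-test with limma's eBayes moderated F-test. Note that the
F-test itself does not depend on the coding; the coding only changes the
individual coefficients.

options(contrasts = c("contr.helmert", "contr.poly"))
design <- model.matrix(~ trt)                 # Helmert-coded one-way design

# eBayes (moderated) F-test
fit  <- eBayes(lmFit(y, design))
modF <- topTable(fit, coef = 2:ncol(design), number = Inf, sort.by = "none")

# ordinary gene-by-gene ANOVA F-test (slow but transparent)
ord.p <- apply(y, 1, function(g) anova(lm(g ~ trt))[["Pr(>F)"]][1])

# with only 2 replicates per treatment the ordinary test has very few error
# df, which is where the moderated (shrunken) denominator helps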
--Naomi
Naomi S. Altman                  814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics              814-863-7114 (fax)
Penn State University            814-865-1348 (Statistics)
University Park, PA 16802-2111