I just did a (very) small simulation study comparing one-way ANOVA with limma
and SAM for various values of pi_0, with normal and t-distributed errors,
2 replicates per treatment, and 22700 genes/array. I did not replicate my
simulations, so what I have to say here is necessarily heuristic, but there
were some lessons.
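For concreteness, here is a minimal R sketch of how such a simulation could be
set up (my own reconstruction, not Naomi's code); the number of treatments,
the effect sizes, and the error scale are all assumptions:

set.seed(1)
n.genes <- 22700
n.trt   <- 3          # assumed number of treatment groups
n.rep   <- 2          # replicates per treatment
pi0     <- 0.9        # proportion of non-differential genes (varied in the study)

trt   <- factor(rep(seq_len(n.trt), each = n.rep))
is.de <- runif(n.genes) > pi0     # TRUE for the truly differential genes

# treatment means: zero for null genes, random shifts for the DE genes
effects <- matrix(0, n.genes, n.trt)
effects[is.de, ] <- matrix(rnorm(sum(is.de) * n.trt), ncol = n.trt)

# errors: swap rnorm() for a scaled rt() to get the heavy-tailed (t) case
errs <- matrix(rnorm(n.genes * n.trt * n.rep, sd = 0.5), n.genes)
# errs <- matrix(0.5 * rt(n.genes * n.trt * n.rep, df = 4), n.genes)

y <- effects[, as.integer(trt)] + errs   # genes x arrays expression matrix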
1. Gene-by-gene ANOVA is not as good as limma and SAM.
2. p-values are not as good as q-values. (I used the "qvalue" package with
limma; see the sketch after this list.)
3. 2 replicates does not give you a whole lot of power, even when you
"borrow strength" by using all the genes. Most of the differentially
expressing genes were not "discovered".
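A hedged sketch of lesson 2, using the objects from the simulation sketch
above: fit the one-way model in limma, take the moderated-F p-values, and
convert them to q-values with the qvalue package. The design and the 1%
cutoff are illustrative, not Naomi's exact settings.

library(limma)
library(qvalue)

design <- model.matrix(~ trt)
fit    <- eBayes(lmFit(y, design))

# moderated F-test for any difference among treatments
tab <- topTable(fit, coef = 2:ncol(design), number = Inf, sort.by = "none")
p   <- tab$P.Value

qobj <- qvalue(p)
qobj$pi0                      # estimated proportion of non-differential genes
sum(p < 0.01)                 # genes selected on the raw p-value
sum(qobj$qvalues < 0.01)      # genes selected at q < 0.01 (an FDR-type cutoff)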
The SAM d-value and limma F-value had rank correlation 99.7% for the 1 data
set where I checked this. SAM's q-value estimate is more conservative, but
both are somewhat conservative. Most of the differences in results appear to
be differences in the estimated q-values, which were computed from the
p-values in limma and directly from the permutations in SAM. I cannot
conclude from this which method is "better", but limma certainly uses a lot
less memory and is much more convenient if you need specific contrasts. On
the other hand, SAM in Excel is very easy to use and seems to work just fine
for ANOVA-like analysis.
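For what it is worth, the rank-correlation check is a one-liner once both
statistics are in hand; here sam.d stands for a vector of d-values exported
from SAM (a hypothetical name), in the same gene order as the limma fit above:

limma.F <- tab$F                              # moderated F from the topTable call above
cor(limma.F, sam.d, method = "spearman")      # rank correlation of the two statistics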
Naomi S. Altman                  814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics              814-863-7114 (fax)
Penn State University            814-865-1348 (Statistics)
University Park, PA 16802-2111
Naomi,
I'm a little confused by your posting. Let me quote parts of your email and
then ask for clarification:
>... I did not replicate my simulations,...
Does this mean you had only one simulation?
>1. Gene-by-gene ANOVA is not as good as limma and SAM.
What is meant by "good"?
I thought what you did in limma was gene-by-gene ANOVA?
>3. 2 replicates does not give you a whole lot of power, even when you
>"borrow strength" by using all the genes. Most of the differentially
>expressing genes were not "discovered".
Is this meant for all methods?
>SAM's q-value estimate is more conservative,
>but both are somewhat conservative. Most of the differences in results
>appear to be differences in the estimated q-values, which were computed
>from the p-values in limma and directly from the permutations in SAM.
Aren't q-values a form of FDR, and hence a function of the prevalence of true
results?
Aren't the p-values from limma ANOVA p-values, which are "uniformly most
powerful" if the assumptions hold? Since q-values are based on p-values, your
result would be consistent with theory.
One thing I find confusing is when a program/package name is cited instead of
the specific statistical method applied. This may seem a minor point, but
naming only the program is insufficient when programs or packages have
multiple options that could be used to do the same analysis.
-.- -.. .---- .--. ..-.
Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625
Sr. Biostatistician- IS Bioinformatics Unit
Lecturer in Biostatistics (775) 254-4885 fax
Graduate School of Biomedical Sciences
University of Massachusetts Medical School, Worcester
55 Lake Avenue North stephen.baker@umassmed.edu
Worcester, MA 01655 USA
Dear Stephen,
1. I did one simulation for each of 6 conditions, which were 3 levels of
differential expression and 2 error distributions. That is why I say this is
"heuristic".
2. Limma is gene-by-gene ANOVA with an adjusted denominator. Ordinary ANOVA
had a higher false positive and false negative rate (as determined from the
simulation) than limma or SAM, even after using the FDR adjustment.
3. The ordinary ANOVA was poor. Limma and SAM "use all the genes" in the
shrinkage estimate. They were more powerful in my small study than ordinary
ANOVA, but they missed most of the differentially expressing genes.
4. I am not sure I understand your comment about q-values. The estimate of
pi_0 was pretty good in all cases, including when using the p-values from the
ANOVA F-test. I then selected q < .01 and looked at the false positive and
false negative rates for genes with q < .01. When SAM came up with a smaller
list of genes than limma, I compared the q-values and found that SAM with
q < .01 was comparable to limma with a smaller value of q. I then looked at
the number of false positives and false negatives.
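A sketch of the kind of bookkeeping described in point 4, reusing qobj and the
truth vector is.de from the earlier sketches (the cutoff and the rate
definitions are mine):

called <- qobj$qvalues < 0.01
table(called = called, truly.de = is.de)

fp.rate <- sum(called & !is.de) / max(sum(called), 1)   # false discoveries among the calls
fn.rate <- sum(!called & is.de) / max(sum(is.de), 1)    # truly DE genes that were missed
c(false.positive = fp.rate, false.negative = fn.rate)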
Lastly, I hope that I was clear that I was analyzing a completely randomized
one-way design. I used the default settings for one-way ANOVA in all of the
software. For limma, this means that I used the Helmert contrasts to obtain
the ordinary and eBayes ANOVA F-tests.
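As a rough illustration of that setup (my reconstruction, not Naomi's code):
request Helmert coding, fit the one-way design, and compare the gene-by-gene
(ordinary) ANOVA F-test with limma's eBayes moderated F-test. Note that the
F-test itself does not depend on the coding; the coding only changes the
individual coefficients.

options(contrasts = c("contr.helmert", "contr.poly"))
design <- model.matrix(~ trt)                 # Helmert-coded one-way design

# eBayes (moderated) F-test
fit  <- eBayes(lmFit(y, design))
modF <- topTable(fit, coef = 2:ncol(design), number = Inf, sort.by = "none")

# ordinary gene-by-gene ANOVA F-test (slow but transparent)
ord.p <- apply(y, 1, function(g) anova(lm(g ~ trt))[["Pr(>F)"]][1])

# with only 2 replicates per treatment the ordinary test has very few error
# df, which is where the moderated (shrunken) denominator helps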
--Naomi
Naomi S. Altman                  814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics              814-863-7114 (fax)
Penn State University            814-865-1348 (Statistics)
University Park, PA 16802-2111