SAM vs LIMMA vs EBAM

Wu, Xiwei ▴ 350

@wu-xiwei-1102

Last seen 10.2 years ago

Hi, BioC Members,

I have a general question on identifying DE genes. Since there are many ways to do this, I am wondering whether anyone has compared methods such as SAM, EBAM, and LIMMA by applying them to the same dataset. Of course, they have different assumptions and different models, but should they always give similar results (assuming the parameter settings are tuned to give a similar number of DE genes)? Is it better to take the common list of genes from the three methods? Can I have more confidence in this common list than in the list from a single method?

Xiwei
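To make the "common list" idea concrete, here is a minimal Python sketch of how the lists from several methods could be intersected and their pairwise agreement measured with the Jaccard index. The function name `overlap_summary` and the gene IDs are hypothetical and only for illustration; none of this calls SAM, EBAM, or LIMMA.

```python
def overlap_summary(gene_lists):
    """Return the genes called by every method and the pairwise Jaccard indices.

    gene_lists: dict mapping a method name to an iterable of gene IDs.
    """
    sets = {name: set(genes) for name, genes in gene_lists.items()}
    common = set.intersection(*sets.values())
    names = sorted(sets)
    jaccard = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            union = sets[a] | sets[b]
            # Jaccard index: size of the intersection over size of the union
            jaccard[(a, b)] = len(sets[a] & sets[b]) / len(union) if union else 1.0
    return common, jaccard


# Hypothetical gene IDs, for illustration only
calls = {
    "SAM": ["g1", "g2", "g3", "g4"],
    "LIMMA": ["g2", "g3", "g4", "g5"],
    "EBAM": ["g3", "g4", "g6"],
}
common, jaccard = overlap_summary(calls)
print(sorted(common))
print(jaccard)
```

The intersection can never be longer than the shortest input list. Whether it is also more trustworthy is the question being asked here, and the sketch does not answer it.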

limma limma • 2.0k views

19.7 years ago • Wu, Xiwei ▴ 350

Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.6 years ago

United States

I have not tried EBAM, but I did do this experiment with SAM and LIMMA on a data set I simulated from an actual data set.

On these data, the SAM statistic and LIMMA F-test gave almost identical ordering of the genes. However, the FDR adjustment was too stringent for SAM (i.e. the true FDR was lower than SAM's estimate) and was too liberal for LIMMA.

This was not a big study. I took my gene means and variances from an actual study, and then added either normal or t-4 errors and a couple of levels of differential expression.

The sample sizes I used were very small - 2 or 4 replicates with 22000 genes. Results were much, much, much better with 4 replicates than with 2.

--Naomi

Naomi S. Altman, Associate Professor
Bioinformatics Consulting Center, Dept. of Statistics, Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)
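For readers who want to try a similar experiment, the following is a rough Python sketch of a simulation with known truth. It is not the code used in the study described above, and it does not implement the SAM statistic or the LIMMA moderated statistics: it ranks genes by an ordinary pooled-variance t-statistic and reports the realized false discovery proportion among the top-ranked genes. The function name, the uniform range for the per-gene standard deviations, and the effect size are all assumptions made for illustration. Errors are either normal or t-distributed with 4 degrees of freedom, which is how "t-4" is read here.

```python
import random
import statistics


def simulate_top_k_fdp(n_genes=1000, frac_de=0.1, n_rep=4, effect=2.0,
                       heavy_tails=False, seed=0):
    """Simulate two groups; return the false discovery proportion in the top list.

    The top list has as many genes as are truly DE, so a perfect ranking gives 0.
    """
    rng = random.Random(seed)
    n_de = int(n_genes * frac_de)
    if n_de == 0:
        raise ValueError("frac_de is too small for this number of genes")

    def noise():
        z = rng.gauss(0.0, 1.0)
        if not heavy_tails:
            return z
        # t with 4 df: standard normal divided by sqrt(chi-square_4 / 4);
        # a chi-square with 4 df is a gamma with shape 2 and scale 2
        chi2 = rng.gammavariate(2.0, 2.0)
        return z / (chi2 / 4.0) ** 0.5

    scored = []
    for g in range(n_genes):
        is_de = g < n_de
        sd = rng.uniform(0.5, 2.0)  # hypothetical per-gene standard deviation
        shift = effect * sd if is_de else 0.0
        a = [sd * noise() for _ in range(n_rep)]
        b = [shift + sd * noise() for _ in range(n_rep)]
        # Ordinary equal-variance t-statistic (NOT SAM's or LIMMA's statistic)
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2.0
        t = (statistics.mean(b) - statistics.mean(a)) / (2.0 * pooled_var / n_rep) ** 0.5
        scored.append((abs(t), is_de))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:n_de]
    false_calls = sum(1 for _, is_de in top if not is_de)
    return false_calls / len(top)


for n_rep in (2, 4):
    print(n_rep, "replicates:", simulate_top_k_fdp(n_rep=n_rep))
```

Because the truly DE genes are known in a simulation, the realized proportion of false calls can be compared directly with whatever FDR estimate a method reports, which is the kind of comparison described above.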

19.7 years ago • Naomi Altman ★ 6.0k

Wu, Xiwei ▴ 350

@wu-xiwei-1102

Last seen 10.2 years ago

Thanks a lot, Naomi. Your result is very interesting. I am wondering whether the number of DE genes in your simulated dataset affects the results.

I tested SAM and Limma on the same dataset (but without knowing which genes should be DE). I know this is not the best way to compare different methods, but I just wanted to get some idea. At an FDR of 0.05, SAM finds a lot more DE genes than Limma. However, with some other datasets, SAM and Limma perform similarly. In addition, I found that using the median FDR or the mean FDR in SAM makes a big difference for some datasets, but not for others.

The message I got is that there is no common answer to this question, because it depends on the dataset. Any comments?

In addition, is there a guideline for the minimum number of replicates that should be used with SAM? I assume that with a small number of replicates, the permutation does not mean much.

Xiwei
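The concern about permutations with few replicates can be illustrated with a small count. The Python sketch below assumes a balanced, unpaired two-group design and a per-gene permutation test with a symmetric statistic (for example an absolute t-statistic); the function name is hypothetical. Under those assumptions the number of distinct labelings is "2n choose n", and because swapping the two group labels gives the same absolute statistic, the smallest achievable two-sided p-value is 2 divided by that count.

```python
from math import comb


def permutation_resolution(n_per_group):
    """Return (number of distinct labelings, smallest two-sided p-value)."""
    n_labelings = comb(2 * n_per_group, n_per_group)
    # The observed labeling and its mirror image share the same |statistic|
    min_two_sided_p = 2 / n_labelings
    return n_labelings, min_two_sided_p


for n in (2, 3, 4, 6):
    count, p_min = permutation_resolution(n)
    print(f"{n} replicates per group: {count} labelings, smallest p = {p_min:.4f}")
```

With 2 replicates per group there are only 6 labelings, and with 4 there are 70. Procedures that pool the permuted statistics across genes have a finer resolution than this per-gene figure, so the sketch only shows why very small designs leave little to permute.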

19.7 years ago • Wu, Xiwei ▴ 350

I used several levels of differential expression including 10% and 80%.

--Naomi

19.7 years ago • Naomi Altman ★ 6.0k

Wu, Xiwei ▴ 350

@wu-xiwei-1102

Last seen 10.2 years ago

With different levels of differential expression, the result should be very convincing. Your study provides a good guideline for using SAM and Limma. Now I need to think twice before using any FDR cutoff. Thanks again.

Xiwei

19.7 years ago • Wu, Xiwei ▴ 350
