Hi, BioC Members,

I have a general question on identifying DE genes. Since there are many ways to do this, I am wondering whether people have compared methods such as SAM, EBAM, and LIMMA by applying them to the same dataset. Of course, they have different assumptions and different models, but should they always give similar results (assuming the parameter settings are tuned to give a similar number of DE genes)? Is it better to take the list of genes common to all three methods? Can I have more confidence in this common list than in the list from a single method?

Xiwei
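The "common list" idea can be made concrete with a small sketch. It is written in Python and uses hypothetical gene IDs and function names; it is not part of any Bioconductor package. The first function intersects the per-method DE lists. The second measures how many genes two rankings share in their top k. Note that the intersection does not come with its own FDR estimate. Also, when two methods rank genes almost identically, the intersection is determined mostly by where each method's cutoff falls.

```python
from typing import Dict, Iterable, List, Set


def consensus_genes(calls: Dict[str, Iterable[str]]) -> Set[str]:
    """Genes called DE by every method (intersection of the per-method lists)."""
    sets = [set(genes) for genes in calls.values()]
    return set.intersection(*sets) if sets else set()


def top_k_overlap(ranking_a: List[str], ranking_b: List[str], k: int) -> float:
    """Fraction of the top-k genes shared by two rankings (1.0 = same top-k set)."""
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k


# Hypothetical gene lists, each ordered from most to least significant.
calls = {
    "SAM": ["g1", "g2", "g3", "g5"],
    "limma": ["g1", "g2", "g3", "g4"],
    "EBAM": ["g2", "g3", "g5"],
}
print(sorted(consensus_genes(calls)))                  # ['g2', 'g3']
print(top_k_overlap(calls["SAM"], calls["limma"], 3))  # 1.0
```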
I have not tried EBAM, but I did do this experiment with SAM and LIMMA on a data set I simulated from an actual data set.

On these data, the SAM statistic and the LIMMA F-test gave almost identical orderings of the genes. However, the FDR adjustment was too stringent for SAM (i.e. the true FDR was lower than SAM's estimate) and too liberal for LIMMA (the true FDR was higher than LIMMA's estimate).

This was not a big study. I took my gene means and variances from an actual study, and then added either normal or t-4 (t with 4 degrees of freedom) errors and a couple of levels of differential expression.

The sample sizes I used were very small: 2 or 4 replicates with 22000 genes. Results were much, much, much better with 4 replicates than with 2.

--Naomi
At 08:48 PM 3/28/2005, Wu, Xiwei wrote:
>Hi, BioC Members,
>
>[...]
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
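A toy version of such a simulation shows what "true FDR" means here. Because the truly DE genes are known, the false discovery proportion of any called list can be computed directly and compared with a method's estimate. This Python sketch is only an illustration under stated assumptions. It is not the simulation described in the message above. Several choices are hypothetical:

- the ranges of gene means and SDs;
- the fraction of DE genes;
- the shift;
- the use of a Welch t-statistic as a stand-in for the SAM and LIMMA statistics.

```python
import math
import random
import statistics


def t4_error(rng: random.Random, sd: float) -> float:
    """sd times a t variate with 4 df, built as Z / sqrt(chi2_4 / 4).
    (A t_4 variate has variance 2, so this is not variance-matched to the normal case.)"""
    z = rng.gauss(0.0, 1.0)
    chi2_4 = rng.gammavariate(2.0, 2.0)  # chi-square(4) = Gamma(shape=2, scale=2)
    return sd * z / math.sqrt(chi2_4 / 4.0)


def noise(rng: random.Random, sd: float, heavy_tails: bool) -> float:
    """Normal or scaled t-4 measurement error."""
    return t4_error(rng, sd) if heavy_tails else rng.gauss(0.0, sd)


def simulate(n_genes, n_rep, frac_de, shift, heavy_tails, seed=1):
    """Hypothetical two-group data: returns [(group1, group2), ...] and truth flags."""
    rng = random.Random(seed)
    data, is_de = [], []
    for _ in range(n_genes):
        mean = rng.uniform(4.0, 12.0)  # assumed range of log-scale gene means
        sd = rng.uniform(0.1, 0.6)     # assumed range of gene SDs
        de = rng.random() < frac_de
        g1 = [mean + noise(rng, sd, heavy_tails) for _ in range(n_rep)]
        g2 = [mean + (shift if de else 0.0) + noise(rng, sd, heavy_tails)
              for _ in range(n_rep)]
        data.append((g1, g2))
        is_de.append(de)
    return data, is_de


def welch_t(g1, g2):
    """Welch two-sample t-statistic (group 2 minus group 1)."""
    se = math.sqrt(statistics.variance(g1) / len(g1)
                   + statistics.variance(g2) / len(g2))
    return (statistics.mean(g2) - statistics.mean(g1)) / se


def true_fdp_top_k(data, is_de, k):
    """Fraction of the k genes with the largest |t| that are NOT truly DE."""
    abs_t = [abs(welch_t(g1, g2)) for g1, g2 in data]
    top = sorted(range(len(data)), key=lambda i: abs_t[i], reverse=True)[:k]
    return sum(not is_de[i] for i in top) / k


for n_rep in (2, 4):
    data, is_de = simulate(n_genes=2000, n_rep=n_rep, frac_de=0.1,
                           shift=1.0, heavy_tails=True)
    print(n_rep, "replicates: true FDP among top 200 =",
          true_fdp_top_k(data, is_de, k=200))
```

Comparing the printed proportions for 2 and 4 replicates is one way to explore the replicate effect described in the thread. To check a method's FDR estimate against the truth, replace the Welch statistic with that method's calls.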
Thanks a lot, Naomi. Your result is very interesting. I am wondering whether the number of DE genes in your simulated dataset would affect the results.

I tested SAM and Limma on the same dataset (but without knowing which genes should be DE). I know this is not the best way to compare different methods, but I just wanted to get some idea. At an FDR of 0.05, SAM finds many more DE genes than Limma. However, with some other datasets, SAM and Limma perform similarly. I also found that using the median FDR versus the mean FDR in SAM makes a big difference for some datasets, but not for others.

So the message I get is that there is no general answer to this question, because it depends on the dataset? Any comments?

In addition, is there a guideline for the minimum number of replicates that should be used with SAM? I assume that with a small number of replicates, the permutations do not mean much.
Xiwei
-----Original Message-----
From: Naomi Altman [mailto:naomi@stat.psu.edu]
Sent: Monday, March 28, 2005 8:34 PM
To: Wu, Xiwei; bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] SAM vs LIMMA vs EBAM
[...]
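The concern about permutations with few replicates can be put in numbers. Take a two-class comparison with n arrays per group. There are only comb(2n, n) ways to relabel the arrays, and one of them is the observed labelling. With equal group sizes, swapping the two labels only flips the sign of a two-sample statistic. The following short Python sketch counts these relabellings. The function names are illustrative.

```python
from math import comb


def n_label_assignments(n1: int, n2: int) -> int:
    """Ways to choose which of the n1 + n2 arrays carry the 'group 1' label
    (the observed labelling is one of them)."""
    return comb(n1 + n2, n1)


def n_distinct_splits(n: int) -> int:
    """Equal group sizes: swapping the two labels only flips the sign of a
    two-sample statistic, so at most half of the assignments give distinct |statistic|."""
    return comb(2 * n, n) // 2


for n in (2, 3, 4, 6):
    print(n, n_label_assignments(n, n), n_distinct_splits(n))
# 2 6 3
# 3 20 10
# 4 70 35
# 6 924 462
```

With 2 replicates per group there are only 3 distinct splits. SAM-style methods pool permuted statistics across all genes, but those statistics still come from that handful of relabellings.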
I used several levels of differential expression, including 10% and 80%.
--Naomi
At 11:35 AM 3/29/2005, Wu, Xiwei wrote:
>Thanks a lot, Naomi. Your result is very interesting. I am wondering
>whether the number of DE genes in your simulation dataset will affect the
>results.
>[...]
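SAM-type FDR estimates roughly take this form: pi0 times the typical number of genes called in permuted data, divided by the number of genes called in the real data. The sketch below is a simplified, hypothetical version. It uses a symmetric cutoff on |statistic| instead of SAM's delta-based thresholds. It also takes pi0, the assumed proportion of non-DE genes, as an input rather than estimating it.

It illustrates two points raised in this exchange:

- The median and the mean of the permutation counts can differ a lot when those counts are skewed.
- If 80% of genes are DE, the true pi0 is 0.2. Setting pi0 = 1 then inflates the estimate by a factor of 1/0.2 = 5 relative to using the true pi0, all else equal.

```python
import statistics
from typing import List, Sequence


def sam_style_fdr(observed: Sequence[float],
                  permuted: List[Sequence[float]],
                  cutoff: float,
                  pi0: float = 1.0,
                  use_median: bool = True) -> float:
    """Simplified permutation FDR estimate at a symmetric cutoff on |statistic|:
    pi0 * (median or mean count of permuted |stat| >= cutoff) / (observed count)."""
    n_called = sum(1 for d in observed if abs(d) >= cutoff)
    if n_called == 0:
        return 0.0
    # Number of genes "called" in each permuted data set (all of them false calls).
    false_calls = [sum(1 for d in perm if abs(d) >= cutoff) for perm in permuted]
    centre = statistics.median(false_calls) if use_median else statistics.mean(false_calls)
    return min(1.0, pi0 * centre / n_called)


# Hypothetical statistics for 6 genes and 3 permutations.
observed = [5.0, 4.0, 3.5, 0.2, -0.1, -4.2]
permuted = [
    [0.1, 3.2, 0.3, -0.2, 0.0, 0.4],
    [0.2, 0.1, -0.3, 0.5, 0.2, 0.1],
    [3.1, -3.3, 3.6, 3.9, -3.0, 0.2],  # one skewed permutation
]
print(sam_style_fdr(observed, permuted, cutoff=3.0))                    # 0.25
print(sam_style_fdr(observed, permuted, cutoff=3.0, use_median=False))  # 0.5
```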
With different levels of differential expression, the results should be quite convincing. Your study provides a good guideline for using SAM and Limma. Now I know I need to think twice before using any FDR cutoff. Thanks again.
Xiwei
_____
From: Naomi Altman [mailto:naomi@stat.psu.edu]
Sent: Tuesday, March 29, 2005 8:51 AM
To: Wu, Xiwei; bioconductor@stat.math.ethz.ch
Subject: RE: [BioC] SAM vs LIMMA vs EBAM
[...]
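An FDR cutoff is a threshold on an estimate, so it is only as reliable as that estimate. For reference, here is a plain-Python sketch of the Benjamini-Hochberg (BH) adjustment. BH is the method selected by limma's topTable(..., adjust.method = "BH"), which is the current default there. The sketch is an illustration, not limma's code. Genes whose adjusted p-value is at or below q are the ones called at FDR q.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(pvals)
print([round(a, 4) for a in adj])                # [0.04, 0.0533, 0.0533, 0.2]
print([i for i, a in enumerate(adj) if a <= 0.05])  # [0]
```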