Remember that FDR is a rate, i.e. the expected proportion of false
discoveries among the genes declared significant. If the set of genes
is changed, the FDR (and hence the q-values) will change because the
comparison set is different. This is NOT the same as a p-value,
which depends only on the value of the current test statistic.
The same thing happens with FWER, because these methods control the
probability of making at least one mistake, which clearly depends on
which set of tests is performed.
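To see this with numbers, here is a minimal sketch in Python of the Benjamini-Hochberg (FDR) and Holm (FWER) adjustments, applied once to ten p-values and once to the first five of them. The p-values are made up for illustration, and `bh_adjust` and `holm_adjust` are hypothetical helpers written for this example, not a library API.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR 'q-values' in the loose sense)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down: q_(k) = min over j >= k of m * p_(j) / j
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def holm_adjust(pvals):
    """Holm step-down adjusted p-values (controls the FWER)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # Walk from the smallest p-value up: (m - rank) * p, made monotone, capped at 1
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for 10 genes.
pvals = [0.001, 0.008, 0.012, 0.030, 0.045, 0.20, 0.35, 0.50, 0.70, 0.90]
subset = pvals[:5]  # pretend the gene list was cut down to these 5 genes

bh_full, bh_sub = bh_adjust(pvals), bh_adjust(subset)
holm_full, holm_sub = holm_adjust(pvals), holm_adjust(subset)

for i, p in enumerate(subset):
    print(f"gene {i}: p={p:.3f}  BH {bh_full[i]:.4f} (10 tests) vs {bh_sub[i]:.4f} (5 tests)"
          f"  Holm {holm_full[i]:.4f} vs {holm_sub[i]:.4f}")
```

The raw p-value of each gene is the same in both runs, but the adjusted values depend on the set of tests: gene 4 (p = 0.045) gets a BH-adjusted value of 0.09 among ten tests and 0.045 among five, and its Holm-adjusted value drops from 0.27 to 0.06. Whether a smaller set may legitimately be tested is a separate question; here the five genes are simply the first five in the list. In R, `p.adjust(p, method = "BH")` and `p.adjust(p, method = "holm")` compute the same adjustments.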
--Naomi
At 03:11 PM 12/13/2008, Sean Davis wrote:
>On Sat, Dec 13, 2008 at 12:36 PM, Wayne Xu <wxu at msi.umn.edu> wrote:
> > Hello,
> > I am not sure this is the right place to ask this question, but it is
> > about microarray data analysis:
> >
> > In a two-group t test, the multiple-testing q-values depend on the total
> > number of genes tested. If I filter the gene list first, for example by
> > using only the genes with a fold change of at least 1.2 for the t test and
> > the multiple-testing correction, this gene list is much smaller than the
> > full gene list, and the multiple-testing q-values are much smaller.
> >
> > Do you think the above is a correct approach? Would people who do not
> > filter this way lose statistical power? If so, how much power is lost,
> > and how can the power be calculated in this case?
>
>No, you cannot filter based on fold change. However, you can filter
>based on variance or some other measure that does not depend on the
>two groups being compared. Anything that filters genes based on
>"knowing" the two groups will lead to a biased test. Remember that
>filtering removes genes from further analysis.
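A small simulation can make this point concrete. It is a sketch under assumptions that are not in the thread: every gene is null (no true differences), log-scale expression is Normal(0, 1), there are 5 samples per group, and each gene gets a pooled two-sample t test at the 5% level (critical value 2.306 for 8 degrees of freedom). "Filtering on fold change" is modelled as keeping the 10% of genes with the largest absolute difference in group means; the variance filter keeps the 50% with the largest overall variance, computed without the group labels. All names and cutoffs are illustrative.

```python
import random

# Assumed setup: all genes null, log expression ~ Normal(0, 1), 5 samples per group.
N_GENES = 10000
N_PER_GROUP = 5
T_CRIT = 2.306  # two-sided 5% critical value of Student's t with 8 df


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    sp2 = (sum_sq(a) + sum_sq(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5


def simulate(seed=2008):
    rng = random.Random(seed)
    genes = []
    for _ in range(N_GENES):
        a = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        abs_diff = abs(mean(a) - mean(b))             # log fold change: uses the labels
        overall_var = sum_sq(a + b) / (len(a + b) - 1)  # ignores the labels
        rejected = abs(pooled_t(a, b)) > T_CRIT
        genes.append((abs_diff, overall_var, rejected))
    return genes


def top_fraction(genes, key_index, frac):
    """Keep the given fraction of genes with the largest value in one column."""
    ranked = sorted(genes, key=lambda g: g[key_index], reverse=True)
    return ranked[: int(len(ranked) * frac)]


def rejection_rate(genes):
    """Fraction of genes rejected; every rejection is a false positive here."""
    return sum(g[2] for g in genes) / len(genes)


genes = simulate()
all_rate = rejection_rate(genes)
fc_rate = rejection_rate(top_fraction(genes, 0, 0.10))   # filter on fold change
var_rate = rejection_rate(top_fraction(genes, 1, 0.50))  # filter on overall variance
print(f"no filter:          {all_rate:.3f}")
print(f"fold-change filter: {fc_rate:.3f}")
print(f"variance filter:    {var_rate:.3f}")
```

Since no gene is truly differentially expressed, the unfiltered and variance-filtered rejection rates should sit near the nominal 5%, while the fold-change-filtered set should show a much higher rate, because that filter has already used the group labels to pick genes with large differences. The variance filter stays valid in this setup because, for normal null data, the scale-invariant t statistic is independent of the overall variance; that independence is a property of this particular filter and test, not of every filter/test combination.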
>
>For further details, there are MANY discussions of this topic in the
>mailing list.
>
> > When people report multiple-testing q-values, they usually do not mention
> > how many genes were used in the multiple testing. You can get different
> > q-values in the same dataset (even with the same method, e.g. the
> > Benjamini-Hochberg adjustment). Then how can it make sense for the same
> > genes to have different q-values?
>
>A good manuscript should describe in detail the preprocessing and
>filtering steps, the statistical tests used, and the methods for
>correcting for multiple testing. You are correct that many papers do
>not do so.
>
>As for different q-values in the same dataset using different methods,
>it is important to note that one should not do an analysis, get a
>result, and then, based on that result, go back and redo the analysis
>with different parameters to get a "better" result. It is very
>important that each step of an analysis (preprocessing, filtering,
>testing, multiple-testing correction) be justifiable independent of
>the other steps in order for the results to be interpretable.
>
>Sean
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111