Moderated F-test followed by moderated t-test in limma
@jurgenclaesen-24034

Dear all,

I have a somewhat fundamental question about differential expression detection in limma. Assume I want to analyze a microarray experiment where gene expression is measured for a limited set of genes (say 1000) in 8 different conditions. The main aim is to identify which of these genes are differentially expressed in each pairwise comparison (AvsB, AvsC, ..., AvsH, BvsC, et cetera).

In a "traditional" setting, one would first apply an F-test and, whenever its p-value falls below the significance level, proceed to the pairwise comparisons. However, it seems that in limma, when topTable() is used after eBayes(), the moderated F-test and the moderated t-tests for the pairwise comparisons are computed at the same time. This means that the correction for multiple testing is applied across all genes, regardless of whether the F-test p-value is below the significance level, which could lead to more false negatives.

Is it possible to mimic the "traditional" approach in limma, where the t-tests are done after selecting genes based on the F-test? Would this require refitting the model and hence applying the empirical Bayes approach to a smaller set of genes (which can lead to less precise estimation of the variances)?

Thank you, Jürgen

limma differential expression • 1.9k views
@gordon-smyth
WEHI, Melbourne, Australia

Yes, a multiple gene version of the "traditional" F-test followed by t-tests approach (F-then-t) is implemented in limma in the "hierarchical" method of decideTests. No, it does not require any model refitting.
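
As a rough illustration of what that looks like in practice, here is a minimal sketch on simulated data. The expression matrix, the 3 replicates per condition, the spiked-in effect and all object names are assumptions made up for the example; only lmFit(), makeContrasts(), contrasts.fit(), eBayes() and decideTests() are limma functions. The model is fitted once on all genes, and the F-then-t selection is applied afterwards to the same fitted object.

```r
library(limma)

set.seed(1)

# Simulated stand-in data (an assumption for illustration):
# 1000 genes, 8 conditions A..H, 3 replicates per condition.
n_genes <- 1000
conditions <- factor(rep(LETTERS[1:8], each = 3))
expr <- matrix(rnorm(n_genes * length(conditions)), nrow = n_genes)
rownames(expr) <- paste0("gene", seq_len(n_genes))

# Spike a shift into condition B for the first 50 genes.
expr[1:50, conditions == "B"] <- expr[1:50, conditions == "B"] + 3

# Group-means design: one coefficient per condition.
design <- model.matrix(~ 0 + conditions)
colnames(design) <- levels(conditions)

# All 28 pairwise contrasts, written as "A-B", "A-C", ...
pairs <- combn(levels(conditions), 2)
contrast_names <- paste(pairs[1, ], pairs[2, ], sep = "-")
contrast_matrix <- makeContrasts(contrasts = contrast_names, levels = design)

# Fit once on all genes, then moderate. No refitting on a gene subset.
fit <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, contrast_matrix))

# F-then-t: genes are selected on the moderated F-test first,
# then the t-tests are classified within the selected genes.
results <- decideTests(fit2, method = "hierarchical",
                       adjust.method = "BH", p.value = 0.05)
summary(results)
```

Because the empirical Bayes step runs on the full set of genes before any selection, the variance moderation still borrows information from all 1000 genes.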

You should be aware though that the "traditional" F-then-t approach from early statistics textbooks doesn't have any theoretical advantages over more direct multiple testing approaches using t-tests alone. It does not generally improve statistical power even in the traditional univariate context and (in my experience) is only a minority method in the biomedical literature. The F-test is useful if the F-test null hypothesis is what you want to test but, if your ultimate intention is to control the error rate for the pairwise comparisons, then it doesn't help. The F-test also doesn't fit well with the newer concepts of FDR control because it is inherently controlling the FWER across the t-tests instead of the FDR.
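
For comparison, a sketch of the more direct route using the t-tests alone might look like the following. Again the data are simulated and the names (group, expr, cm, the three conditions) are assumptions for the example. The "separate" and "global" options of decideTests() adjust the t-test p-values per contrast or pooled over all genes and contrasts, with no F-test stage.

```r
library(limma)

set.seed(2)

# Simulated stand-in data (an assumption for illustration):
# 200 genes, 3 conditions, 4 replicates per condition.
group <- factor(rep(c("A", "B", "C"), each = 4))
expr <- matrix(rnorm(200 * length(group)), nrow = 200)
rownames(expr) <- paste0("gene", 1:200)
expr[1:20, group == "C"] <- expr[1:20, group == "C"] + 2

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

cm <- makeContrasts(AvsB = A - B, AvsC = A - C, BvsC = B - C,
                    levels = design)
fit2 <- eBayes(contrasts.fit(lmFit(expr, design), cm))

# BH adjustment applied to each contrast on its own.
per_contrast <- decideTests(fit2, method = "separate", adjust.method = "BH")

# BH adjustment applied once to all genes and contrasts pooled together.
pooled <- decideTests(fit2, method = "global", adjust.method = "BH")

summary(per_contrast)
summary(pooled)

# Ranked moderated t-test results for a single pairwise comparison.
topTable(fit2, coef = "AvsC", number = 5)
```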

There is no published theory for generalizing the F-then-t approach to multiple gene contexts in which multiple testing corrections have to be applied first to the F-tests and then to the t-tests (that's why the limma method is called "hierarchical"). Nor is there any theory for how to use F-tests for FDR control. For both of these reasons, the limma statistical method is novel and unpublished. You're welcome to use it, but I haven't found it to have any strong advantages so I don't recommend it or use it in the limma case studies.


Thank you very much for the complete answer!
