Use topTable to do your F-tests; don't try to poke around in the internals of the MArrayLM object unless you know what you're doing. The F.p.value field contains p-values for the F-test on all coefficients, including the intercept. This corresponds to the null hypothesis that all expression values are zero, which is unlikely to be sensible for single-channel arrays (and probably of limited utility for two-channel arrays).
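To make the distinction concrete, here is a minimal sketch with simulated data, assuming a three-group design and a clearly non-zero baseline expression:

library(limma)
set.seed(1)
# 1000 genes with average log-expression around 5.
y <- matrix(rnorm(6000, mean=5), ncol=6)
design <- model.matrix(~gl(3,2))
fit <- eBayes(lmFit(y, design))
# F.p.value tests all coefficients jointly, including the intercept;
# it is essentially zero here simply because expression is not zero.
summary(fit$F.p.value)
# topTable drops the intercept (with a message) and tests only the
# group effects, giving roughly uniform p-values under this null.
res <- topTable(fit, n=Inf, sort.by="n")
summary(res$P.Value)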
Your motivation for using the F-test is also misguided. The F-test involving all terms is not uniformly more powerful than a t-test for a specific term, so your pre-filtering might end up throwing out genes that would otherwise be detected by using the t-test directly.
set.seed(42)
y <- matrix(rnorm(60000), ncol=10)
y[,1:2] <- y[,1:2] + 2 # Add an effect in group 1, making every gene DE.
library(limma)
design <- model.matrix(~gl(5,2))
fit <- lmFit(y, design)
fit <- eBayes(fit)
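# No 'coef' given: topTable() performs the F-test on all non-intercept coefficients.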
res.F <- topTable(fit, n=Inf, sort.by="n")
res.t <- topTable(fit, coef=2, n=Inf, sort.by="n")
# Lots of genes lost by the F-test that are
# detected directly by the t-test.
table(SigF=res.F$adj.P.Val <= 0.05,
SigT=res.t$adj.P.Val <= 0.05)
In addition, your second round of multiple testing correction fails to account for the fact that the gene list has effectively been pre-filtered based on the same data used in the t-test! This undermines the validity of the multiple testing, resulting in more false positives in the t-test DE gene list.
set.seed(1000)
y <- matrix(rnorm(60000), ncol=6)
# Adding lots of DE genes, but only in group 3.
N <- 2000
y[1:N,5:6] <- y[1:N,5:6] + 2
# Doing an F-test for any difference between groups.
library(limma)
design <- model.matrix(~gl(3,2))
fit <- lmFit(y, design)
fit <- eBayes(fit)
res.F <- topTable(fit, n=Inf, sort.by="n")
# Subsetting and repeating for the 2 v 1 comparison.
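# (res.F kept the original gene order via sort.by="n", so 'keep' aligns
# with the rows of 'fit'.)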
keep <- res.F$adj.P.Val <= 0.05
res.2 <- topTable(fit[keep,], coef=2, n=Inf, sort.by="n")
# A consistently non-zero number of detected features, despite the fact
# that the null is true for all 2 v 1 comparisons; these are false positives.
sum(res.2$adj.P.Val <= 0.05)
# Under correct FDR control, this count should be non-zero only ~5% of the time.
unfilt <- topTable(fit, coef=2, n=Inf, sort.by="n")
sum(unfilt$adj.P.Val <= 0.05)
I would suggest not trying to do something fancy, and just doing the test that corresponds directly to your scientific question. If you're interested in the pairwise comparisons, just do them.
Thank you so much. This is incredibly helpful. I appreciate your time and expertise.
Dear Aaron, may I ask you to comment on the two following simulations, based on your simulation comparing the F-test and the t-test?
Here I increased the number of replicates from 2 to 4 and set the number of DE genes to 300. I see little value in estimating an effect from only two samples. In this case, the F-test finds more genes than the t-test.
Created on 2019-04-14 by the reprex package (v0.2.1)
Next, I tested whether the F-test is biased by the intercept, which is how I interpreted your remark.
Created on 2019-04-14 by the reprex package (v0.2.1)
The table results are exactly the same. This seems to be linked to the message "Removing intercept from test coefficients". Let me know.
Sometimes it will, sometimes it won't. My point is that the F-test is not guaranteed to detect a superset of the genes detected by the t-test, and this is also the case in your altered scenarios. Pre-filtering on the F-test may therefore discard genes that would otherwise have been detected by direct use of the t-test.
tl;dr Why do something that (i) requires more work, (ii) potentially reduces power, and (iii) potentially increases the false positive rate? If you care about pairwise comparisons, just do them.
I don't know what you're getting at here. My remark was about the F.p.value field within the MArrayLM object returned by eBayes. This is quite different to the P.Values from topTable.
Thanks for your reply.
In my case, the "F-test before t-test" strategy comes from classical ANOVA practice. Could you point me to any peer-reviewed article that discusses the benefits of carrying out t-tests directly, rather than the F-then-t strategy, in the context of high-dimensional data? Or is it a property of the linear model?
Concerning the F.p.value, I took a shortcut. My bad.
The "high-dimensional" aspect is irrelevant. limma fits gene-wise linear models, with no information shared between genes other than through empirical Bayes shrinkage (which has nothing to do with the current problem). You would see the same long-run behaviour with respect to loss of power and type I error control even if you only had one gene.
I don't know of any peer-reviewed articles that address this problem, but I'd say that's because it's pretty self-evident. The null distribution of the t-statistic will change once you condition on the F-statistic (computed from the same data) being greater than a certain threshold value; all guarantees related to the validity of the subsequent t-tests are lost.
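To see this concretely, here is a minimal sketch with ordinary lm() on a single gene, assuming three groups of two samples and an effect only in group 3, mirroring the earlier simulation:

set.seed(123)
g <- gl(3, 2)
p.cond <- replicate(10000, {
    y <- rnorm(6)
    y[5:6] <- y[5:6] + 3                    # effect only in group 3
    fit <- lm(y ~ g)
    f.p <- anova(fit)[1, "Pr(>F)"]          # overall F-test
    t.p <- summary(fit)$coefficients[2, 4]  # t-test for 2 v 1; null is true
    if (f.p <= 0.05) t.p else NA            # keep t p-value only if F passes
})
# Conditional type I error rate of the t-test; if the t-test were still
# valid after filtering on F, this would be close to 0.05.
mean(p.cond <= 0.05, na.rm=TRUE)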
@SamGG, you haven't compared the F-test to the t-tests correctly. You used an F-test to compare all 5 groups but then conducted only one t-test instead of 4. If you make a fair comparison then the t-tests detect 169 genes and the F-test only 106:
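As a sketch of what such a comparison might look like (assumed settings: 5 groups of 4 samples, 300 DE genes in one group with an effect size of 2; the exact simulation and counts will differ from those quoted):

library(limma)
set.seed(42)
grp <- gl(5, 4)
y <- matrix(rnorm(10000 * 20), ncol=20)
y[1:300, grp == 2] <- y[1:300, grp == 2] + 2  # 300 DE genes in group 2
fit <- eBayes(lmFit(y, model.matrix(~grp)))
# One F-test across all four group coefficients.
res.F <- topTable(fit, coef=2:5, n=Inf, sort.by="n")
sum(res.F$adj.P.Val <= 0.05)
# All four t-tests: count genes significant in at least one of them.
dt <- decideTests(fit)[, 2:5]
sum(rowSums(abs(dt)) > 0)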
I have commented on this issue before; see for example https://support.bioconductor.org/p/16689. The F-test is more powerful than the t-tests when there are several non-zero coefficients of similar sizes. The t-tests are more powerful when one or two coefficients are much larger than the others.
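For intuition about the first regime, a minimal sketch (hypothetical settings) where effects of similar magnitude are spread over all non-reference groups, so no single pairwise coefficient stands out:

library(limma)
set.seed(99)
grp <- gl(5, 4)
y <- matrix(rnorm(10000 * 20), ncol=20)
# 300 DE genes with similar-sized shifts (alternating sign) in all four
# non-reference groups.
shifts <- c(0, -1.5, 1.5, -1.5, 1.5)
y[1:300, ] <- y[1:300, ] + rep(shifts[grp], each=300)
fit <- eBayes(lmFit(y, model.matrix(~grp)))
res.F <- topTable(fit, coef=2:5, n=Inf, sort.by="n")
sum(res.F$adj.P.Val <= 0.05)   # genes found by the joint F-test
dt <- decideTests(fit)[, 2:5]
sum(rowSums(abs(dt)) > 0)      # genes found by any of the four t-tests

With these shifts each pairwise coefficient is only ±1.5, so the individual t-tests struggle, while the variation across the group means is large and favours the joint test.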
These principles do not depend on how many replicates you have or how large the intercept is, but the effect does get larger as the number of groups increases.
But power measured this way is not appropriate for the OP's experiment anyway. See my separate answer to the OP.
Thanks for your reply and for pointing me to that interesting post you wrote more than ten years ago (you are definitely a living library).
As I also asked Aaron: could you point me to any peer-reviewed article that discusses the benefits of carrying out t-tests directly over the F-then-t strategy in the context of high-dimensional data? I think this could be valuable for the omics community.