Use topTable to do your F-tests; don't try to poke around in the internals of the MArrayLM object unless you know what you're doing. The F.p.value field contains p-values for the F-test on all coefficients, including the intercept. This corresponds to the null hypothesis that all expression values are zero, which is unlikely to be sensible for single-channel arrays (and probably of limited utility for two-channel arrays).
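To make the distinction concrete, here is a minimal sketch with simulated data, assuming a three-group design and a clearly non-zero baseline expression:

library(limma)
set.seed(1)
# 1000 genes with average log-expression around 5.
y <- matrix(rnorm(6000, mean=5), ncol=6)
design <- model.matrix(~gl(3,2))
fit <- eBayes(lmFit(y, design))
# F.p.value tests all coefficients jointly, including the intercept;
# it is essentially zero here simply because expression is not zero.
summary(fit$F.p.value)
# topTable drops the intercept (with a message) and tests only the
# group effects, giving roughly uniform p-values under this null.
res <- topTable(fit, n=Inf, sort.by="n")
summary(res$P.Value)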
Your motivation for using the F-test is also misguided. The F-test involving all terms is not uniformly more powerful than a t-test for a specific term, so your pre-filtering might end up throwing out genes that would otherwise be detected by using the t-test directly.
set.seed(42)
y <- matrix(rnorm(60000), ncol=10)
y[,1:2] <- y[,1:2] + 2 # Add an effect in group 1, making every gene DE.
library(limma)
design <- model.matrix(~gl(5,2))
fit <- lmFit(y, design)
fit <- eBayes(fit)
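# No 'coef' given: topTable() performs the F-test on all non-intercept coefficients.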
res.F <- topTable(fit, n=Inf, sort.by="n")
res.t <- topTable(fit, coef=2, n=Inf, sort.by="n")
# Lots of genes lost by the F-test that are
# detected directly by the t-test.
table(SigF=res.F$adj.P.Val <= 0.05,
SigT=res.t$adj.P.Val <= 0.05)
In addition, your second round of multiple testing correction fails to account for the fact that the gene list has effectively been pre-filtered based on the same data used in the t-test! This undermines the validity of the multiple testing, resulting in more false positives in the t-test DE gene list.
set.seed(1000)
y <- matrix(rnorm(60000), ncol=6)
# Adding lots of DE genes, but only in group 3.
N <- 2000
y[1:N,5:6] <- y[1:N,5:6] + 2
# Doing an F-test for any difference between groups.
library(limma)
design <- model.matrix(~gl(3,2))
fit <- lmFit(y, design)
fit <- eBayes(fit)
res.F <- topTable(fit, n=Inf, sort.by="n")
# Subsetting and repeating for the 2 v 1 comparison.
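# (res.F kept the original gene order via sort.by="n", so 'keep' aligns
# with the rows of 'fit'.)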
keep <- res.F$adj.P.Val <= 0.05
res.2 <- topTable(fit[keep,], coef=2, n=Inf, sort.by="n")
# A consistently non-zero number of detected features, despite the fact
# that the null is true for all 2 v 1 comparisons; these are false positives.
sum(res.2$adj.P.Val <= 0.05)
# Under correct FDR control, this count should be non-zero only ~5% of the time.
unfilt <- topTable(fit, coef=2, n=Inf, sort.by="n")
sum(unfilt$adj.P.Val <= 0.05)
I would suggest not trying to do something fancy, and just doing the test that corresponds directly to your scientific question. If you're interested in the pairwise comparisons, just do them.
Thank you so much. This is incredibly helpful. I appreciate your time and expertise.
Dear Aaron, may I ask you to comment on the two following simulations, based on your simulation comparing the F-test and the t-test?
Here I increased the number of replicates from 2 to 4 and set the number of DE genes to 300. I see little value in estimating an effect from only two samples. In this case, the F-test finds more genes than the t-test.
Created on 2019-04-14 by the reprex package (v0.2.1)
Next, I tested whether the F-test is biased by the intercept, which is how I interpreted your remark.
Created on 2019-04-14 by the reprex package (v0.2.1)
The table results are exactly the same. This seems to be linked to the message "Removing intercept from test coefficients". Let me know.
Sometimes it will, sometimes it won't. My point is that the F-test is not guaranteed to detect a superset of the genes detected by the t-test, and this is also the case in your altered scenarios. Pre-filtering on the F-test may therefore discard genes that would otherwise have been detected by direct use of the t-test.
tl;dr Why do something that (i) requires more work, (ii) potentially reduces power, and (iii) potentially increases the false positive rate? If you care about pairwise comparisons, just do them.
I don't know what you're getting at here. My remark was about the F.p.value field within the MArrayLM object returned by eBayes. This is quite different to the P.Values from topTable.
Thanks for your reply.
In my case, the "F-test before t-test" strategy comes from classical ANOVA practice. Could you point me to any peer-reviewed article that discusses the benefits of carrying out t-tests directly, rather than the F-then-t strategy, in the context of high-dimensional data? Or is it a property of the linear model?
Concerning the F.p.value, I took a shortcut. My bad.
The "high-dimensional" aspect is irrelevant. limma fits gene-wise linear models, with no information shared between genes other than through empirical Bayes shrinkage (which has nothing to do with the current problem). You would see the same long-run behaviour with respect to loss of power and type I error control even if you only had one gene.
I don't know of any peer-reviewed articles that address this problem, but I'd say that's because it's pretty self-evident. The null distribution of the t-statistic will change once you condition on the F-statistic (computed from the same data) being greater than a certain threshold value; all guarantees related to the validity of the subsequent t-tests are lost.
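To see this concretely, here is a minimal sketch with ordinary lm() on a single gene, assuming three groups of two samples and an effect only in group 3, mirroring the earlier simulation:

set.seed(123)
g <- gl(3, 2)
p.cond <- replicate(10000, {
    y <- rnorm(6)
    y[5:6] <- y[5:6] + 3                    # effect only in group 3
    fit <- lm(y ~ g)
    f.p <- anova(fit)[1, "Pr(>F)"]          # overall F-test
    t.p <- summary(fit)$coefficients[2, 4]  # t-test for 2 v 1; null is true
    if (f.p <= 0.05) t.p else NA            # keep t p-value only if F passes
})
# Conditional type I error rate of the t-test; if the t-test were still
# valid after filtering on F, this would be close to 0.05.
mean(p.cond <= 0.05, na.rm=TRUE)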
@SamGG, you haven't compared the F-test to the t-tests correctly. You used an F-test to compare all 5 groups but then conducted only one t-test instead of 4. If you make a fair comparison then the t-tests detect 169 genes and the F-test only 106:
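As a sketch of what such a comparison might look like (assumed settings: 5 groups of 4 samples, 300 DE genes in one group with an effect size of 2; the exact simulation and counts will differ from those quoted):

library(limma)
set.seed(42)
grp <- gl(5, 4)
y <- matrix(rnorm(10000 * 20), ncol=20)
y[1:300, grp == 2] <- y[1:300, grp == 2] + 2  # 300 DE genes in group 2
fit <- eBayes(lmFit(y, model.matrix(~grp)))
# One F-test across all four group coefficients.
res.F <- topTable(fit, coef=2:5, n=Inf, sort.by="n")
sum(res.F$adj.P.Val <= 0.05)
# All four t-tests: count genes significant in at least one of them.
dt <- decideTests(fit)[, 2:5]
sum(rowSums(abs(dt)) > 0)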
I have commented on this issue before; see for example https://support.bioconductor.org/p/16689. The F-test is more powerful than the t-tests when there are several non-zero coefficients of similar sizes. The t-tests are more powerful when one or two coefficients are much larger than the others.
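For intuition about the first regime, a minimal sketch (hypothetical settings) where effects of similar magnitude are spread over all non-reference groups, so no single pairwise coefficient stands out:

library(limma)
set.seed(99)
grp <- gl(5, 4)
y <- matrix(rnorm(10000 * 20), ncol=20)
# 300 DE genes with similar-sized shifts (alternating sign) in all four
# non-reference groups.
shifts <- c(0, -1.5, 1.5, -1.5, 1.5)
y[1:300, ] <- y[1:300, ] + rep(shifts[grp], each=300)
fit <- eBayes(lmFit(y, model.matrix(~grp)))
res.F <- topTable(fit, coef=2:5, n=Inf, sort.by="n")
sum(res.F$adj.P.Val <= 0.05)   # genes found by the joint F-test
dt <- decideTests(fit)[, 2:5]
sum(rowSums(abs(dt)) > 0)      # genes found by any of the four t-tests

With these shifts each pairwise coefficient is only ±1.5, so the individual t-tests struggle, while the variation across the group means is large and favours the joint test.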
These principles do not depend on how many replicates you have or how large the intercept is, but the effect does get larger as the number of groups increases.
But power measured this way is not appropriate for the OP's experiment anyway. See my separate answer to the OP.
Thanks for your reply and for pointing me to that interesting post you wrote more than ten years ago (you are definitely a living library).
As I also asked Aaron: could you point me to any peer-reviewed article that discusses the benefits of carrying out t-tests directly over the F-then-t strategy in the context of high-dimensional data? I think this could be valuable for the omics community.