Question

Limma multi-group analysis - General F-test

0

Entering edit mode

Miguel.Cosenza ▴ 10

@miguelcosenza-14267

Last seen 4.3 years ago

Germany / Freiburg / University Medical…

Dear all,

I am testing limma for multiple-group analysis. I have followed the recommendations in the user guide creating the design object and to establish the contrasts. In general, I would like to be able to run an ANOVA-like test, to look for proteins that are affected in general by the grouping variable and maybe do some pair-wise comparisons afterwards.

I understand that the topTableF function gives me this information by following the instruction in the user guide (page 43), but I am curious on why it is necessary to define the contrasts a priori after fitting the model if I want to check only for the general grouping effect. I am referring to this step in the analysis:

fit <-  lmFit(mat, design = design)
fitcontrasts <- contrasts.fit(fit,constrasts)

Based on this doubt, I have tested different settings for creating the design object and the fitting of the model with different results. I'll try to outline my tests here in the best possible way.

Experimental groups

groups <- c(rep("A",6), rep("B",5), rep("C",6))
groupf <- factor(groups, levels = c("A","B","C"))

Two types of design object

design <- model.matrix(~0+groupf) # without intercept
design2 <- model.matrix(~groupf) # with intercept
colnames(design) <- c("A","B","C")

Contrast matrix

constrasts <- makeContrasts(B-A, C-A, C-B,levels = design)

Different approaches for model fitting

1. Fit with intercept and no contrasts

fitintr <- lmFit(tomat, design = design2)
fitintr <- eBayes(fitintr)

2. Fit with no intercept and no contrasts

fit <- lmFit(tomat, design = design)
fitnocontr <- eBayes(fit)

3. Fit with no intercept and contrasts

fitcontrasts <- contrasts.fit(fit,constrasts)
fitcontrasts2 <- eBayes(fitcontrasts)

I notice that using topTableF on the outputs of the approaches 1 and 2 gives the same F and P-values. In my case, this approach gives me a result that I would interpret as '~90% proteins are affected by the grouping variable', since most proteins show a p-value lower than 0.01.

Does this means that establishing no contrasts is equivalent to fitting a model with the intercept? Is this a problem? Why?

The topTableF result from the approach 3 gives very different results. With a much smaller number (25) of proteins appearing affected by the grouping.

I would like to better understand these differences and why using one approach would be better than the other one.

Many thanks in advance for taking the time to read and for any comment you might offer.

limma experimental design • 5.4k views

ADD COMMENT • link updated 4.8 years ago by Gordon Smyth 52k • written 4.9 years ago by Miguel.Cosenza ▴ 10

score 3 · Accepted Answer · 2020-06-05

If you fit an intercept in the context of an ANOVA, and you accept the default treatments contrast parameterization, then one group will be set as the baseline, and that coefficient will estimate the mean of that baseline group. And all other coefficients will be contrasts between the given group and the baseline. So as an example, let's use example(lmFit), and to generate data that will be identical we should set the random seed first:

set.seed(0xabeef) ## ht Seth Falcon for hilarity
 sd <- 0.3*sqrt(4/rchisq(100,df=4))
 y <- matrix(rnorm(100*6,sd=sd),100,6)
rownames(y) <- paste("Gene",1:100)
y[1:2,4:6] <- y[1:2,4:6] + 2
design <- cbind(Grp1=1,Grp2vs1=c(0,0,0,1,1,1))
fit <- lmFit(y,design)
fit <- eBayes(fit)
topTableF(fit)
           Grp1 Grp2vs1 AveExpr     F  P.Value adj.P.Val
Gene 2   0.0789   1.953  1.0552 99.40 1.24e-06  9.66e-05
Gene 1  -0.0683   1.927  0.8950 89.12 1.93e-06  9.66e-05
Gene 59 -1.0265   1.400 -0.3265 10.00 5.81e-03  1.86e-01
Gene 22 -1.2720   1.850 -0.3468  7.83 1.17e-02  1.86e-01
Gene 77  0.3959  -1.092 -0.1499  7.77 1.20e-02  1.86e-01
Gene 78 -0.2367  -0.276 -0.3746  7.67 1.24e-02  1.86e-01
Gene 89  0.1559   0.441  0.3764  7.54 1.30e-02  1.86e-01
Gene 24  0.2020   0.137  0.2705  3.96 6.07e-02  7.49e-01
Gene 94  0.3335  -0.631  0.0182  3.76 6.74e-02  7.49e-01
Gene 32  0.4584  -0.881  0.0180  3.47 7.90e-02  7.90e-01

Do you see the problem here? The first coefficient is estimating the mean of Grp1, which we can actually test:

 head(rowMeans(y[,1:3]))
 Gene 1  Gene 2  Gene 3  Gene 4  Gene 5  Gene 6 
-0.0683  0.0789  0.0104  0.2782 -0.0860 -0.0489

And that isn't an interesting result! We are testing for evidence that any coefficient is > 0, and we expect most of the gene expression levels to be larger than zero, so you expect most genes will have a significant p-value. What you want to test is for evidence that there is one or more comparisons where the difference is larger than zero, and the best way to do that is to use a contrasts matrix, so all of the coefficients are contrasts rather than having one intercept term. Put a different way, only your method 3 is doing the right thing.

Although do note that MArrayLM objects have a [ function that does exactly what one would think it does, so if you fit a treatment contrasts model and the non-intercept terms are all fine for your purposes, you can still use that.

 topTableF(fit[,2])
        Grp2vs1 AveExpr     F  P.Value adj.P.Val
Gene 1    1.927  0.8950 95.66 6.39e-06  0.000377
Gene 2    1.953  1.0552 91.69 7.55e-06  0.000377
Gene 59   1.400 -0.3265 16.43 3.21e-03  0.107091
Gene 77  -1.092 -0.1499 14.45 4.65e-03  0.107091
Gene 22   1.850 -0.3468 13.74 5.35e-03  0.107091
Gene 94  -0.631  0.0182  7.50 2.41e-02  0.401008
Gene 32  -0.881  0.0180  6.93 2.85e-02  0.407168
Gene 96  -0.717  0.0280  5.83 4.04e-02  0.504519
Gene 7   -0.456 -0.1239  4.86 5.66e-02  0.629368
Gene 83  -0.465  0.0324  4.11 7.49e-02  0.739831

score 2 · Accepted Answer · 2020-06-05

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 9 hours ago

WEHI, Melbourne, Australia

It isn't actually necessary to define contrasts to get the correct F-test. The simplest possible code without any bells and whistles will give the right result:

fit <-  lmFit(tomat, design2)
fit <- eBayes(fit)
topTable(fit)

Nowadays I recommend that people just use topTable instead of topTableF since it doesn't everything the latter function does. I am planning to retire topTableF in a future version of limma.

Following up James' answer, the thing to remember is that topTableF tests whether the coefficients are equal to zero, not whether they are equal to each other. If you decide to fit a group-means model (the one without an intercept) then you do have to form contrasts, otherwise you're not dealing with differences between groups. Furthermore, if you have 3 groups and you want to conduct an F-test for differences, then you need to test for 2 coefficients not 3 because there are only 2 independent differences between 3 groups.

ADD COMMENT • link 4.8 years ago Gordon Smyth 52k

0

Entering edit mode

Many thanks for the input!

I have to say that I'm still confused about:

... if you have 3 groups and you want to conduct an F-test for differences, then you need to test for 2 coefficients not 3 because there are only 2 independent differences between 3 groups.

Does that mean that constrasts <- makeContrasts(B-A, C-A, C-B,levels = design) would be incorrect? How are B-A, C-A, C-B not independent differences contrasting each group against the other?

I hope this question is not completely silly.

ADD REPLY • link 4.8 years ago Miguel.Cosenza ▴ 10

1

Entering edit mode

The differences B-A, C-A, C-B are not independent because, if I tell you what the values are for any two of them, then you can deduce what the third must be. Eg. if B-A = 2.7 and C-A = 1.4, then it must necessarily follow that C-B = (C-A) - (B-A) = 2.7 - 1.4 = 1.3.

On the other hand, it is perfectly ok for you to ask limma to do an F-test for all three of them. limma will automatically figure out how many independent degrees of freedom there are. The resulting F-tests will in fact be the same whether you specify all three contrasts or if you specify just two of them (any two). You can actually specify any number of contrasts llike

constrasts <- makeContrasts(B-A, C-A, C-B, A-(B+C)/2, B-(A+C)/2, C-(A+B)/2,
                            levels = design)

or whatever and limma will still do an F-test on just 2 degrees of freedom.

ADD REPLY • link 4.8 years ago Gordon Smyth 52k

0

Entering edit mode

I understand.

Thanks again for taking the time to answer!

ADD REPLY • link 4.8 years ago Miguel.Cosenza ▴ 10