The Problem

I have a bulk RNA-seq project in which samples have many phenotypes and features. To maximize meaningful comparisons, I combine them into a single factor as described in chapter 5.4 of "A guide to creating design matrices for gene expression experiments". During my analysis, I tried removing one or more of these features, and it greatly affected the number of DEGs.

The Setup

To test this, I generated a random stub feature, distributed unevenly (75% "RIGHT", 25% "LEFT"), and ran my voom-duplicateCorrelation-limma pipeline twice:

Without the stub:

The formula is "~0 + Feature + batch + Age + Sex"
Feature is a factor with levels A, B, C
Group sizes are A=115, B=114, C=22
Contrasts are (A-C), (A-B), (B-C)

Looking at the contrast table, for each column there is one negative contrast of (-1), one positive of (+1), and the sum of each column is 0.

            Feature.B.vs.C   Feature.A.vs.C   Feature.A.vs.B
Feature.C               -1               -1                0
Feature.B                1                0               -1
Feature.A                0                1                1
batchTRUE                0                0                0
Age                      0                0                0
SexM                     0                0                0

With the stub:

The formula is "~0 + Feature.Stub + batch + Age + Sex"
Feature.Stub is a factor with levels A.LEFT, B.LEFT, C.LEFT, A.RIGHT, B.RIGHT, C.RIGHT.
Group sizes are
- A.RIGHT=94, A.LEFT=21; A.RIGHT is 82% of A, total is 115 (same).
- B.RIGHT=77, B.LEFT=37; B.RIGHT is 68% of B, total is 114 (same).
- C.RIGHT=17, C.LEFT=5; C.RIGHT is 77% of C, total is 22 (same).
Contrasts are
- (A.RIGHT+A.LEFT)/2 - (C.RIGHT + C.LEFT)/2
- (A.RIGHT+A.LEFT)/2 - (B.RIGHT + B.LEFT)/2
- (B.RIGHT+B.LEFT)/2 - (C.RIGHT + C.LEFT)/2

Looking at the contrast table, for each column, there are two negative contrasts of (-0.5), two positives of (+0.5), and the sum of each column is also 0.

                      Feature.A.vs.C.RIGHTandLEFT   Feature.A.vs.B.RIGHTandLEFT   Feature.B.vs.C.RIGHTandLEFT
Feature.StubC.RIGHT                          -0.5                           0.0                          -0.5
Feature.StubB.RIGHT                           0.0                          -0.5                           0.5
Feature.StubA.RIGHT                           0.5                           0.5                           0.0
Feature.StubC.LEFT                           -0.5                           0.0                          -0.5
Feature.StubB.LEFT                            0.0                          -0.5                           0.5
Feature.StubA.LEFT                            0.5                           0.5                           0.0
batchTRUE                                     0.0                           0.0                           0.0
Age                                           0.0                           0.0                           0.0
SexM                                          0.0                           0.0                           0.0
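
As a sanity check on the table above, the averaged contrasts can be rebuilt from their formulas. This is a minimal sketch in plain Python, not limma code: the helper name averaged_contrast and the shortened coefficient labels are made up for illustration, and the covariates are left out because every contrast gives them weight 0.

```python
from fractions import Fraction

# Coefficient order follows the stub model's contrast table
# (batch, Age and Sex are omitted: their weight is 0 in every contrast).
coefs = ["C.RIGHT", "B.RIGHT", "A.RIGHT", "C.LEFT", "B.LEFT", "A.LEFT"]


def averaged_contrast(first, second):
    """Weights for (first.RIGHT + first.LEFT)/2 - (second.RIGHT + second.LEFT)/2."""
    half = Fraction(1, 2)
    weights = {name: Fraction(0) for name in coefs}
    for side in ("RIGHT", "LEFT"):
        weights[f"{first}.{side}"] += half
        weights[f"{second}.{side}"] -= half
    return weights


contrasts = {
    "A.vs.C": averaged_contrast("A", "C"),
    "A.vs.B": averaged_contrast("A", "B"),
    "B.vs.C": averaged_contrast("B", "C"),
}

# Print each contrast column and confirm that its weights sum to zero.
for name, weights in contrasts.items():
    cells = "  ".join(f"{coef}={float(weights[coef]):+.1f}" for coef in coefs)
    print(f"{name}: {cells} | sum={float(sum(weights.values())):+.1f}")
```

Each column gives LEFT and RIGHT equal weight (0.5 each), regardless of how many samples fall in each subgroup.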

Results

The results, however, differ:

#DEGs Without stub:

Comparison        Up      Down
=========         ==      ====
Feature.A.vs.C    2094    3512
Feature.A.vs.B    2103    3244
Feature.B.vs.C    576     937

#DEGs With stub:

Comparison                     Up      Down
=========                      ==      ====
Feature.A.vs.C.RIGHTandLEFT    1951    3023
Feature.A.vs.B.RIGHTandLEFT    1251    2651
Feature.B.vs.C.RIGHTandLEFT    460     517

I did expect some differences, but not to this extent, and I would love to hear your thoughts. Thanks!

Tags: RNASeq, limma

Written by Jonathan · updated by Gordon Smyth

score 2 · Accepted Answer · 2023-07-25

If you change the number of groups or factors in the linear model, then the results will change. That is a fundamental property of linear models and anova, not just limma.

When you add in more groups, then the number of samples being averaged to estimate each coefficient decreases, which increases the standard errors and decreases statistical power to detect DE. The residual degrees of freedom for the linear model also decrease, which also tends to decrease statistical power to detect DE, because you are conducting t-tests on fewer degrees of freedom. On the other hand, if there are genuine differences between the new groups or subgroups, then the residual mean square (sigma) will decrease, which decreases the standard errors and increases statistical power to detect DE. So the number of DE genes for equivalent contrasts can increase or decrease.
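
To put a rough number on the first effect, here is a back-of-the-envelope sketch using the group sizes quoted in the question. It is a simplification under loud assumptions: it treats each coefficient as a plain group mean with a common residual variance, and ignores the batch, Age and Sex covariates, the voom weights and the duplicateCorrelation step, so it only approximates what the real pipeline does. The function names are made up for illustration.

```python
import math

# Group sizes quoted in the question.
pooled = {"A": 115, "B": 114, "C": 22}
split = {
    "A": {"RIGHT": 94, "LEFT": 21},
    "B": {"RIGHT": 77, "LEFT": 37},
    "C": {"RIGHT": 17, "LEFT": 5},
}


def pooled_var_factor(g1, g2):
    # Var(mean_g1 - mean_g2) / sigma^2 for two independent group means.
    return 1 / pooled[g1] + 1 / pooled[g2]


def split_var_factor(g1, g2):
    # Each subgroup mean enters the contrast with weight +/-0.5,
    # so it contributes 0.5**2 / n to the variance factor.
    return sum(0.25 / n for g in (g1, g2) for n in split[g].values())


se_ratio = {}
for g1, g2 in [("A", "C"), ("A", "B"), ("B", "C")]:
    ratio = math.sqrt(split_var_factor(g1, g2) / pooled_var_factor(g1, g2))
    se_ratio[f"{g1}.vs.{g2}"] = ratio
    print(f"{g1}.vs.{g2}: SE ratio (stub / no stub) = {ratio:.2f}")
# A.vs.C: SE ratio (stub / no stub) = 1.21
# A.vs.B: SE ratio (stub / no stub) = 1.19
# B.vs.C: SE ratio (stub / no stub) = 1.17
```

Under these assumptions, splitting by the stub inflates the standard errors of the averaged contrasts by roughly 17 to 21%, driven mostly by the smallest subgroups (C.LEFT with n=5 and A.LEFT with n=21), so the t-statistics shrink by a similar factor when the stub explains no real variation.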

The results that you show appear completely unsurprising in a context where there is actually no genuine DE between LEFT and RIGHT. Your minimum sample size has decreased from n=22 to n=5, which is not a small change. If you plot the t-statistics from the models with and without the stub against one another, you will find that the results for the two models are highly correlated with one another, but small changes in the t-statistics and p-values can produce relatively larger changes in the number of DE genes if there are many genes close to the 0.05 FDR cutoff.
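
The cutoff effect can be illustrated with a toy simulation. Everything here is hypothetical: the 70/30 mixture of null and DE genes, the effect-size distribution, the 20% shrinkage of the test statistics, and the normal approximation to the t-distribution are all assumptions chosen for illustration, not values taken from the data in the question.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bh_rejections(pvalues, fdr=0.05):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    n_reject = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= fdr * rank / m:
            n_reject = rank  # keep the largest rank that passes
    return n_reject


def two_sided_p(z):
    # Normal approximation to the t-distribution (assumes many residual df).
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


rng = random.Random(1)
n_genes = 10_000
# Hypothetical mixture: about 70% null genes, 30% with a true standardized effect.
z_full = [
    rng.gauss(0, 1) + (rng.gauss(0, 2) if rng.random() < 0.3 else 0.0)
    for _ in range(n_genes)
]
# Same genes, with every statistic shrunk by an assumed 20% larger standard error.
z_shrunk = [z / 1.2 for z in z_full]

n_full = bh_rejections([two_sided_p(z) for z in z_full])
n_shrunk = bh_rejections([two_sided_p(z) for z in z_shrunk])
print(f"DE genes at FDR 0.05: {n_full} before shrinkage, {n_shrunk} after")
```

In this sketch the two sets of statistics are perfectly correlated by construction, yet the number of genes passing the FDR cutoff still drops, because many genes sit just above the threshold.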