Correct contrast for testing multi-level interaction effect in limma using combined factors design matrix
1
0
Entering edit mode
Jane Park • 0
@jane-park-12725
Last seen 7.2 years ago
Davis, CA, USA

Hi all,

I'm having a hard time figuring out where my contrast matrix has gone wrong. I'm interested in testing interactions for a multi-factorial RNA-seq project. The project is ultimately a 2x4x4 factorial design, but for the sake of simplicity and making sure I'm using the package appropriately, I've subsetted my data and am analyzing as a 2x4 factorial design. I'm interested in the interaction of parent treatment (group, levels: Control, Exposed) and development (stage, levels: 19, 28, 35, HA) on embryonic gene expression, and am trying to define a contrast that defines the interaction across all 4 stages of development.

> head(z$samples)

                                     files group lib.size norm.factors

0017ARSC00281 0017ARSC00281_htseq.count_v2     C  1501758    1.2739166

0018ARSC00351 0018ARSC00351_htseq.count_v2     C  1247792    1.0500738

0021ARSE00281 0021ARSE00281_htseq.count_v2     E  1041493    1.1739898

0022ARSE00351 0022ARSE00351_htseq.count_v2     E   846621    0.8519006

0025ARSC00191 0025ARSC00191_htseq.count_v2     C   582379    0.7811282

0026ARSE00191 0026ARSE00191_htseq.count_v2     E  1122251    1.0042548

              parenttreat stage

0017ARSC00281           C    28

0018ARSC00351           C    35

0021ARSE00281           E    28

0022ARSE00351           E    35

0025ARSC00191           C    19

0026ARSE00191           E    19

 

Based on the limma user guide and various posts on this forum, I've set up and compared two different design matrices: 1) Classic factorial design (y~factor1*factor2), and 2) analyzing as a single factor (y~combined). The number of DEG for the interaction is different between the two methods, and leads me to think that my contrasts are improperly defined. 

#Classic factorial design:

design1 <- model.matrix(~group*stage, data=z$samples)
colnames(design1)[1] <- "Intercept"
colnames(design1) <- gsub("group", "", colnames(design1))
colnames(design1) <- gsub(":", ".", colnames(design1))
colnames(design1) <- gsub("-", ".", colnames(design1))

 

> design1

              Intercept E stage28 stage35 stageHA E.stage28 E.stage35 E.stageHA

0017ARSC00281         1 0       1       0       0         0         0         0

0018ARSC00351         1 0       0       1       0         0         0         0

0021ARSE00281         1 1       1       0       0         1         0         0

0022ARSE00351         1 1       0       1       0         0         1         0

0025ARSC00191         1 0       0       0       0         0         0         0

0026ARSE00191         1 1       0       0       0         0         0         0

0037ARSC00282         1 0       1       0       0         0         0         0

0038ARSE00282         1 1       1       0       0         1         0         0

0041ARSC00192         1 0       0       0       0         0         0         0

0043ARSE00192         1 1       0       0       0         0         0         0

0053ARSC00352         1 0       0       1       0         0         0         0

0055ARSE00352         1 1       0       1       0         0         1         0

0057ARSC00193         1 0       0       0       0         0         0         0

0058ARSC00194         1 0       0       0       0         0         0         0

0059ARSE00193         1 1       0       0       0         0         0         0

0060ARSE00194         1 1       0       0       0         0         0         0

0061ARSC00283         1 0       1       0       0         0         0         0

0062ARSC00284         1 0       1       0       0         0         0         0

0063ARSE00283         1 1       1       0       0         1         0         0

0089ARSC00195         1 0       0       0       0         0         0         0

0090ARSE00195         1 1       0       0       0         0         0         0

0091ARSC00285         1 0       1       0       0         0         0         0

0092ARSE00285         1 1       1       0       0         1         0         0

0093ARSC00353         1 0       0       1       0         0         0         0

0094ARSC00354         1 0       0       1       0         0         0         0

0095ARSE00353         1 1       0       1       0         0         1         0

0096ARSE00354         1 1       0       1       0         0         1         0

0097ARSC00HA1         1 0       0       0       1         0         0         0

0098ARSC00HA2         1 0       0       0       1         0         0         0

0099ARSE00HA1         1 1       0       0       1         0         0         1

0100ARSE00HA2         1 1       0       0       1         0         0         1

0137ARSE00286         1 1       1       0       0         1         0         0

0138ARSC00355         1 0       0       1       0         0         0         0

0139ARSE00355         1 1       0       1       0         0         1         0

0140ARSC00HA3         1 0       0       0       1         0         0         0

0141ARSC00HA4         1 0       0       0       1         0         0         0

0142ARSC00HA5         1 0       0       0       1         0         0         0

0143ARSC00HA3         1 0       0       0       1         0         0         0

0144ARSE00HA4         1 1       0       0       1         0         0         1

0176ARSC00196         1 0       0       0       0         0         0         0

attr(,"assign")

[1] 0 1 2 2 2 3 3 3

attr(,"contrasts")

attr(,"contrasts")$group

[1] "contr.treatment"

attr(,"contrasts")$stage

[1] "contr.treatment"

zq1 <- voomWithQualityWeights(z, design1, plot=TRUE)
fit1 <- lmFit(zq1,design1)
fit1 <- eBayes(fit1)
plotSA(fit1, main="Final model: mean-variance trend")
results1 <- decideTests(fit1, adjust.method="BH")

> summary(results1)

Intercept     E stage28 stage35 stageHA E.stage28 E.stage35 E.stageHA

-1         0     2    1095    3359    2800         0        21        12

0       8762 23382   20023   13256   15012     23426     23403     23412

1      14664    42    2308    6811    5614         0         2         2

## pt_devixn
ixn1 <- contrasts.fit(fit1, coef=6:8)
ixn1 <- eBayes(ixn1)
ixn1results <- decideTests(ixn1)
summary(ixn1results)

  E.stage28 E.stage35 E.stageHA

-1         0        21        12

0      23426     23403     23412

1          0         2         2

 

## Analyzing as a single factor 

combi <- factor(paste(z$samples$group, z$samples$stage, sep="."))
cbind(z$samples, combi=combi)
design2 <- model.matrix(~combi, data=z$samples)
colnames(design2) <- levels(combi)
colnames(design2) <- gsub("combi", "", colnames(design2))
colnames(design2)[1] <- "Intercept"

> design2

              Intercept C.28 C.35 C.HA E.19 E.28 E.35 E.HA

0017ARSC00281         1    1    0    0    0    0    0    0

0018ARSC00351         1    0    1    0    0    0    0    0

0021ARSE00281         1    0    0    0    0    1    0    0

0022ARSE00351         1    0    0    0    0    0    1    0

0025ARSC00191         1    0    0    0    0    0    0    0

0026ARSE00191         1    0    0    0    1    0    0    0

0037ARSC00282         1    1    0    0    0    0    0    0

0038ARSE00282         1    0    0    0    0    1    0    0

0041ARSC00192         1    0    0    0    0    0    0    0

0043ARSE00192         1    0    0    0    1    0    0    0

0053ARSC00352         1    0    1    0    0    0    0    0

0055ARSE00352         1    0    0    0    0    0    1    0

0057ARSC00193         1    0    0    0    0    0    0    0

0058ARSC00194         1    0    0    0    0    0    0    0

0059ARSE00193         1    0    0    0    1    0    0    0

0060ARSE00194         1    0    0    0    1    0    0    0

0061ARSC00283         1    1    0    0    0    0    0    0

0062ARSC00284         1    1    0    0    0    0    0    0

0063ARSE00283         1    0    0    0    0    1    0    0

0089ARSC00195         1    0    0    0    0    0    0    0

0090ARSE00195         1    0    0    0    1    0    0    0

0091ARSC00285         1    1    0    0    0    0    0    0

0092ARSE00285         1    0    0    0    0    1    0    0

0093ARSC00353         1    0    1    0    0    0    0    0

0094ARSC00354         1    0    1    0    0    0    0    0

0095ARSE00353         1    0    0    0    0    0    1    0

0096ARSE00354         1    0    0    0    0    0    1    0

0097ARSC00HA1         1    0    0    1    0    0    0    0

0098ARSC00HA2         1    0    0    1    0    0    0    0

0099ARSE00HA1         1    0    0    0    0    0    0    1

0100ARSE00HA2         1    0    0    0    0    0    0    1

0137ARSE00286         1    0    0    0    0    1    0    0

0138ARSC00355         1    0    1    0    0    0    0    0

0139ARSE00355         1    0    0    0    0    0    1    0

0140ARSC00HA3         1    0    0    1    0    0    0    0

0141ARSC00HA4         1    0    0    1    0    0    0    0

0142ARSC00HA5         1    0    0    1    0    0    0    0

0143ARSC00HA3         1    0    0    1    0    0    0    0

0144ARSE00HA4         1    0    0    0    0    0    0    1

0176ARSC00196         1    0    0    0    0    0    0    0

attr(,"assign")

[1] 0 1 1 1 1 1 1 1

attr(,"contrasts")

attr(,"contrasts")$combi

[1] "contr.treatment"


zq2 <- voomWithQualityWeights(z, design2, plot=FALSE)
fit2 <- lmFit(zq2,design2)
fit2 <- eBayes(fit2)
results2 <- decideTests(fit2, adjust.method="BH")
summary(results2)

   Intercept  C.28  C.35  C.HA  E.19  E.28  E.35  E.HA

-1         0  1095  3359  2800     2  1058  2738  2160

0       8762 20023 13256 15012 23382 20109 15744 16948

1      14664  2308  6811  5614    42  2259  4944  4318
ixn.con <- makeContrasts(ixn=(Intercept-C.28-C.35-C.HA)-(E.19-E.28-E.35-E.HA), levels=design2)
ixn2 <- contrasts.fit(fit2, ixn.con)
ixn2 <- eBayes(ixn2)
ixn2results <- decideTests(ixn2)
summary(ixn2results)

     ixn

-1    90

0  15483

1   7853

Model 1 and its contrast outputs ~25 DEG for the interaction, which makes more biological sense to me than the ~8000 DEG from Model 2. I realize that's not the best barometer for assessing the output, so some guidance on how to best understand these coefficients are also appreciated. Most importantly, I'd like to figure out the appropriate contrasts for Model 2 because it'll be much easier to interpret coefficients when adding the third factor of interest. 

I apologize in advance if I've missed something obvious in the manual or from the forum posts. Any feedback on where I'm going wrong is greatly appreciated. 

Best,

Jane

limma contrast matrix limma design matrix limma rna-seq limma voom • 2.4k views
ADD COMMENT
4
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 13 hours ago
The city by the bay

The contrast for your second approach is wrong. It should be:

con <- makeContrasts(
    (E.28 - C.28) - E.19, # equivalent to coef=6 in first design
    (E.35 - C.35) - E.19, # equivalent to coef=7
    (E.HA - C.HA) - E.19, # equivalent to coef=8
    levels=design)

In your second approach, every coefficient other than the intercept represents the log-fold change of the corresponding group relative to the control-19 group. The (E.X - C.X) represents the log-fold change due to exposure at stage X, while E.19 represents the log-fold change due to exposure at stage 19. The difference between these two log-fold changes is the interaction term, which should be zero if the stage/exposure effects are purely additive.

I should also add that decideTests will report separate up/down statistics for each coefficient or column of the contrast matrix. This is not the same as the number of DE genes from dropping all coefficients at once, e.g., by running topTable with coef=NULL.

ADD COMMENT
0
Entering edit mode

Thanks for the clarification, Aaron. It looks like the separate up/down statistics from decideTests look slightly different between the two approaches, as well as the results of topTable. Why do these differences appear between the two approaches? Thanks again for the input!

ixn1 <- contrasts.fit(fit1, coef=6:8)
ixn1 <- eBayes(ixn1)
ixn1results <- decideTests(ixn1)

con <- makeContrasts(

    (E.28 - C.28) - E.19, # equivalent to coef=6 in first design
    (E.35 - C.35) - E.19, # equivalent to coef=7
    (E.HA - C.HA) - E.19, # equivalent to coef=8
    levels=design2)
ixn2 <- contrasts.fit(fit2, con)
ixn2 <- eBayes(ixn2)

ixn2results <- decideTests(ixn2)


> summary(ixn1results)

   E.stage28 E.stage35 E.stageHA

-1         0        21        12

0      23426     23403     23412

1          0         2         2

> summary(ixn2results)

   (E.28 - C.28) - E.19 (E.35 - C.35) - E.19 (E.HA - C.HA) - E.19

-1                    0                    7                   11

0                 23426                23417                23413

1                     0                    2                    2

> topTable(ixn1, coef=NULL, p.value=0.05, number=Inf)

           E.stage28 E.stage35 E.stageHA  AveExpr         F      P.Value

105927004 -1.3874870 -4.719365 -5.437884 2.360316 14.592496 3.218448e-07

105917938 -3.8249407 -3.825325 -3.949919 4.545534 13.454110 8.528391e-07

105917629 -0.3984093  4.076354  3.762134 2.662013 13.200461 1.064246e-06

105918650 -1.0450096 -1.816854 -5.731129 2.867703 11.790502 3.756017e-06

105933641 -2.0301397 -4.489063 -5.790309 2.764337 10.778847 9.594163e-06

105933824 -2.1884430 -3.473024 -6.345559 2.939039 10.698377 1.034997e-05

105934711 -2.3767461 -4.053166 -5.559814 3.600467 10.220027 1.630708e-05

105931270 -2.8159178 -3.998344 -5.300175 3.714562 10.110305 1.811641e-05

105918686  0.4689484  1.770305 -3.460255 3.649248 10.087677 1.851463e-05

105935213 -2.2799742 -4.038241 -5.443232 3.753408  9.972584 2.068461e-05

            adj.P.Val

105927004 0.007539537

105917938 0.008310339

105917629 0.008310339

105918650 0.021997115

105933641 0.040409714

105933824 0.040409714

105934711 0.048191528

105931270 0.048191528

105918686 0.048191528

105935213 0.048455768

> topTable(ixn2, coef=NULL, p.value=0.05, number=Inf)

          X.E.28...C.28....E.19 X.E.35...C.35....E.19 X.E.HA...C.HA....E.19

105927004            -1.3874870             -4.719365            -5.4378843

105917629            -0.3984093              4.076354             3.7621337

105918686             0.4689484              1.770305            -3.4602551

105918650            -1.0450096             -1.816854            -5.7311285

105940307            -0.9256202             -3.509328             0.2412498

105933824            -2.1884430             -3.473024            -6.3455592

105933641            -2.0301397             -4.489063            -5.7903094

105938640            -0.7989726             -1.978063            -6.0888987

105917938            -3.8249407             -3.825325            -3.9499190

           AveExpr        F      P.Value   adj.P.Val

105927004 2.360316 14.66891 3.018007e-07 0.006075775

105917629 2.662013 14.03037 5.187207e-07 0.006075775

105918686 3.649248 12.28329 2.402906e-06 0.018763494

105918650 2.867703 11.73154 3.963942e-06 0.023214827

105940307 3.731491 11.46705 5.053676e-06 0.023677483

105933824 2.939039 10.73695 9.980305e-06 0.035137343

105933641 2.764337 10.68318 1.049950e-05 0.035137343

105938640 4.370644 10.43982 1.322212e-05 0.038717675

105917938 4.545534 10.20284 1.657764e-05 0.043149766
ADD REPLY
1
Entering edit mode

This seems to be caused by a rather esoteric problem with the approximation of weights in contrasts.fit, see the documentation of that function for more details. Usually the approximation is good enough that you don't see any differences in the DE results with the different parametrizations. This isn't the case here, but even so, the differences in the number of DE genes (and probably their p-values) is probably small enough to be ignored.

ADD REPLY

Login before adding your answer.

Traffic: 866 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6