Question

Question re "Group-specific condition effects, individuals nested within groups"

fnigsch • 0

Hi,

I am working with a dataset that has precisely the nested structure as exemplified in the section "Group-specific condition effects, individuals nested within groups" of the DESeq2 RNAseq tutorial: https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups

The structure is the same, but I have more groups, a variable number of individuals per group, and two conditions per individual. (I therefore removed from the design matrix all the all-zero columns, i.e. those for coefficients that do not exist.)

Working through my data, I fit a model exactly as the one listed in the tutorial:

model.matrix(~ grp + grp:ind.n + grp:cnd, coldata)

I got the results for the contrasts that I am interested in via results(). When inspecting the results, and subsequently playing with some toy data, I got some doubts about the correctness of what I did.

My toy example:

Two groups (A, B), three individuals in each (1, 2, 3), two conditions per individual (x, y), two replicates per (group, individual, condition), values of condition y are ~2 higher than condition x:

# cond is defined first, because df$cond does not exist yet inside the data.frame() call
cond <- factor(rep(rep(c("x", "y"), each=2), 6))
df <- data.frame(
  grp=factor(rep(c("A", "B"), each=12)),
  cond=cond,
  ind=factor(rep(rep(1:3, each=4), 2)),
  value=rnorm(24, 10) + c(rnorm(12, -5), rnorm(12, 5)) + ifelse(cond == "x", 0, rnorm(24, 2, 0.5))
)

I then fit two linear models. Here the first one, plus the coefficients:

lm(value ~ grp + grp:ind + grp:cond, df)

Call:
lm(formula = value ~ grp + grp:ind + grp:cond, data = df)

Coefficients:
(Intercept)        grpB   grpA:ind2   grpB:ind2   grpA:ind3   grpB:ind3  grpA:condy  grpB:condy
     5.6597      9.6079     -0.1103      1.4499      0.3759      0.4865      2.1481      2.3591

And here the second model, excluding the grp:ind term:

lm(value ~ grp + grp:cond, df)

Call:
lm(formula = value ~ grp + grp:cond, data = df)

Coefficients:
(Intercept)        grpB  grpA:condy  grpB:condy
      5.748      10.165       2.148       2.359

My questions to the above:

1) What is the meaning of the intercept in either of these models?

2) The condition-specific effects per group (grpA:condy, grpB:condy) are the same in both models. If the pairing of samples within each individual were taken into account, I would expect them to differ. What am I missing?

Any help greatly appreciated!

deseq2 linear model • 1.1k views

updated 6.3 years ago by James W. MacDonald • written 6.3 years ago by fnigsch

score 0 · Answer 1 · 2018-11-01

You can infer what the coefficients mean with a bit of algebra. The 0 and 1 entries in a model matrix indicate whether a coefficient contributes to the fitted value for a given sample. So for any set of rows that contains only a single 1, that coefficient is the mean of the group defined by those rows. This is easier shown than described:

> mod1 <- model.matrix(~grp + grp:cond, df)
> mod2 <- model.matrix(~grp + grp:ind + grp:cond, df)
> mod1
   (Intercept) grpB grpA:condy grpB:condy
1            1    0          0          0
2            1    0          0          0
3            1    0          1          0
4            1    0          1          0
5            1    0          0          0
6            1    0          0          0
7            1    0          1          0
8            1    0          1          0
9            1    0          0          0
10           1    0          0          0
11           1    0          1          0
12           1    0          1          0
13           1    1          0          0
14           1    1          0          0
15           1    1          0          1
16           1    1          0          1
17           1    1          0          0
18           1    1          0          0
19           1    1          0          1
20           1    1          0          1
21           1    1          0          0
22           1    1          0          0
23           1    1          0          1
24           1    1          0          1

So if we extract the rows with just one 1 (these are the intercept-only rows), we get

> df[rowSums(mod1) == 1,]
   grp cond ind    value
1    A    x   1 5.422499
2    A    x   1 3.027077
5    A    x   2 3.360429
6    A    x   2 4.166690
9    A    x   3 4.230351
10   A    x   3 5.021791

And we can then infer that the intercept is the mean of the subjects from Group A, condition x, where we are computing the mean over all individuals. Figuring out the rest is simple algebra. For example, row 13 is an individual from Group B, condition x. If we substitute Grp_B_cond_x as shorthand, we have

Grp_B_cond_x = Grp_A_cond_x + X

Because that row has a coefficient for the intercept, and a coefficient for grpB. Solving for X gives us

X = Grp_B_cond_x - Grp_A_cond_x

So the 'grpB' coefficient is the difference between the Group B, condition x samples and the Group A, condition x samples. You can figure out the other two similarly (they are the difference between the Group A, condition y and Group A, condition x, and the same difference for the Group B samples).
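The algebra above can be checked numerically. The following is a minimal sketch, not part of the original answer: it regenerates toy data with the same layout as in the question (the seed is arbitrary, and the names `fit_small`, `cell`, and `by_hand` are introduced here purely for illustration), then compares each coefficient of the smaller model with the corresponding cell mean or difference of cell means.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible

# Toy data with the same layout as in the question:
# 2 groups x 3 individuals x 2 conditions x 2 replicates
cond <- factor(rep(rep(c("x", "y"), each = 2), 6))
df <- data.frame(
  grp   = factor(rep(c("A", "B"), each = 12)),
  cond  = cond,
  ind   = factor(rep(rep(1:3, each = 4), 2)),
  value = rnorm(24, 10) + c(rnorm(12, -5), rnorm(12, 5)) +
    ifelse(cond == "x", 0, rnorm(24, 2, 0.5))
)

fit_small <- lm(value ~ grp + grp:cond, data = df)

# Mean of 'value' for each (group, condition) cell
cell <- tapply(df$value, list(df$grp, df$cond), mean)

# Coefficients reconstructed by hand from the cell means
by_hand <- c(
  "(Intercept)" = cell["A", "x"],
  "grpB"        = cell["B", "x"] - cell["A", "x"],
  "grpA:condy"  = cell["A", "y"] - cell["A", "x"],
  "grpB:condy"  = cell["B", "y"] - cell["B", "x"]
)

# Compare the fitted coefficients with the hand-computed ones
all.equal(coef(fit_small), by_hand)
```

Because the smaller model has exactly one parameter per (group, condition) cell, its fitted values are just the four cell means, which is why the comparison should hold for any seed.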

You can figure out the coefficients for the larger model similarly (I leave that to the reader as an exercise), but what you will find is that for the larger model the grpA:condy and grpB:condy coefficients are computing the same thing as in the smaller model, which is why you get the same coefficient. But that's only half of the story! (Continued below)
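For readers who would rather check the exercise numerically than by hand, here is a hedged sketch under the same assumptions as the toy data in the question (balanced design, arbitrary seed; the names `fit_small`, `fit_big`, `a`, and `int_by_hand` are illustrative only). It confirms that the two condition coefficients agree between the models, and that in this balanced layout the intercept of the larger model is the additive fitted value for Group A, individual 1, condition x.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible

# Toy data with the same balanced layout as in the question
cond <- factor(rep(rep(c("x", "y"), each = 2), 6))
df <- data.frame(
  grp   = factor(rep(c("A", "B"), each = 12)),
  cond  = cond,
  ind   = factor(rep(rep(1:3, each = 4), 2)),
  value = rnorm(24, 10) + c(rnorm(12, -5), rnorm(12, 5)) +
    ifelse(cond == "x", 0, rnorm(24, 2, 0.5))
)

fit_small <- lm(value ~ grp + grp:cond, data = df)
fit_big   <- lm(value ~ grp + grp:ind + grp:cond, data = df)

# The group-specific condition effects are identical in both models
cnd <- c("grpA:condy", "grpB:condy")
all.equal(coef(fit_big)[cnd], coef(fit_small)[cnd])

# Intercept of the larger model: fitted value for Group A, individual 1,
# condition x. Within Group A the model is additive in ind and cond, so with
# balanced data this is (mean of ind 1) + (mean of cond x) - (Group A mean).
a <- df[df$grp == "A", ]
int_by_hand <- mean(a$value[a$ind == "1"]) +
  mean(a$value[a$cond == "x"]) - mean(a$value)
all.equal(unname(coef(fit_big)["(Intercept)"]), int_by_hand)
```

The intercept check relies on the design being balanced within each group; with a variable number of individuals or replicates, the simple marginal-mean formula would no longer be expected to hold.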