In that formulation you can interpret the coefficients as
Intercept = expected value for Group A when oc == 0
oc = slope of oc for Group A
grB = difference in intercept between Group B and Group A
oc.grB = difference in slope between Group B and Group A

So to get the intercept for Group B you simply add Intercept and grB, and to get its slope you add oc and oc.grB. I'm not sure why you would want the intercept, so let's concentrate on the slope.
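As a minimal sketch of that arithmetic in base R (dat here is a hypothetical data frame with a response y, the covariate oc, and a two-level factor grp with levels A and B):

> cf <- coef(lm(y ~ oc * grp, data = dat))
> interceptB <- cf["(Intercept)"] + cf["grpB"]  ## intercept for Group B
> slopeB <- cf["oc"] + cf["oc:grpB"]            ## slope for Group B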
## fake example
> library(limma)
> set.seed(0xabeef)
> z <- matrix(rnorm(1200), 100)
## try to force a significant interaction...
> z[1:10,7:9] <- z[1:10,7:9] - 2
> z[1:10,10:12] <- z[1:10,10:12] + 2
> grp <- gl(2,6)
> oc <- rnorm(12)
> design <- model.matrix(~oc*grp)
> fit <- lmFit(z, design)
> fitnocont <- eBayes(fit)
## top interacting 'gene' is #10
> topTable(fitnocont,4)
logFC AveExpr t P.Value adj.P.Val B
10 2.810583 -0.04859717 3.475649 0.001628594 0.1628594 -1.128217
7 3.439371 -0.09154699 3.154620 0.003731236 0.1865618 -1.842094
8 2.341007 0.02467877 2.991910 0.005618994 0.1872998 -2.192559
3 2.772142 -0.07539511 2.751199 0.010136632 0.2120847 -2.694125
2 2.502926 -0.06289789 2.732451 0.010604237 0.2120847 -2.732257
98 1.846969 -0.05407544 2.471004 0.019606723 0.3267787 -3.248230
24 1.564215 0.39737675 2.309226 0.028269238 0.3545001 -3.551400
47 -1.671360 0.56112280 -2.307785 0.028360006 0.3545001 -3.554040
69 1.604762 0.07991945 2.221184 0.034324866 0.3658172 -3.710700
89 -1.516999 -0.15290051 -2.153428 0.039753979 0.3658172 -3.830410
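## the design columns are (Intercept), oc, grp2 and oc:grp2, so the slope
## for Group B is coefficient 2 (oc) plus coefficient 4 (oc:grp2)
> colnames(design)
[1] "(Intercept)" "oc"          "grp2"        "oc:grp2"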
## now get the slope for grpB
> contrast <- matrix(c(0,1,0,1),4)
> fitcont <- contrasts.fit(fit, contrast)
> fitcont <- eBayes(fitcont)
## slope for gene 10, for Group B is 2.511
> topTable(fitcont)
logFC AveExpr t P.Value adj.P.Val B
8 2.501193 0.02467877 3.811288 0.0006680289 0.04461036 -0.3976625
10 2.511630 -0.04859717 3.703171 0.0008922073 0.04461036 -0.6622559
7 3.065967 -0.09154699 3.352852 0.0022431206 0.06660769 -1.5014681
1 2.241543 0.31163451 3.286267 0.0026643078 0.06660769 -1.6573120
6 2.229692 -0.25763843 2.410511 0.0225121215 0.35721935 -3.5549365
2 1.810080 -0.06289789 2.356030 0.0254603230 0.35721935 -3.6614464
5 1.809311 -0.44420628 2.312969 0.0280346610 0.35721935 -3.7444875
3 1.947435 -0.07539511 2.304349 0.0285775482 0.35721935 -3.7609879
76 1.220782 -0.18118689 2.131452 0.0416729645 0.45089658 -4.0829253
37 1.150990 0.01963060 2.049685 0.0495564211 0.45089658 -4.2289395
## now just do this by hand
## (within the grp == 2 subset grp is constant, so you can't actually fit
## ~oc + grp; the model reduces to ~oc)
> summary(lm(z[10,]~oc , subset = grp == 2))
Call:
lm(formula = z[10, ] ~ oc, subset = grp == 2)
Residuals:
7 8 9 10 11 12
-1.7879 -0.3138 -0.7352 2.0230 -0.6280 1.4419
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.7218 0.6584 -1.096 0.3345
oc 2.5116 1.0077 2.492 0.0673 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.613 on 4 degrees of freedom
Multiple R-squared: 0.6083, Adjusted R-squared: 0.5104
F-statistic: 6.212 on 1 and 4 DF, p-value: 0.06731
## same coefficient
Perfect! I know it is a different issue, but why do the within-group standard errors and p-values differ when computed from the interaction model, as opposed to splitting the groups and running two separate models (as you showed in the answer)? I see that the estimate is the same. I am not afraid of the math, so just bring it on! :)
The standard error is, roughly, the standard deviation of the residuals divided by the square root of the number of observations. If you cut your number of samples in half, you have fewer samples from which to estimate that standard deviation (which will change the estimate), and you divide by a smaller number, so the estimated standard error will, all else equal, be larger. Since that's the denominator of your t-statistic, a larger denominator gives a smaller t-statistic when the numerator doesn't change.
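One way to see this concretely with the fake data above: get the Group B slope and its standard error from the pooled interaction fit (the ordinary least-squares analogue of the limma fit, before the eBayes moderation) and compare with the subset fit shown below.

> full <- lm(z[10,] ~ oc * grp)
> v <- vcov(full)
## Group B slope is oc + oc:grp2; its variance needs the covariance term
> b <- unname(coef(full)["oc"] + coef(full)["oc:grp2"])
> se <- sqrt(v["oc","oc"] + v["oc:grp2","oc:grp2"] + 2 * v["oc","oc:grp2"])
> c(estimate = b, se = se)   ## same 2.5116 estimate, different standard error
> df.residual(full)          ## 8 residual df here, versus 4 in the subset fit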
In addition, the t-statistic will be based on half the degrees of freedom as well, so even if the t-statistic didn't change, the p-value would get larger. Plus, the degrees of freedom for the limma model fit will be larger than for the lm fit because of the eBayes step. See the limma User's Guide and the relevant papers referenced therein for more information.

Absolutely wonderful! May I kindly ask also how to extract the results across both groups, i.e. the slope for oc across both groups?
I'm pretty sure I've already answered that question in my original answer.
My mistake, my English is probably insufficient. The answer you wrote is correct. What I meant was a new question: how do I get the slope for oc treating all participants as one group, kind of like fitting ~oc+grp on the full data set?
By fitting ~oc+grp on the genes for which the interaction isn't significant.
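A minimal sketch of that suggestion, continuing the fake example above (the 0.05 cutoff on the interaction p-value is just an illustrative choice):

## keep the genes with no evidence of an interaction, then fit the additive model
> intP <- topTable(fitnocont, coef = 4, number = Inf, sort.by = "none")$P.Value
> design2 <- model.matrix(~oc + grp)
> fit2 <- eBayes(lmFit(z[intP > 0.05, ], design2))
> topTable(fit2, coef = "oc")   ## common slope for oc across both groups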