I have a question related to example 9.6.2 in the Limma User's Guide. The example describes how to find genes whose behavior over time is different between two conditions, where each condition has a decent number of time-points.
Reproduced here:
X <- splines::ns(targets$Time, df=5)
Group <- factor(targets$Group)
design <- model.matrix(~Group*X)
fit <- lmFit(y, design)
fit <- eBayes(fit)
topTable(fit, coef=8:12)
Columns 8:12 of the design matrix correspond to the interaction terms between group and time.
What if I also want to find the genes whose average expression differs between conditions, adjusting for time? Is there a way to do this using the above design matrix? Currently I'm proceeding as follows:
X <- splines::ns(targets$Time, df=5)
Group <- factor(targets$Group)
design <- model.matrix(~Group + X)
fit <- lmFit(y, design)
fit <- eBayes(fit)
topTable(fit, coef=2)
Column 2 of the design matrix corresponds to group. This method appears to give sensible results, but any advice would be appreciated. Thanks.
Jake
Thanks for the quick and detailed response! A few questions:
1. I still care about distinguishing (1) genes for which the difference in expression between conditions is non-zero and constant over time from (2) genes for which the difference in expression between conditions changes as a function of time. Would it be reasonable to compare, for each gene, the p-values that result from coef(2, 8:12) and from coef(8:12)?
2. In the spline scenario here, how does one extract the genes that show a change over time in one particular group? I've looked through the user's guide and support forum, but have found seemingly contradictory information.
3. I don't completely understand how, as you say, coef=2 will test for differences in expression between groups at the first time point. Let's say my experiment doesn't have discrete time-points, just values of time that are randomly distributed between 0 and 10 and are different between groups. What happens then?
coef=8:12
in the interaction model is the way to go.coef=3:7
. To do the same for the second group, you would need to set up an ANOVA to test whether coefficient 3 plus coefficient 8 is equal to zero and coefficients 4 plus 9 are equal to zero and coefficients 5 plus 10 are equal to zero, etc. viamakeContrasts
.Sorry, I'm still confused. For detecting genes for scenario 1, I thought "model.matrix(~Group + X)" was an additive model, but you're saying coef=2 only looks at the first time-point. Do I need to use
makeContrasts
to look at the effect of group at each time-point? For randomly distributed time-points, it seems like this could get messy.In the additive model, looking at a shift in expression between groups at the first time point is the same as looking at a shift at any time point, because the time effect is the same between groups. So if there's a log-fold change of 2 at the first time point, there will also be a log-fold change of 2 at any other time point. I mention the first time point as this is the literal interpretation of what the coefficient represents (and thus, what the test is actually doing).
But limma does use all the samples from all the time-points in order to make that estimate, right? As part of the "extrapolation" you mentioned? I'm just trying to wrap my head around this.
Estimate... of what? Of the second coefficient, i.e., the log-fold change between groups? If so, yes, the fit will use information across all samples at all time points to estimate the coefficient. However, the most informative samples will be those in the time intervals that have samples from both groups.
Yes, that's what I meant, thanks.