I have been reviewing the R/limma manual (esp. sections 9.6.1 and 9.6.2), and need conceptual help in applying this.
In brief, I have a time serise with 12 Times(unevenly spaced), and for my Treatment I have 6 replicate Control individuals plus 8 replicate Affected individuals (14 people total, with repeated measurements over time). Call these variables of interest Time and Treatment.
I also have a covariate of interest recorded for each person and timepoint, measuring a phenotype as a continuous variable. Call the variable P.
I have been considering using the regression spline as in example 9.6.2 many time points, thinking this is more appropriate.
My questions:
(A) Is it correct to include a covariate with the regression spline? E.g., if I follow the example where X=ns(Time, df=number); design=model.matrix(~Treatment*X*P).
I am not sure if it's valid to make this sort of an ANCOVA-type regression, or how to modify it?
My primary contrasts of interest would be the Treatment:Time interaction (impact of treatment during a series of 4 baseline, 4 experimental, plus 4 recovery times), and the relation of covariate P on gene expression.
It might be nice to explore other interactions, but I am a bit concerned with over-paramterizing the model and being able to interpret it correctly, and with testing too many contrasts. Perhaps I should have the 3-way interaction in the design matrix and only select a few contrasts (Time:Treatment, P) to analyze.
I did see there was a 'global' correction for multiple contrasts.
(B) How do I select the df? The limma example suggests 3-5, but it has 16 Control + 16 Treatment individuals with no replication.
I could select 12 knots for my 12 timepoints (df=14, or 1+1+12knots), and if the times were evenly spaced that would yield 14 datapoints (6 Control + 6 Treatment subjects) in each knot-bounded interval, if my understanding is right.
Yet that would lead to a model with a large number of parameters (for a design=model.matrix(~Treatment*X*PVT) there are 52 columns in the matrix).
I'm not sure if that is valid for a dataset with only 14 subjects?
I could do a heuristic trial and error with different values for df, but I'm not sure how best to then evaluate and compare results of different models, since there is no AIC/BIC type output.
I assume I ultimately will need to add a Subjects factor to the design matrix and/or use the duplicateCorrelation command to account for repeated measurements (the 12 timepoints) on each subject.
Thank you in advance for your help in setting up the model.matrix and spline functions.
Hilary
Thank you very much. This is very helpful. I have a couple more questions, and especially am hoping you wouldn't mind scanning my contrasts (coef) setup below to see if I correctly interpreted what you said. Mainly I'm uncertain of my setup because I typically am able to make one contrasts matrix for all factors I want to test, and in this case I wasn't able to figure out how to do so.
First for specifying the contrast matrix:
#Step i. To test for main effect of the covariate P: I'm not sure how I would do this with makeContrasts (?), so I used coef
>fit = lmFit(y, design, block=subject, correlation=corfit$consensus);
fit = eBayes,fit);
topTable(fit, coef="p")
#Step ii. To test for treatment-specific response to time, i.e. the interaction effect
>fit = lmFit(y, design, block=subject, correlation=corfit$consensus);
cont.mat = makeContrasts(TreatmentA.X1-TreatmentB.X1, TreatmentA.X2-TreatmentB.X2, TreatmentA.X3-TreatmentB.X3, levels=make.names(colnames(design)));
fit2 =contrasts.fit(fit, cont.mat);
fit2=eBayes(fit2);
topTable(fit2)
#Step iii. To test for effect of treatment as main effect, regardless of time: as you said a multi-column contrast equating matching interaction terms and treatment
--Same as Step ii. above but with cont.mat = makeContrasts(TreatmentA-TreatmentB, TreatmentA.X1-TreatmentB.X1, TreatmentA.X2-TreatmentB.X2, TreatmentA.X3-TreatmentB.X3, levels=make.names(colnames(design)))
Second, when you mention staying with the recommended 3-5 knots "as long as it gives you a flexible fit that handles the time-dependent expression trends for most genes." How can I tell if the model/knot choice does so? I just found that limma does have an AIC/BIC testing option with selectModel, but I'm not sure if that's the best method or applicable with splines.
Thank you again!
~Hilary
For starters, I can't say if your use of
block
andcorrelation
inlmFit
is correct, because I don't know the precise details of your experiment with regards to the origin of each sample. Ifsubject
refers to the patient from which each sample was derived, and you have 14 different individuals, then it doesn't seem to make any sense to block on them.Anyway, the contrasts you set up are mostly right. For the first contrast, make sure the
coef
matches the name in the design matrix, e.g., use capital"P"
if that's what it's called. For the second contrast, I assume you've renamed all interaction terms to replace":"
with"."
as the former doesn't work inmakeContrasts
. You also don't need to re-runlmFit
, you can just re-use thefit
object from the first contrast. You're correct in runningtopTable
without specifyingcoef
; this will test all contrasts in the matrix at once, which is necessary when you're contrasting spline coefficients.For the last two contrasts, I wouldn't put any particular weight in the reported log-fold changes. This refers to changes in the spline coefficients and are pretty difficult to interpret. Instead, I'd just plot out the expression trends with respect to time for significant genes, to get a better picture of how they're behaving.
Finally, I usually evaluate model designs informally, by seeing whether a larger model results in a substantial increase (e.g., more than 50%) in the number of DE genes. If so, I use it; otherwise, I keep the simpler model. In your case, though, you don't have a lot of residual d.f. to spare, so I'd stick with
df=3
in the spline.Thank you again; you have been very helpful! Because the 12 timepoints are repeated measurements on each of the 14 subjects (12*14 for 168 arrays), I assume it is necessary to use blocking based on the post in LIMMA on matched time series data. I will be sure to plot the genes and use this to interpret rather than relying on fold change; thank you!
Best,
Hilary
I am working on an experimental design very similar to the one you have posted above Aaron, and I had a few questions regarding your response. Could you explicitly write out how each of your suggestions would look using the model.matrix above? For example, I'm trying to do three things with my dataset which I've listed below:
How does the output contrast below compare/differ from the first line of code above in terms of interpreting the output?
Make a new question.
I made one with my new username tleona3 and the title is: Question: LIMMA time series spline model.matrix design setup