Dear BioC list,
I am currently working on a microarray timecourse experiment described bellow :
- Six time points,
- Two cell types A and B,
- No control experiment.
Basically, I am studying the effect through time of a treatment on gene expressions in different cell lines. The biological questions I have are :
- Which are the genes that are variable through time in one cell type but not the other ?
- Which are the genes that are variable through time and whose expression is concordant among cell types ?
Following the advices given in the Limma vignette (p.47 and 48), as well as explanations given in several BioC threads (question about design for limma time course, 2 conditions and drug treatment microarray experiment, design and contrast matrix for limma time series without replicates, https://support.bioconductor.org/p/58249/) I am trying to use Limma to analyze this data set, and to include splines when building my design matrix.
Here is what I've done so far :
> Design <- data.frame(Cell = c(rep("A", 6), rep("B", 6)), Time = rep(c(0,24,48,96,144,192), 2)) > print(Design) Cell Time 1 A 0 2 A 24 3 A 48 4 A 96 5 A 144 6 A 192 7 B 0 8 B 24 9 B 48 10 B 96 11 B 144 12 B 192 > Splines <- ns(Design$Time, df = 3) > Design_LIMMA_Cell <- model.matrix(~ Splines * Design$Cell) > print(Design_LIMMA_Cell) (Intercept) Splines1 Splines2 Splines3 Design$CellB Splines1:Design$CellB Splines2:Design$CellB Splines3:Design$CellB 1 1 0.00000000 0.0000000 0.0000000 0 0.00000000 0.0000000 0.0000000 2 1 -0.10236787 0.3434740 -0.2250347 0 0.00000000 0.0000000 0.0000000 3 1 -0.05914652 0.5404054 -0.3538571 0 0.00000000 0.0000000 0.0000000 4 1 0.40630077 0.4406083 -0.2195072 0 0.00000000 0.0000000 0.0000000 5 1 0.41932830 0.3328953 0.2004080 0 0.00000000 0.0000000 0.0000000 6 1 -0.14592934 0.4231951 0.7227343 0 0.00000000 0.0000000 0.0000000 7 1 0.00000000 0.0000000 0.0000000 1 0.00000000 0.0000000 0.0000000 8 1 -0.10236787 0.3434740 -0.2250347 1 -0.10236787 0.3434740 -0.2250347 9 1 -0.05914652 0.5404054 -0.3538571 1 -0.05914652 0.5404054 -0.3538571 10 1 0.40630077 0.4406083 -0.2195072 1 0.40630077 0.4406083 -0.2195072 11 1 0.41932830 0.3328953 0.2004080 1 0.41932830 0.3328953 0.2004080 12 1 -0.14592934 0.4231951 0.7227343 1 -0.14592934 0.4231951 0.7227343 attr(,"assign") [1] 0 1 1 1 2 3 3 3 attr(,"contrasts") attr(,"contrasts")$`Design$Cell` [1] "contr.treatment" > Fit_LIMMA_Cell <- lmFit(Data_RMA_Clean, Design_LIMMA_Cell) > Fit_LIMMA_Cell <- eBayes(Fit_LIMMA_Cell)
To detect genes with different trends between the two cell lines, I just have to conduct the F-test on the last 3 parameters corresponding to differences in the curves between the two cell lines. Am I right ?
> Sig_LIMMA_Time_Cell_Spe <- topTable(Fit2_LIMMA_Cell, coef = c(6:8) , adjust="BH", number = Inf)
However, I am not sure how to proceed to answer my second biological question, i.e. highlighting variable genes whose change in expression is concordant between cell types.
Is my design matrix suited for this specific question ? Is using topTable(Fit2_LIMMA_Cell, coef = c(2:4) , adjust="BH", number = Inf)
the way to go ?
Usually, people are interested in detecting changes through time comparing treated samples vs control samples. But for this specific experiments, I am also interested in identifying genes with the same behavior among cell types, with the idea to identify the "core" transcriptional network involved in the cellular answer to this specific treatment.
Any help is much appreciated !
Thanks a lot,
Cheers,
Pef
You need to make
batch
a factor. When you do so, the batch becomes fully nested within the cell type, so there's no point including the latter in the design matrix. Instead, you can do:I use an intercept-free model because it's easier to explain. In particular, if you look at the coefficients, you get:
The first 6 coefficients are the batch-specific intercepts of the spline, and can be ignored. This is because you're only interested in the response to time within each cell type, which are represented by the remaining cell type-specific spline coefficients. You can then deal with these like you did before with the previous design, i.e., containing just A and B.
Edit: A bit of clarification; to do the comparisons between A and B here, you'll have to do a bit more work:
This is because the parametrization of the spline coefficients is different in an intercept-free model. Here, each set of spline coefficients represents that of the spline for each cell type, not the difference in splines between cell types.
Thanks a lot Aaron. This is far more clearer.
Using this new model matrix, if I want to highlight genes exhibiting any time effect, I should use :
any.time <- topTable(Fit2_LIMMA_Cell, coef = c(7:15), n=Inf, sort.by="none").
Am I right ?Thx !
Yep, that's right. That said, going back to the original question, if you want to identify genes with a time effect that is the same in A and B, then you don't necessarily care what happens in C; in such a case, you should only test for "any time effect" in A and B by themselves. This would be equivalent to
coef=7:12
.