Hi everyone! This is a bit long but I hope to state my problem more clearly
I've been trying to perform an analysis on the effects of a treatment over different time points using RNASeq with both limma and edgeR but I'm a bit confused with how to approach and interpret the results.
The experiment consists of samples taken at 5 different time points, with both treatment and control (this gives a total of 10 groups). Additionally, I have a baseline group (control at time 0). In total there are 11 groups. For each sample we have 4 replicates, so in total it's 44 libraries. Only one lane was sequenced (all libraries were multiplexed and sequenced in one single lane).
We wish to know what are the effects of the treatment at each time point (i.e., which genes are differentially expressed at time 1 in treatment vs control, which at time 2, and so on.)
First I tried using edgeR using a model that takes into account the time, treatment, interaction and an experimental effect of the tube strips where the libraries where made:
> design <- model.matrix(~Time+Treatment+Strip+Time:Treatment)
I coded the times tm1-5. The control is a and the treatment is coded as b.
Here comes one of my first confusions. The design matrix only takes displays the interaction between the time and the treatment but show's no time:control. Also, I had to manually get rid of a term in the matrix (b) to make it full rank:
(Intercept) tm1 tm2 tm3 tm4 tm5 b Strips2 Strips3 ... tm1b tm2b tm3b tm4b tm5b
For what I understand the last part of the design matrix accounts in part for what I'm looking for but I don't know how to interpret in relation to a treatment effect relative to a control in a particular time point.
I tried two different approaches:
1) using an ANOVA for the coefficientes corresponding to the terms I wanted resulted only in 15 significant DE genes (FDR < 0.05)
> glmLRT(Fitglm, coef=12:16)
2) using matrix of contrasts I found ~700 DE genes.
> con <- makeContrasts('tm1'=tm1b - tm1, tm2'=tm2b - tm2, tm3'=tm3b - tm3, 'tm4'=tm4b - tm4,'tm5'=tm5b - tm5 levels=design) > glmLRT(Fitglm, contrast = con)
I next tried using limma converting the normalized read counts with voom and following section 9.6 of the limma vignette.
Using the same model as in edgeR gave no DE genes using coefficients and only 10 genes using contrasts:
> design <- model.matrix(~0 + Time + Treatment + Strip + Time:Treatment, data = targets) > fit <- lmFit(y, design) > fit <- eBayes(fit) #Contrasts: > topTable(fit, coef=12:16,n=Inf) #Coefficients: > topTable(fit, coef=12:16,n=Inf) >cont.dif <- makeContrasts( Dif1 =Time1.Treatmentb-Time1, Dif2 =Time2.Treatmentb-Time2, (...), levels=design) >fit2 <- lmFit(y, design) >fit2 <- contrasts.fit(fit2, cont.dif) >fit2 <- eBayes(fit2)
3) Following the Many timepoints section of the limma guide (9.6.2) and an advice from a post in http://permalink.gmane.org/gmane.science.biology.informatics.conductor/53752 also gave no DE genes using the contrast as adviced in the post nor using the advice from the limma guide.
X <- ns(as.numeric(targets$Time), df=4) Treat <- factor(targets$Treatment) >design3 <- model.matrix(~Treat+Treat:X) >colnames(design3) <- make.names(colnames(design3)) >fit3 <- lmFit(y,design3) >fit3 <- eBayes(fit3) ## different responses between the different time courses. cont.mat <- makeContrasts( Treatb.X1-Treata.X1, Treatb.X2-Treata.X2, Treatb.X3-Treata.X3, Treatb.X4-Treata.X4, levels=design3) fit4 <- contrasts.fit(fit3,cont.mat) fit4 <- eBayes(fit4) topTable(fit4)
Has anyone had any situations like this? Are there significative differences between using coefficients in limma and edgeR? What would be the best approach for this case (testing the effects of treatment vs control at different times). Is the baseline group (control@ time0) of any use.
Any insights and suggestions are welcome!
### Edited to add sessionInfo() output:
> sessionInfo() R version 3.2.4 Revised (2016-03-16 r70336) Platform: x86_64-apple-darwin15.3.0 (64-bit) Running under: OS X 10.11.4 (El Capitan) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.12.1 limma_3.26.9 loaded via a namespace (and not attached): [1] tools_3.2.4
Hi Ryan,
If I understand correctly this is the same as doing: > model.matrix(~0 + Time:Treatment + Strip, data=targets). Does it affect taking away the effects of time and treatment? I tried the model I jus wrote using an anova of contrasts:
> con <- makeContrasts('tm1'=Grouptm1.b - Grouptm1.a,
'tm2'=Grouptm2.b - Grouptm2.a,
'tm3'=Grouptm3.b - Grouptm3.a,
'tm4'=Grouptm4.b - Grouptm4.a,
'tm5'=Grouptm5.b - Grouptm5.a,
levels=design)
> lrt.anova.contrasts <- glmLRT(Fitglm, contrast = con)
Finding only 5 DE genes. If I use single contrasts at a time, all FDRs are 0.99. What do you think can be happening?
Your formulation is not quite the same as Ryan's in practical terms, as you can potentially end up with all-zero coefficients in your design matrix if not all of the time/treatment combinations have samples in your data set. Ryan's approach is neater as such levels are automatically dropped from the
Group
factor. It is also a lot simpler to figure out what you're actually comparing - with interaction models, it's very easy to confuse yourself when constructing your contrasts.Now, if all your FDRs are large, then that just means that there's no evidence for DE in your data set. Whether this is due to a lack of power or an absence of any treatment effect is harder to say. Probably have a look at the BCV values (should be below 0.2-0.3 for well-controlled experiments) and also have a look at the log-fold changes of the top DE genes (as a very rough rule of thumb, you should be seeing log-FC's of at least 2 in the top set of genes if there's a decent effect).
P.S. Update to the latest version of R and Bioconductor. No reason to hang around with old stuff.
It may be that the treatments are not very different, or the effect of time is so much larger that it overshadows the treatment effect. You should look at the MDS plot of your data. If groups A and B don't separate at all on the MDS plot, you shouldn't expect to find DE genes between them.