Effect of treatment vs control over time in RNA Seq (edgeR and limma)
1
0
Entering edit mode
joelrome88 ▴ 20
@joelrome88-9626
Last seen 7.8 years ago

Hi everyone! This is a bit long but I hope to state my problem more clearly

I've been trying to perform an analysis on the effects of a treatment over different time points using RNASeq with both limma and edgeR but I'm a bit confused with how to approach and interpret the results.

The experiment consists of samples taken at 5 different time points, with both treatment and control (this gives a total of 10 groups). Additionally, I have a baseline group (control at time 0). In total there are 11 groups. For each sample we have 4 replicates, so in total it's 44 libraries. Only one lane was sequenced (all libraries were multiplexed and sequenced in one single lane).

We wish to know what are the effects of the treatment at each time point (i.e., which genes are differentially expressed at time 1 in treatment vs control, which at time 2, and so on.)

First I tried using edgeR using a model that takes into account the time, treatment, interaction and an experimental effect of the tube strips where the libraries where made:

> design <- model.matrix(~Time+Treatment+Strip+Time:Treatment) 

I coded the times tm1-5. The control is a and the treatment is coded as b.

Here comes one of my first confusions. The design matrix only takes displays the interaction between the time and the treatment but show's no time:control. Also, I had to manually get rid of a term in the matrix (b) to make it full rank:

         (Intercept) tm1 tm2 tm3 tm4 tm5 b Strips2 Strips3 ... tm1b tm2b tm3b tm4b tm5b

For what I understand the last part of the design matrix accounts in part for what I'm looking for but I don't know how to interpret in relation to a treatment effect relative to a control in a particular time point.  

I tried two different approaches:

1) using an ANOVA for the coefficientes corresponding to the terms I wanted resulted only in 15 significant DE genes (FDR < 0.05)

> glmLRT(Fitglm, coef=12:16)

2) using matrix of contrasts I found ~700 DE genes.

> con <- makeContrasts('tm1'=tm1b - tm1, tm2'=tm2b - tm2, tm3'=tm3b - tm3, 'tm4'=tm4b - tm4,'tm5'=tm5b - tm5                    levels=design)
> glmLRT(Fitglm, contrast = con)

 

I next tried using limma converting the normalized read counts with voom and following section 9.6 of the limma vignette.

Using the same model as in edgeR gave no DE genes using coefficients and only 10 genes using contrasts: 

> design <- model.matrix(~0 + Time + Treatment + Strip + Time:Treatment, data = targets)
> fit <- lmFit(y, design)
> fit <- eBayes(fit)
#Contrasts:
> topTable(fit, coef=12:16,n=Inf)
#Coefficients:
> topTable(fit, coef=12:16,n=Inf)
>cont.dif <- makeContrasts(  Dif1 =Time1.Treatmentb-Time1, Dif2 =Time2.Treatmentb-Time2, (...), levels=design)
>fit2 <- lmFit(y, design)
>fit2 <- contrasts.fit(fit2, cont.dif)
>fit2 <- eBayes(fit2)

 

3) Following the Many timepoints section of the limma guide (9.6.2) and an advice from a post in http://permalink.gmane.org/gmane.science.biology.informatics.conductor/53752 also gave no DE genes using the contrast as adviced in the post nor using the advice from the limma guide.

X <- ns(as.numeric(targets$Time), df=4)

Treat <- factor(targets$Treatment)

>design3 <- model.matrix(~Treat+Treat:X)
>colnames(design3) <- make.names(colnames(design3))
>fit3 <- lmFit(y,design3)
>fit3 <- eBayes(fit3)

## different responses between the  different time courses.

cont.mat <- makeContrasts(
  Treatb.X1-Treata.X1,
  Treatb.X2-Treata.X2,
  Treatb.X3-Treata.X3,
  Treatb.X4-Treata.X4,
  levels=design3)
fit4 <- contrasts.fit(fit3,cont.mat)
fit4 <- eBayes(fit4)
topTable(fit4)

Has anyone had any situations like this? Are there significative differences between using coefficients in limma and edgeR? What would be the best approach for this case (testing the effects of treatment vs control at different times). Is the baseline group (control@ time0) of any use.

Any insights and suggestions are welcome!

 

### Edited to add sessionInfo() output:

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-apple-darwin15.3.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.12.1 limma_3.26.9

loaded via a namespace (and not attached):
[1] tools_3.2.4
edgeR limma voom RNASeq multiple time points • 2.0k views
ADD COMMENT
0
Entering edit mode
@ryan-c-thompson-5618
Last seen 3 months ago
Icahn School of Medicine at Mount Sinai…

If you don't understand the interpretation of coefficients in an interaction model, I strongly recommend you avoid such a design matrix in favor of the equivalent group means design (ensure that Time and Treatment are both factors):

targets$Group <- interaction(targets$Time, targets$Treatment, drop=TRUE)
design <- model.matrix(~0 + Group + Strip, data=targets)

Now the first 11 coefficients will represent the means of the 11 combinations of Time and Treatment (after subtracting the batch effects of Strip). Knowing this, you can now phrase any contrasts you want as differences between group means.

ADD COMMENT
0
Entering edit mode

Hi Ryan, 

If I understand correctly this is the same as doing:  > model.matrix(~0 + Time:Treatment + Strip, data=targets). Does it affect taking away the effects of time and treatment? I tried the model I jus wrote using an anova of contrasts:

 

> con <- makeContrasts('tm1'=Grouptm1.b - Grouptm1.a,
                     'tm2'=Grouptm2.b - Grouptm2.a,
                     'tm3'=Grouptm3.b - Grouptm3.a,
                     'tm4'=Grouptm4.b - Grouptm4.a,
                     'tm5'=Grouptm5.b - Grouptm5.a,
                     levels=design)
> lrt.anova.contrasts <- glmLRT(Fitglm, contrast = con)

Finding only 5 DE genes. If I use single contrasts at a time, all FDRs are 0.99. What do you think can be happening?

 

ADD REPLY
1
Entering edit mode

Your formulation is not quite the same as Ryan's in practical terms, as you can potentially end up with all-zero coefficients in your design matrix if not all of the time/treatment combinations have samples in your data set. Ryan's approach is neater as such levels are automatically dropped from the Group factor. It is also a lot simpler to figure out what you're actually comparing - with interaction models, it's very easy to confuse yourself when constructing your contrasts.

Now, if all your FDRs are large, then that just means that there's no evidence for DE in your data set. Whether this is due to a lack of power or an absence of any treatment effect is harder to say. Probably have a look at the BCV values (should be below 0.2-0.3 for well-controlled experiments) and also have a look at the log-fold changes of the top DE genes (as a very rough rule of thumb, you should be seeing log-FC's of at least 2 in the top set of genes if there's a decent effect).

P.S. Update to the latest version of R and Bioconductor. No reason to hang around with old stuff.

ADD REPLY
0
Entering edit mode

It may be that the treatments are not very different, or the effect of time is so much larger that it overshadows the treatment effect. You should look at the MDS plot of your data. If groups A and B don't separate at all on the MDS plot, you shouldn't expect to find DE genes between them.

ADD REPLY

Login before adding your answer.

Traffic: 680 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6