Hello,
I'm analyzing time-series RNA-Seq data with repeated measures at six time points, corresponding to pre-treatment, on-treatment and post-treatment phases, using edgeR:
Timepoint1 Pretreat
Timepoint2 Pretreat
Timepoint3 Pretreat
Timepoint4 Ontreat
Timepoint5 Ontreat
Timepoint6 Posttreat
My goal is to find DEGs for two comparisons: Ontreat vs Pretreat, and Posttreat vs Ontreat.
There are two options I can think of to do this:
1). Include the timepoint variable in the GLM (along with subject, since this is repeated-measures data) and set the contrasts as:
design <- model.matrix(~0+timepoint+subject)
mycontrast <- makeContrasts(OnvsPre=(timepoint4+timepoint5)/2-(timepoint1+timepoint2+timepoint3)/3, PostvsOn=timepoint6-(timepoint4+timepoint5)/2, levels=design)
2). Include the treatment phase variable in the model (which essentially combines the different timepoints within the same treatment phase into one group):
design <- model.matrix(~0+treatment+subject)
mycontrast <- makeContrasts(OnvsPre=Ontreat-Pretreat, PostvsOn=Posttreat-Ontreat, levels=design)
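For reference, here is a minimal sketch of how the two designs above might be set up so that the coefficient names used in the contrasts actually exist as column names. It assumes a hypothetical balanced layout (4 subjects, each sampled at all 6 timepoints) and factor levels named timepoint1 to timepoint6; neither is stated above. The column renaming is also an assumption about how the contrast names are meant to resolve, since model.matrix() prefixes each level with its factor name.

```r
library(limma)  # makeContrasts() comes from limma, which edgeR depends on

# ASSUMPTION: hypothetical balanced layout, 4 subjects x 6 timepoints
subject   <- factor(rep(paste0("S", 1:4), each = 6))
timepoint <- factor(rep(paste0("timepoint", 1:6), times = 4))

# Option 1: one coefficient per timepoint, plus subject blocking terms
design1 <- model.matrix(~0 + timepoint + subject)
# model.matrix() names columns as factor name + level ("timepointtimepoint1"),
# so rename the first six columns to the plain level names
colnames(design1)[1:6] <- levels(timepoint)
contrast1 <- makeContrasts(
  OnvsPre  = (timepoint4 + timepoint5) / 2 -
             (timepoint1 + timepoint2 + timepoint3) / 3,
  PostvsOn = timepoint6 - (timepoint4 + timepoint5) / 2,
  levels = design1
)

# Option 2: collapse timepoints into treatment phases
phase <- c("Pretreat", "Pretreat", "Pretreat", "Ontreat", "Ontreat", "Posttreat")
treatment <- factor(phase[as.integer(timepoint)],
                    levels = c("Pretreat", "Ontreat", "Posttreat"))
design2 <- model.matrix(~0 + treatment + subject)
colnames(design2)[1:3] <- levels(treatment)
contrast2 <- makeContrasts(
  OnvsPre  = Ontreat - Pretreat,
  PostvsOn = Posttreat - Ontreat,
  levels = design2
)

# With a DGEList `y` (hypothetical, not built here), a contrast could then
# be tested along these lines:
# y   <- estimateDisp(y, design1)
# fit <- glmQLFit(y, design1)
# res <- glmQLFTest(fit, contrast = contrast1[, "OnvsPre"])
```

Printing contrast1 shows the weights directly: 1/2 on each on-treatment timepoint, -1/3 on each pre-treatment timepoint, and 0 on the subject terms.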
Since I am really new to RNA-Seq analysis and ignorant in statistics, my questions are:
1). For the first method, am I setting the contrast in the right way?
2). For the second method, is it justified to combine different timepoints into one group, or does that cause problems because the measurements are repeated within each subject?
Thanks very much for any help here!
Thanks, Aaron. Much appreciated.
A follow up question:
To interpret the first model: does it essentially average the expression values across the three pre-treatment timepoints, average them across the two on-treatment timepoints, and then identify DE genes by comparing those two averages?
Thanks again.
Yes. If you need more stringency, you can test each pair of on-treatment vs pre-treatment timepoints to verify that genes are consistently DE (in the same direction) between groups.
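The pairwise check could be sketched roughly as follows. This is only an illustration: the sample layout (4 subjects by 6 timepoints), the renamed design columns, the helper consistent_direction(), and the toy log-fold-change matrix are all hypothetical and not taken from the thread.

```r
library(limma)  # for makeContrasts()

# ASSUMPTION: same hypothetical layout as the question, 4 subjects x 6 timepoints
subject   <- factor(rep(paste0("S", 1:4), each = 6))
timepoint <- factor(rep(paste0("timepoint", 1:6), times = 4))
design <- model.matrix(~0 + timepoint + subject)
colnames(design)[1:6] <- levels(timepoint)

# All on-treatment vs pre-treatment pairs (2 x 3 = 6 contrasts)
pairs <- expand.grid(on  = paste0("timepoint", 4:5),
                     pre = paste0("timepoint", 1:3),
                     stringsAsFactors = FALSE)
pair_formulas <- paste(pairs$on, pairs$pre, sep = "-")
pairwise <- makeContrasts(contrasts = pair_formulas, levels = design)

# Hypothetical helper: TRUE for genes (rows) whose log-fold changes have
# the same non-zero sign in every pairwise comparison (columns)
consistent_direction <- function(logfc) {
  rowSums(logfc > 0) == ncol(logfc) | rowSums(logfc < 0) == ncol(logfc)
}

# Toy values for illustration only, not real results
toy_logfc <- rbind(g1 = c(1.0,  2.0,  0.5),
                   g2 = c(1.0, -1.0,  2.0),
                   g3 = c(-1.0, -2.0, -0.3))
keep <- consistent_direction(toy_logfc)

# With a fitted edgeR model `fit` (not built here), each column could be
# tested in turn and the logFC values collected in gene order, e.g.:
# res_i <- glmQLFTest(fit, contrast = pairwise[, i])
# lfc_i <- topTags(res_i, n = Inf, sort.by = "none")$table$logFC
```

Genes that pass the averaged contrast and also show the same sign across all six pairwise comparisons would be the more stringent set; whether to additionally require significance in each pair is a judgment call.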