edgeR: time series analysis
@bharathananth-10049

Hi

I have RNA-seq time course data consisting of 11 individual time points. However, I do not have replicates at any time point. I am trying to fit a simple linear model of the following form to detect oscillations:

time <- seq(2,22,by=2)

in.phase <- cos(2*pi/22*time)
out.phase <- sin(2*pi/22*time)

design <- model.matrix(~in.phase + out.phase)

My question is whether my large residual degrees of freedom can compensate for the lack of biological replicates at each time point. In other words, can I use the standard pipeline with estimateDisp(y, design, robust=TRUE) to process my data, or do I need to (a) choose a reasonable BCV value (as suggested in the manual) or (b) estimate only the trended dispersion?

Following the standard pipeline, I was wondering whether the oscillating genes (which are obviously also the ones with a lot of sample-to-sample variability in my case) get assigned larger than "reasonable" tagwise dispersions. I do not have problems identifying them with the standard pipeline, but I am trying to understand what assumptions I am making.
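
For concreteness, the three options being compared could be sketched roughly as follows. This is only an illustration under stated assumptions: the counts are simulated (hypothetical), with the first 50 genes made to oscillate with a 22-unit period, the BCV of 0.2 is an arbitrary placeholder, and the quasi-likelihood calls (glmQLFit, glmQLFTest) are one common way to run the "standard pipeline" rather than necessarily what was used here.

```r
library(edgeR)

set.seed(1)
time <- seq(2, 22, by = 2)
in.phase  <- cos(2 * pi / 22 * time)
out.phase <- sin(2 * pi / 22 * time)
design <- model.matrix(~ in.phase + out.phase)

# Hypothetical simulated counts: 1000 genes x 11 samples.
# ASSUMPTION: genes 1-50 oscillate with period 22; true dispersion is 0.05.
ngenes <- 1000
mu <- matrix(100, ngenes, length(time))
mu[1:50, ] <- mu[1:50, ] * exp(matrix(in.phase, 50, length(time), byrow = TRUE))
counts <- matrix(rnbinom(length(mu), mu = mu, size = 1 / 0.05),
                 nrow = ngenes,
                 dimnames = list(paste0("gene", seq_len(ngenes)), NULL))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# Standard pipeline: dispersions are estimated from the residual d.f.
# left over after fitting the cosine/sine model.
y <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)
res <- glmQLFTest(fit, coef = 2:3)   # test both oscillation terms jointly
topTags(res)

# Compare tagwise dispersions of the oscillating genes with the rest.
summary(y$tagwise.dispersion[1:50])
summary(y$tagwise.dispersion[-(1:50)])

# Option (a): fix the dispersion from an assumed BCV (placeholder value).
bcv <- 0.2
fit.a <- glmFit(y, design, dispersion = bcv^2)
lrt.a <- glmLRT(fit.a, coef = 2:3)

# Option (b): use only the trended dispersion, ignoring tagwise estimates.
fit.b <- glmFit(y, design, dispersion = y$trended.dispersion)
lrt.b <- glmLRT(fit.b, coef = 2:3)
```

Comparing the two dispersion summaries on real data is one way to check whether genes that fit the oscillation model well end up with unusually large tagwise dispersions.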

Thank you.

Aaron Lun ★ 28k
@alun

As long as your model has non-zero residual d.f., you can estimate a dispersion for each gene. With time series, the general assumption is that expression follows some smooth trend with respect to time - deviations from that trend can be used for dispersion estimation. Obviously, the more residual d.f. you have, the more precise your dispersion estimates are, and the more reliable your downstream analyses will be. This is easiest to achieve with more replicates, as it avoids the need to make strong assumptions about your response to time.
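
As a small base-R sketch of the residual d.f. point, the number of samples minus the rank of the design matrix can be computed directly. The comparison with a one-way layout (one coefficient per time point) is an added illustration, not something from the original question; variable names such as resid.df.smooth are hypothetical.

```r
time <- seq(2, 22, by = 2)
in.phase  <- cos(2 * pi / 22 * time)
out.phase <- sin(2 * pi / 22 * time)

# Smooth-trend design from the question: intercept + cosine + sine.
design.smooth <- model.matrix(~ in.phase + out.phase)
resid.df.smooth <- nrow(design.smooth) - qr(design.smooth)$rank
resid.df.smooth   # 11 samples - 3 coefficients = 8

# One coefficient per time point: nothing is left over without replicates.
design.factor <- model.matrix(~ factor(time))
resid.df.factor <- nrow(design.factor) - qr(design.factor)$rank
resid.df.factor   # 11 samples - 11 coefficients = 0
```

The smooth-trend model leaves residual d.f. for dispersion estimation only because it assumes the expression profile has a particular shape over time.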

In your case, you've applied the cosine and sine functions under the assumption that one cycle takes exactly 22 time units. A linear combination of a cosine and a sine with the same period is just a single sinusoid of that period with some amplitude and phase shift, so this design cannot represent cycles that are faster or slower. If a gene had a different cycling time, its expression profile with respect to time would not be modelled well, resulting in an inflated dispersion.
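
A minimal base-R sketch of this limitation, using ordinary least squares on noise-free, hypothetical profiles instead of count data: a phase-shifted signal with period 22 is fitted essentially exactly, whereas a signal with period 11 (chosen arbitrarily as an example of a faster cycle) leaves a large residual.

```r
time <- seq(2, 22, by = 2)
in.phase  <- cos(2 * pi / 22 * time)
out.phase <- sin(2 * pi / 22 * time)

# Hypothetical noise-free profiles (assumptions for illustration only).
y.same <- cos(2 * pi / 22 * time - 1)   # period 22, shifted phase
y.fast <- cos(2 * pi / 11 * time)       # period 11, i.e. twice as fast

fit.same <- lm(y.same ~ in.phase + out.phase)
fit.fast <- lm(y.fast ~ in.phase + out.phase)

# Residual sums of squares: near zero for the matching period,
# clearly non-zero when the period is wrong.
rss.same <- sum(resid(fit.same)^2)
rss.fast <- sum(resid(fit.fast)^2)
c(same.period = rss.same, faster.period = rss.fast)
```

In a count-based analysis, that unexplained variation would be absorbed into the gene's dispersion estimate.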
