I have some general questions about doing a time series RNA-seq experiment.
The RNA is from a non-model organism. The experiment has no conditions for comparison (so just the "control" condition, essentially). Three biological replicates of around 500 individuals each were set up, and about 20-30 individuals (per replicate) were taken and sacrificed for sequencing every four hours for two days (thirteen time points in total).
So the data looks like this (with expression quantified for each Sample below):
Sample Time Replicate
t0_r1 00 01
t0_r2 00 02
t0_r3 00 03
. . .
. . .
. . .
t48_r1 48 01
t48_r2 48 02
t48_r3 48 03
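To make the layout explicit, here is a minimal sketch that generates the full sample sheet from the design described above (13 time points from 0 to 48 hours at 4-hour intervals, 3 replicates each). The variable names `time_points`, `replicates` and `sample_sheet` are only for illustration, and I am assuming the elided rows follow the same naming pattern as the rows shown.

```python
# Sketch of the sample sheet: 13 time points (0 to 48 h, every 4 h) x 3 replicates.
# Assumption: the rows elided above follow the same t<hour>_r<replicate> pattern.
time_points = list(range(0, 49, 4))   # 0, 4, 8, ..., 48
replicates = [1, 2, 3]

sample_sheet = [
    {"Sample": f"t{t}_r{r}", "Time": f"{t:02d}", "Replicate": f"{r:02d}"}
    for t in time_points
    for r in replicates
]

# Print as a tab-separated table with the same columns as above.
print("Sample\tTime\tReplicate")
for row in sample_sheet:
    print(f"{row['Sample']}\t{row['Time']}\t{row['Replicate']}")
```

This gives 39 samples in total, ordered by time point and then by replicate.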
The main objective of the study is to identify genes that are expressed in a circadian manner. I have decided to use MetaCycle to identify the "circadian genes", and this analysis seems fairly straightforward. However, when I looked at the PCA plot for this data, the samples were not well separated and the replicates did not cluster together. Is this to be expected, given that the samples do not come from very "different" conditions? Or should I attempt to impose some degree of separation on the data by capturing this variation with latent variables (e.g., using RUVSeq)?
My other question would be: what other analyses can I perform on this dataset?
For instance, would it make any sense to perform an all-vs.-all differential expression analysis between time points and identify the significantly differentially expressed genes shared between all pairwise comparisons?
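To give a sense of scale for the all-vs.-all idea, here is a minimal sketch that enumerates the pairwise time-point comparisons and intersects the resulting gene lists. The helper `shared_genes` and the gene IDs in `toy_results` are hypothetical placeholders, not real results, and the differential expression step itself is left out.

```python
from itertools import combinations

# All unordered pairs of the 13 time points (0 to 48 h, every 4 h).
time_points = list(range(0, 49, 4))
pairs = list(combinations(time_points, 2))
print(len(pairs))  # 13 * 12 / 2 = 78 comparisons


def shared_genes(de_by_pair):
    """Return the genes called significant in every comparison (set intersection)."""
    gene_sets = [set(genes) for genes in de_by_pair.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy input with made-up gene IDs, keyed by (time_a, time_b).
toy_results = {
    (0, 4): {"geneA", "geneB"},
    (0, 8): {"geneB", "geneC"},
}
print(shared_genes(toy_results))  # {'geneB'}
```

With 78 comparisons, the intersection would only keep genes that differ between every pair of time points, which is part of why I am unsure this approach makes sense.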
I am a bit stumped and I would be very grateful for some tips and/or pointers (publications included).