Hello,
Thank you for creating DESeq2! It has been very useful for understanding the requirements and downstream analysis of an RNA-seq experiment.
I am currently working with a PhD student on a time-course experiment in which a significant portion of their genes have low mapping rates. I would like to check a few things, mainly the design formula and potential issues with batch effects and the time series. Unfortunately, our statistics team is not yet familiar with these concepts or with DESeq2, and has helped as much as it can.
The project owner has two groups, in-vitro and in-vivo; the latter undergoes three conditions: condition_1, condition_2 and condition_3. I believe there are only two replicates per in-vitro sample.
To identify and compare the top DEGs within the different areas, I have set up the following designs, each tested with the LRT:
- Differences between conditions (regardless of group):
~ condition # (reduced model: ~ 1)
- Differences between the two groups:
~ group # (reduced model: ~ 1)
- The effect of each condition on the groups:
~ group + condition + group:condition # (reduced model: ~ group + condition)
- The effect of the condition over time, in each group:
~ time_point + group + condition + group:condition # (reduced model: ~ time_point + group + condition)
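To check my understanding of the full/reduced pairs above, here is a minimal base-R sketch. It assumes a hypothetical sample table in which group is fully crossed with condition with two replicates per cell, which may not match the real experiment. The LRT run by `DESeq(dds, test = "LRT", reduced = ...)` compares nested models, so its degrees of freedom should equal the number of model-matrix columns dropped between the full and reduced design:

```r
# Hypothetical sample table (illustration only): group crossed with condition,
# two replicates per cell. The real colData may not be fully crossed.
coldata <- expand.grid(
  rep       = 1:2,
  condition = factor(c("condition_1", "condition_2", "condition_3")),
  group     = factor(c("in_vitro", "in_vivo"))
)

# Full/reduced pairs taken from the designs listed above (time_point omitted here)
pairs <- list(
  condition   = list(full = ~ condition, reduced = ~ 1),
  group       = list(full = ~ group, reduced = ~ 1),
  interaction = list(full = ~ group + condition + group:condition,
                     reduced = ~ group + condition)
)

# LRT degrees of freedom = coefficients in full model - coefficients in reduced model
lrt_df <- sapply(pairs, function(p) {
  ncol(model.matrix(p$full, data = coldata)) -
    ncol(model.matrix(p$reduced, data = coldata))
})
print(lrt_df)
```

Under these assumptions the condition and interaction tests each drop two coefficients and the group test drops one.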
I have been informed that one of the groups is missing two intermediate time points, and I would like to know how this will affect the analysis.
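One way I tried to reason about the missing time points is to check whether the model matrix stays full rank, since DESeq2 stops with an error when it is not. This is a minimal base-R sketch assuming a hypothetical layout of four time points (t1 to t4) where in_vivo lacks t2 and t3; `is_full_rank` is a hypothetical helper, not a DESeq2 function. As I understand it, with time_point only as a main effect the missing points are still estimable from the other group (assuming the time effect is shared by both groups), whereas a time_point:group interaction would not be:

```r
# Hypothetical time course: four time points, two replicates per observed cell;
# in_vivo is missing the two intermediate time points t2 and t3
all_cells <- expand.grid(
  rep        = 1:2,
  time_point = factor(c("t1", "t2", "t3", "t4")),
  group      = factor(c("in_vitro", "in_vivo"))
)
missing <- all_cells$group == "in_vivo" & all_cells$time_point %in% c("t2", "t3")
coldata <- droplevels(all_cells[!missing, ])

# Hypothetical helper: TRUE if the design's model matrix has full column rank
is_full_rank <- function(design, data) {
  mm <- model.matrix(design, data = data)
  qr(mm)$rank == ncol(mm)
}

# Main effects only: t2 and t3 are estimated from in_vitro alone
main_only <- is_full_rank(~ time_point + group, coldata)

# With an interaction, the t2:in_vivo and t3:in_vivo columns are all zero
interaction <- is_full_rank(~ time_point + group + time_point:group, coldata)

print(c(main_only = main_only, interaction = interaction))
```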
I have also considered using RUVSeq to remove any potential batch effects, but I have also looked into using this design instead:
~ time_point + batchRun + group + condition + group:condition # (reduced model: ~ time_point + batchRun + group + condition)
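As far as I understand, batchRun can only be added to the design if it is not confounded with the variables of interest. Below is a minimal base-R sketch under hypothetical assumptions: two sequencing runs, time_point left out for brevity, and made-up column names `batch_confounded` (each group sequenced in its own run) and `batch_balanced` (both groups spread across both runs). `is_full_rank` is again a hypothetical helper:

```r
# Hypothetical sample table: group crossed with condition, two replicates per cell
coldata <- expand.grid(
  rep       = 1:2,
  condition = factor(c("condition_1", "condition_2", "condition_3")),
  group     = factor(c("in_vitro", "in_vivo"))
)

# Hypothetical batch layouts
coldata$batch_confounded <- factor(ifelse(coldata$group == "in_vitro", "run1", "run2"))
coldata$batch_balanced   <- factor(ifelse(coldata$rep == 1, "run1", "run2"))

# Hypothetical helper: TRUE if the design's model matrix has full column rank
is_full_rank <- function(design, data) {
  mm <- model.matrix(design, data = data)
  qr(mm)$rank == ncol(mm)
}

# Batch identical to group: the batch column duplicates the group column
confounded <- is_full_rank(~ batch_confounded + group + condition + group:condition, coldata)

# Each group/condition cell has one replicate in each run: batch is separable
balanced <- is_full_rank(~ batch_balanced + group + condition + group:condition, coldata)

print(c(confounded = confounded, balanced = balanced))
```

If the real batchRun turns out to be confounded like the first layout, no design formula (and, I assume, no correction method) can separate the batch from the group effect.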
From the tutorials and the workshop on design matrices, I unfortunately haven't fully understood how the contrast and name arguments relate to the model design, in particular when calling the results() function.
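My current understanding, which I would appreciate having checked: `name` selects one coefficient from `resultsNames(dds)`, while `contrast` can be a character triplet such as `c("condition", "condition_2", "condition_1")`, a list of coefficient names, or a numeric vector with one weight per coefficient. For an LRT, the `results()` documentation says the p-values come from the full vs reduced comparison, so these arguments only change which log2 fold change is reported. The base-R sketch below builds a numeric contrast for condition_2 vs condition_1 within in_vivo as the difference of two cell rows of the model matrix. It assumes a hypothetical fully crossed sample table, and that the order of `resultsNames(dds)` matches the `model.matrix` columns (DESeq2 renames them, e.g. with dots instead of colons for interactions), which should be verified on the real object:

```r
# Hypothetical sample table: group crossed with condition, two replicates per cell
coldata <- expand.grid(
  rep       = 1:2,
  condition = factor(c("condition_1", "condition_2", "condition_3")),
  group     = factor(c("in_vitro", "in_vivo"))
)
mm <- model.matrix(~ group + condition + group:condition, data = coldata)

# Hypothetical helper: the model-matrix row for one group/condition cell
cell_row <- function(g, cnd) {
  mm[which(coldata$group == g & coldata$condition == cnd)[1], ]
}

# condition_2 vs condition_1 within in_vivo: main effect plus interaction term
contrast_vec <- cell_row("in_vivo", "condition_2") - cell_row("in_vivo", "condition_1")
print(contrast_vec)
```

The non-zero weights fall on the condition_2 main effect and the in_vivo:condition_2 interaction, which is why the effect in the non-reference group is described as the main effect plus the interaction term.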
Many thanks,
I am unable to edit the post, but the tutorials I used include: