Hi everyone,
I am finalizing the design of an RNASeq experiment and am having doubts about how I should plan to approach differential expression analysis. My design includes three factors of interest: Surgery group (2 levels-sham and surgery), time point (2 levels), and sex (2 levels). I have some experience with DESeq2 but am considering switching to edgeR or limma. The biological questions I am hoping to address are:
- What genes are differentially expressed between surgery and sham groups?
- Of the genes affected by surgery, which are differentially expressed between males and females?
- Of the genes affected by surgery, which are differentially expressed between the two time points?
- What genes are differentially expressed between males and females overall?
I’m looking for help with the following questions:
- Is DESeq2, edgeR, or limma more straightforward for this type of analysis?
- How should I design my differential expression model with these questions in mind? Generally, I am interested in which of the following two options is most correct:
- Make a model including interaction terms for SurgeryGroup:TimePoint and SurgeryGroup:Sex (such as Example 2 of the DESeq2 help for the Results function, but with an additional interaction term).
- Combine all of the condition combinations into a single vector and individually do all of the relevant contrasts, as described here: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions. If I do this, is it legitimate to first do a contrast to look at main effect of sex, then to look at which genes respond differently to surgery in male vs. female mice? My confusion may be stemming from the examples given in the DESeq2 documentation.
- Has anyone come across good examples of a 3-factor design with interactions?
- In discussing my design with others, the concern was raised that I have 8 groups total and that by making comparisons between most of them with interactions, I will be reducing my statistical power to detect differentially expressed genes. To what extent is this a legitimate concern?
I realize these are basic questions, so thank you for bearing with me. I appreciate any advice and suggestions you can provide!
Thanks!
Kendra
Thank you for your helpful and quick response!
The limma user guide is really clarifying my understanding of the two approaches -- it wasn't clear to me from the DESeq2 guide that the two approaches are equivalent, but I agree that the one factor approach is much easier to interpret!
I expect there to be biologically relevant differences in effect of surgery between sexes and time points, so if I understand you correctly, it would not be meaningful to look at the overall effect of surgery if that ends up being the case.You say that in order to look at the overall effect of treatment I must establish that the effect is "consistent" over both sexes or both times, but what is the criteria for determining consistency? Is there a non-subjective way to determine this?
Regarding my concern about statistical power, does having a large number of treatment combinations in the analysis increase the need for more biological replicates per group in order to reliably detect differences? I'm aware that the only real way to answer this for my case would be to do a power analysis, but if you have any general perspectives I'd appreciate hearing them.
To test consistency, you just test whether the two log fold changes are the same. You test whether the contrast:
is equal to zero. The first parentheses is the log-fold-change for Surgery vs Sham for males at time1. The second parentheses is the log-fold-change for Surgery vs Sham for females at time1. The contrast tests whether the two logFCs are equal, in other words, whether the Surgery vs Sham effect is equal (i.e., consistent) for males and females at time1.
The contrast is of course the interaction between Surgery and Sex at time1.
There's no practical way to do a power analysis for an RNA-seq experiment, in my opinion.
If you are dealing with human subjects, then you need as many replicates as you can possible get. Human are variable. Subject availability is almost always the main limitation.
If you are dealing with genetically identical model organisms, then it may be possible to work successfully with far fewer replicates.