I have 12 RNA-seq samples: 3 replicates each of male control, male mutant, female control, and female mutant. I want a list of genes that are significantly differentially expressed in males (mutant vs control) and in females (mutant vs control). I'm not interested in comparing, for example, male control vs female control, though I may do something like a Venn diagram of genes that are differentially expressed in both males and females. Should I do two independent analyses for male and female, or combine everything together (i.e. all samples in the same SummarizedExperiment and DESeqDataSet) and then use contrasts to specify the two comparisons (i.e. contrast=c("Group","male.knockout","male.control") and contrast=c("Group","female.knockout","female.control"))?
Thank you for your reply. I have some additional questions about the design formula. These are my 12 samples, and let's assume I'm analyzing them all together:
[Sample table not preserved; only the "Sex" column header survives.]
Also, the samples are paired: male control 1 is paired with male knockout 1, female control 2 is paired with female knockout 2, etc. I want to answer three questions:
1. What genes are differentially expressed in the males (control vs knockout)?
2. What genes are differentially expressed in the females (control vs knockout)?
3. What are the different responses to the knockout in male vs female?
What design formula should I use? I think ~ BioRep + Group.
And then to answer questions 1 and 2 above would I use these contrasts?
contrast=c("Group","male.knockout","male.control")
contrast=c("Group","female.knockout","female.control")
I'm not sure how I should modify the design and contrasts to answer question 3. Any help is appreciated, thank you.
The contrast you'd need would be something along the lines of contrast=list(c("m.ctrl", "f.ko"), c("f.ctrl", "m.ko")), as this would divide the female ko_vs_ctrl by the male ko_vs_ctrl. The first vector holds the numerator terms and the second holds the denominator terms (you'll need to change the entries to correspond to your resultsNames values).
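To see why the terms are split between the two vectors that way, here is a minimal base-R sketch. The log2 group means are made-up numbers, and the short names m.ctrl, m.ko, f.ctrl and f.ko are placeholders for illustration, not actual resultsNames output. On the log2 scale "dividing" becomes subtracting, so the female response minus the male response is (f.ko - f.ctrl) - (m.ko - m.ctrl), which puts f.ko and m.ctrl on the plus side and f.ctrl and m.ko on the minus side.

```r
# Hypothetical log2 mean expression of one gene in each group (made-up numbers).
log2_mean <- c(m.ctrl = 5.0, m.ko = 6.0, f.ctrl = 5.5, f.ko = 8.0)

# Knockout effect within each sex (log2 fold change, ko vs ctrl).
male_lfc   <- log2_mean[["m.ko"]] - log2_mean[["m.ctrl"]]
female_lfc <- log2_mean[["f.ko"]] - log2_mean[["f.ctrl"]]

# Difference of differences: female response minus male response.
interaction_lfc <- female_lfc - male_lfc

# The same quantity laid out as "numerator terms minus denominator terms",
# mirroring the two vectors of a list-style contrast.
numerator   <- c("f.ko", "m.ctrl")
denominator <- c("f.ctrl", "m.ko")
from_lists  <- sum(log2_mean[numerator]) - sum(log2_mean[denominator])

# Both routes give the same number.
stopifnot(isTRUE(all.equal(interaction_lfc, from_lists)))
print(interaction_lfc)
```

If the same name appeared in both vectors, it would cancel out of the sum, so each of the four groups should appear exactly once.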
One question that immediately springs to mind, though, is how you've got male and female versions of the same biological replicate. This may be correct, but it seems unlikely: by putting BioRep in as an effect, you're suggesting that samples with the same label are connected in some way. Also, double-check that you've got BioRep as a factor rather than a numeric.
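A quick way to check the factor-versus-numeric point is to look at the columns of the model matrix in base R. This is only a sketch: the sample sheet below is a hypothetical reconstruction (2 sexes x 2 genotypes x 3 replicates), and the column names Sex, Genotype, Group and BioRep are assumed from the thread rather than taken from the real data.

```r
# Hypothetical sample sheet; the labels are assumptions for illustration.
samples <- expand.grid(BioRep   = 1:3,
                       Genotype = c("control", "knockout"),
                       Sex      = c("male", "female"))
samples$Group <- factor(paste(samples$Sex, samples$Genotype, sep = "."))

# BioRep left as a number: the model fits one slope across replicates 1, 2, 3.
cols_numeric <- colnames(model.matrix(~ BioRep + Group, data = samples))

# BioRep as a factor: one indicator column per replicate label beyond the first.
samples$BioRep <- factor(samples$BioRep)
cols_factor <- colnames(model.matrix(~ BioRep + Group, data = samples))

print(cols_numeric)
print(cols_factor)
```

In the factor version, male replicate 1 and female replicate 1 share the same BioRep level, so the model treats them as connected. That is the assumption being questioned above.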