DESeq2 experimental design
Mike ▴ 10
@mike-12142
Last seen 3.2 years ago
Canada

I have 12 RNA-seq samples: 3 replicates each of male control, male mutant, female control, and female mutant. I want a list of genes that are significantly differentially expressed in males (mutant vs control) and in females (mutant vs control). I'm not interested in comparing, for example, male control vs female control, though I may do something like a Venn diagram of genes that are differentially expressed in both males and females. Should I do two independent analyses for male and female, or combine everything together (i.e. all samples in the same SummarizedExperiment and DESeqDataSet) and then use contrasts to specify the two comparisons (i.e. contrast=c("Group","male.knockout","male.control") and contrast=c("Group","female.knockout","female.control"))?

deseq2 multiple factor design • 2.3k views
Gavin Kelly ▴ 690
@gavin-kelly-6944
Last seen 4.6 years ago
United Kingdom / London / Francis Crick…

Generally you get better statistical power if you have all the samples in the same dataset, as you're estimating the variance across many more degrees of freedom. The caveat is that if one half of your experiment has, for biological or technical reasons, a different degree of variability, or a greater propensity for samples to be outliers, then the combined approach will over- or under-represent the variability depending on which half of the experiment you're looking at. But my intuition would be that this doesn't look like one of those situations. You can get some feel by looking at PCA plots or clusterings: if the clusters in one branch of the experiment are much tighter than in the other branch, then you might want to try both approaches and see whether positive control genes are recovered better in one case than the other.

Another reason for doing the combined approach is that it will let you do a 2x2 design with interactions, to look at the different response to KO between the sexes without having to resort to a Venn-diagram-like approach (which often suffers due to two rounds of statistical error).
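As a rough illustration of what a 2x2 design with an interaction looks like, here is a minimal base-R sketch. The `coldata` table and its column names (`Sex`, `Genotype`) are assumptions mirroring the 12-sample layout in the question, not the poster's actual objects.

```r
# Sample table mirroring the 12-sample layout in the question
# (column names are assumptions for illustration).
coldata <- data.frame(
  Sex      = factor(rep(c("female", "male"), each = 6)),
  Genotype = factor(rep(rep(c("control", "knockout"), each = 3), times = 2))
)

# 2x2 design with an interaction term; shorthand is ~ Sex * Genotype.
design <- ~ Sex + Genotype + Sex:Genotype

# Inspect the columns the formula expands to. The last column is the
# interaction: the extra knockout effect in males relative to females.
mm <- model.matrix(design, data = coldata)
colnames(mm)
```

With female and control as the reference levels, the `Genotypeknockout` coefficient is the knockout effect in females, and the interaction coefficient is how much the male knockout effect differs from it. The same formula could be supplied as the `design` when building a DESeqDataSet; check `resultsNames()` for the exact coefficient names in your own object.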

 


Thank you for your reply, I have some additional questions about the design formula. These are my 12 samples and let's assume I'm analyzing them all together:

Sex     Genotype  BioRep  Group
female  control   1       female.control
female  control   2       female.control
female  control   3       female.control
female  knockout  1       female.knockout
female  knockout  2       female.knockout
female  knockout  3       female.knockout
male    control   1       male.control
male    control   2       male.control
male    control   3       male.control
male    knockout  1       male.knockout
male    knockout  2       male.knockout
male    knockout  3       male.knockout

Also the samples are paired, in that male control 1 is paired with male knockout 1, female control 2 is paired with female knockout 2, etc. I want to answer three questions:

1. What genes are differentially expressed in the males (control vs knockout)?

2. What genes are differentially expressed in the females (control vs knockout)?

3. What are the different responses to the knockout in male vs female?

What design formula should I use? I think ~ BioRep + Group

And then to answer questions 1 and 2 above would I use these contrasts?

contrast=c("Group","male.knockout","male.control")

contrast=c("Group","female.knockout","female.control")

I'm not sure how I should modify the design and contrasts to answer question 3, any help is appreciated, thank you.


The contrast you'd need would be something along the lines of contrast=list(c("f.ko", "m.ctrl"), c("f.ctrl", "m.ko")), as this would divide the female ko_vs_ctrl by the male ko_vs_ctrl (you'll need to change the entries to correspond to resultsNames values...)
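Dividing the female ko_vs_ctrl ratio by the male ko_vs_ctrl ratio is, on the log2 scale, a difference of differences. A small base-R sketch of that arithmetic follows; the group means are invented numbers for a single hypothetical gene, not data from this thread.

```r
# Hypothetical log2 group means for one gene (invented for illustration).
group_means <- c(female.control = 5.0, female.knockout = 7.0,
                 male.control   = 5.5, male.knockout   = 6.0)

# Knockout effect within each sex, as log2 fold changes.
female_effect <- group_means[["female.knockout"]] - group_means[["female.control"]]
male_effect   <- group_means[["male.knockout"]]   - group_means[["male.control"]]

# The same comparison as a single numeric contrast:
# +1 for groups in the numerator, -1 for groups in the denominator.
weights <- c(female.control = -1, female.knockout = 1,
             male.control   =  1, male.knockout   = -1)
interaction_lfc <- sum(weights * group_means[names(weights)])

# Both routes give the same number.
stopifnot(isTRUE(all.equal(interaction_lfc, female_effect - male_effect)))
interaction_lfc
```

The groups with weight +1 (female knockout, male control) belong in the first vector of the list-style contrast and those with weight -1 (female control, male knockout) in the second.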

One question that immediately springs to mind, though, is how you've got male and female versions of the same biological replicate.  This may be correct, but seems unlikely - by putting BioRep in as an effect, you're suggesting that there's something connected about samples with the same label (also, double-check that you've got BioRep as a factor, rather than a numeric).
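To make the factor-versus-numeric point concrete, here is a base-R sketch using a sample table that mimics the one posted above. The `coldata` object and the `Pair` column are hypothetical names introduced only for illustration.

```r
# Sample table mimicking the one in the thread; BioRep starts out numeric,
# as it would if read straight from a file.
coldata <- data.frame(
  Sex      = factor(rep(c("female", "male"), each = 6)),
  Genotype = factor(rep(rep(c("control", "knockout"), each = 3), times = 2)),
  BioRep   = rep(1:3, times = 4)
)
coldata$Group <- factor(paste(coldata$Sex, coldata$Genotype, sep = "."))

# Numeric BioRep is fitted as a single slope: intercept + 1 + 3 Group columns.
n_cols_numeric <- ncol(model.matrix(~ BioRep + Group, coldata))

# Factor BioRep gets one indicator per non-reference level: intercept + 2 + 3.
coldata$BioRep <- factor(coldata$BioRep)
n_cols_factor <- ncol(model.matrix(~ BioRep + Group, coldata))

# If the pairs are distinct within each sex, a hypothetical Pair label
# separates female pair 1 from male pair 1.
coldata$Pair <- interaction(coldata$Sex, coldata$BioRep, drop = TRUE)
n_pairs <- nlevels(coldata$Pair)

# Pair is nested within Sex, so ~ Pair + Group is not full rank as written.
mm_pair <- model.matrix(~ Pair + Group, coldata)
rank_deficient <- qr(mm_pair)$rank < ncol(mm_pair)
```

In this sketch `n_cols_numeric` is 5 and `n_cols_factor` is 6, which is a quick way to see whether BioRep is being treated as a factor. If the pairs really are sex-specific, the rank deficiency means the design needs restructuring; the DESeq2 vignette section on individuals nested within groups describes one way to do that.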

 
