Hello!
I'm trying to figure out which design model is correct for my human RNA seq data. I have outcome data, here 'condition' that when examined alone, returns no differentially expressed genes (as design=~condition). So digging around led me to believe I need to control for the sex of the samples. I tried both of the below, and am unsure which is accurate...
dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~condition+sex)
Or
dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~sex+condition)
The first gives me:
> summary(res)
out of 20338 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 143, 0.7% LFC < 0 (down) : 94, 0.46% outliers [1] : 0, 0% low counts [2] : 0, 0% (mean count < 0) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results
> sum(res$padj < 0.1, na.rm=TRUE) [1] 237 > sum(res$padj<0.05,na.rm = TRUE) [1] 169
The second gives me:
> summary(res)
out of 20338 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 0, 0% LFC < 0 (down) : 0, 0% outliers [1] : 0, 0% low counts [2] : 0, 0% (mean count < 0) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results
> sum(res$padj < 0.1, na.rm=TRUE) [1] 0
I've also run DESeq2 with just the sex and gotten:
> summary(res) out of 20338 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 173, 0.85% LFC < 0 (down) : 135, 0.66% outliers [1] : 0, 0% low counts [2] : 7492, 37% (mean count < 19) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results > sum(res$padj < 0.1, na.rm=TRUE) [1] 308
So, I don't think the first is just the effect of the sex on the samples. Again, not sure which of the two models I should be using to control for sex, or if that's the right call for looking at outcome data at all....
Without showing us exactly how you got from dds to res, it's not possible to give much advice. But I suspect that you're not specifying a contrast, and therefore you're letting DESeq chose the default contrast, which is the final term in your design. So the ~ sex + condition model is the one that defaults to the thing you probably want. But again, we have no details on what levels there are of 'condition', so it might not be corresponding to what you/we think it does. Best to specify exactly which contrast you want in the call to DESeq (so that it's much less trouble to interpret when you/others come to review your analysis) rather than rely on defaults.
But as I imply, we really need more info on the setup. When you say 'Outcome' what types of outcome are you looking at. If this is e.g. survival data, then we're in very different territory.
Sorry! It's a dichotomous outcome. So there's only yes or no. It's a calculated binary outcome based on reported pain 6 weeks out from an event. i.e, over a 7, gets a yes, under gets a no.
Here's all my code:
I guess what I don't understand about the contrast is that is seems to be only specified in
results(dds, contrast=c("condition","C","B")), after DESeq2 has run.
My goal is really just to see if they are any differentially expressed genes between the two dichotomous outcome groups. I was getting none, which led me to try controlling for the sex of the sample as well.