Question

Design model for DESeq2 analysis

0

Entering edit mode

hs.lansdell ▴ 20

@hslansdell-14246

Last seen 7.3 years ago

Hello!

I'm trying to figure out which design model is correct for my human RNA seq data. I have outcome data, here 'condition' that when examined alone, returns no differentially expressed genes (as design=~condition). So digging around led me to believe I need to control for the sex of the samples. I tried both of the below, and am unsure which is accurate...

dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~condition+sex)

Or

dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~sex+condition)

The first gives me:

> summary(res)

out of 20338 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 143, 0.7% 
LFC < 0 (down)   : 94, 0.46% 
outliers [1]     : 0, 0% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < 0.1, na.rm=TRUE)
[1] 237
> sum(res$padj<0.05,na.rm = TRUE)
[1] 169

The second gives me:

> summary(res)

out of 20338 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 0, 0% 
outliers [1]     : 0, 0% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < 0.1, na.rm=TRUE)
[1] 0

I've also run DESeq2 with just the sex and gotten:

> summary(res)

out of 20338 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 173, 0.85% 
LFC < 0 (down)   : 135, 0.66% 
outliers [1]     : 0, 0% 
low counts [2]   : 7492, 37% 
(mean count < 19)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < 0.1, na.rm=TRUE)
[1] 308

So, I don't think the first is just the effect of the sex on the samples. Again, not sure which of the two models I should be using to control for sex, or if that's the right call for looking at outcome data at all....

deseq2 differential gene expression • 28k views

ADD COMMENT • link updated 7.3 years ago by Gavin Kelly ▴ 690 • written 7.3 years ago by hs.lansdell ▴ 20

1

Entering edit mode

Without showing us exactly how you got from dds to res, it's not possible to give much advice. But I suspect that you're not specifying a contrast, and therefore you're letting DESeq chose the default contrast, which is the final term in your design. So the ~ sex + condition model is the one that defaults to the thing you probably want. But again, we have no details on what levels there are of 'condition', so it might not be corresponding to what you/we think it does. Best to specify exactly which contrast you want in the call to DESeq (so that it's much less trouble to interpret when you/others come to review your analysis) rather than rely on defaults.

But as I imply, we really need more info on the setup. When you say 'Outcome' what types of outcome are you looking at. If this is e.g. survival data, then we're in very different territory.

ADD REPLY • link 7.3 years ago Gavin Kelly ▴ 690

0

Entering edit mode

Sorry! It's a dichotomous outcome. So there's only yes or no. It's a calculated binary outcome based on reported pain 6 weeks out from an event. i.e, over a 7, gets a yes, under gets a no.

Here's all my code:

library(DESeq2)

data<-read.csv("TotalRNA.csv", header=TRUE, row.names = 1, stringsAsFactors = FALSE)

colnames(data) <- substring(colnames(data), 2)


colData<-read.csv("col_data.csv",header = TRUE, row.names = 1)

#Double check names match up between Sample and data matrix

all(rownames(colData)==colnames(data))

dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~condition+sex)

dds$condition <- factor(dds$condition, levels = c("no","yes"))

dds<-DESeq(dds)

res<-results(dds)

plotMA(res,ylim=c(-2,2))

summary(res)

sum(res$padj < 0.1, na.rm=TRUE)

I guess what I don't understand about the contrast is that is seems to be only specified in

results(dds, contrast=c("condition","C","B")), after DESeq2 has run.

My goal is really just to see if they are any differentially expressed genes between the two dichotomous outcome groups. I was getting none, which led me to try controlling for the sex of the sample as well.

ADD REPLY • link 7.3 years ago hs.lansdell ▴ 20

score 2 · Answer 1 · 2017-11-07

Thanks for the extra info.

So, there's (at least) two stages to "DESeq2 has run" - the DESeq command gives you the potential to ask many questions via the results command (by selecting the relevant 'contrast' or 'name' argument), of which your code is defaulting to the comparison between sexes. You can be more specific (and in this case, correct) by saying

results(dds, contrast=c("condition", "yes","no"))

It's a free choice as to whether you include sex as a factor: if you do, its effect will be 'normalised out' when estimating the condition effect; if you don't, it's effect (if there be any) will be included in an increased between-sample variability. You may be guided by intuition, prior knowledge, or evidence from PCA and clustering, in this choice, but the contrast you use will be the same, regardless of your choice (unless you want to estimate separate per-sex condition effects).

score 1 · Answer 2 · 2017-11-07

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 10 hours ago

United States

hi,

As Gavin says above, the crux of the confusion is that results(dds), if you don't provide with any 'name' or 'contrast' uses the last variable in the design. However, the designs above are equivalent, you just need to say which results table you want, using 'name'. See this paragraph here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis