Question

Is my design appropriate for contrast in DESeq2?

dbouzo • 0

@dbouzo-13452

Hi all,

I am a microbiology grad student new to bioinformatics, conducting some RNA-Seq experiments to determine differentially expressed genes after antimicrobial treatment.

I have an untreated control and 4 different antimicrobial treatments all conducted on the same cell type/microorganism with 3 biological replicates for each. The only factor changing between these groups is the treatment applied to them.

Initially I determined differentially expressed genes for each treatment compared to the untreated control separately, each with its own DESeqDataSet object. This made it difficult to compare DE between groups and to visualise the results as heat maps etc. After some reading I generated one DESeqDataSet object that included all treatments, and then applied the contrast argument of results() to determine DEGs for each treatment compared to the untreated control.

First I set the reference level:

dds$condition <- relevel(dds$condition, ref="untreated")

To determine differential expression:

dds <- DESeq(dds)
res_treatment1 <- results(dds, alpha=0.05, lfcThreshold = 1, altHypothesis="greaterAbs", contrast = c("condition", "treatment1", "untreated"))

The numbers of differentially expressed genes, outliers and low-count genes were quite different between these two approaches, despite using the same BAM files of alignments and the same FDR and LFC thresholds.

Despite reading the DESeq2 manual I am still unsure which approach is more appropriate. Any advice is most welcome. Thank you!

deseq2 design and contrast matrix rnaseq • 1.1k views

updated 7.4 years ago by Gavin Kelly ▴ 690 • written 7.4 years ago by dbouzo • 0

score 2 · Accepted Answer · 2017-10-25

The difference is that, in the single-dataset approach, you're estimating the variance (biological variability) by pooling the estimates from within every treatment group. This gives you greater power, so it is generally the recommended approach. Splitting the data into pairs (control plus one treatment) leaves less than half the residual degrees of freedom (with your design, 6 samples in 2 groups leave 4, whereas 15 samples in 5 groups leave 10), so it won't be as powerful, but it will protect you from the unlikely issue that variance differs strongly between conditions (and you want to capture that fact in your analysis). A situation that would merit this is where there's a treatment group that the scientist has realised is uninteresting, but which happens to have an outlier sample within it: even though you'd never be using that group of samples 'directly', it would still influence pairwise tests that didn't appear to involve it, by contributing to an increased overall variability.
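To make the degrees-of-freedom point concrete, here is a minimal base-R sketch that just counts samples minus fitted coefficients for the design described in the question. The group names are placeholders taken from the question, and residual_df is a hypothetical helper written for this illustration. DESeq2's dispersion estimation is more involved than a plain residual variance, so treat this as the counting argument only, not as what DESeq2 computes internally.

```r
# Hypothetical sample sheet matching the question: one untreated control
# plus four treatments, three biological replicates each (15 samples).
groups <- c("untreated", paste0("treatment", 1:4))
condition <- factor(rep(groups, each = 3), levels = groups)

# Illustrative helper (not a DESeq2 function):
# residual df = number of samples - number of fitted coefficients.
residual_df <- function(cond) {
  cond <- droplevels(cond)           # drop groups that are not present
  design <- model.matrix(~ cond)     # intercept + one column per non-reference group
  length(cond) - qr(design)$rank
}

# All groups in one model.
df_all <- residual_df(condition)

# Control versus a single treatment only.
pair <- condition[condition %in% c("untreated", "treatment1")]
df_pair <- residual_df(pair)

cat("all groups together:", df_all, "\n")
cat("one pair on its own:", df_pair, "\n")
```

With three replicates per group this gives 10 residual degrees of freedom for the combined model and 4 for a single pair, which is where "less than half" comes from.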

The vast majority of experiments I analyse are best done with all treatment groups included together (and comparisons pulled out with contrasts). Yours looks as if it would fit into that pattern.
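For what it's worth, the "all groups together, comparisons pulled out with contrasts" pattern could be sketched roughly as below. This is only a sketch under assumptions: the counts are simulated with makeExampleDESeqDataSet as a stand-in for real data, the names treatment2 to treatment4 are guesses that extend the question's naming, and the thresholds are simply copied from the results() call in the question.

```r
library(DESeq2)

set.seed(1)

# Simulated stand-in for the real experiment: 500 genes x 15 samples.
dds <- makeExampleDESeqDataSet(n = 500, m = 15)

# Overwrite the example's condition with the layout from the question
# (group names are assumptions). Listing "untreated" first makes it the
# reference level, so no separate relevel() call is needed here.
groups <- c("untreated", paste0("treatment", 1:4))
dds$condition <- factor(rep(groups, each = 3), levels = groups)

# Fit one model across all 15 samples (the design is still ~ condition).
dds <- DESeq(dds)

# Pull out each treatment-versus-untreated comparison with a contrast,
# using the same thresholds as in the question.
treatments <- setdiff(levels(dds$condition), "untreated")
res_list <- lapply(treatments, function(trt) {
  results(dds,
          alpha = 0.05,
          lfcThreshold = 1,
          altHypothesis = "greaterAbs",
          contrast = c("condition", trt, "untreated"))
})
names(res_list) <- treatments

# One summary per comparison.
for (trt in treatments) {
  cat("==", trt, "vs untreated ==\n")
  summary(res_list[[trt]])
}
```

Because every result table comes from the same fitted object, the tables share the same gene order, normalisation and dispersion estimates, which should make the heat maps and between-treatment comparisons mentioned in the question easier to build.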