Question

DESEQ2 with 3 biological replicates for control and 1 biological replicate for treatment

0

Entering edit mode

wd ▴ 30

@wd-7410

Last seen 4.7 years ago

Germany

Hi

I want to use the DESEQ package between a control (3 biological replicates) and treatment (1 biological replicate).

IN DESeq I herefore used the following code, and got 266 genes with padj < 0.05:

table <- read.delim("test.txt")
row.names(table) <- table$Feature_ID
count_table <- table[, -1]
conds <- c("ctrl", "ctrl", "ctrl", "treatment")
cds <- newCountDataSet(count_table, conds)
cds <- estimateSizeFactors(cds)
cds <- estimateDispersions(cds, method="blind", sharingMode="fit-only")
results <- nbinomTest(cds, "ctrl", "treatment")

In DESeq2 I used the follwing command, but got > 10000 genes with padj < 0.05:

table <- read.delim("test.txt")
row.names(table) <- table$Feature_ID
count_table <- table[, -1]
colData <- DataFrame(condition=factor(c("ctrl", "ctrl", "ctrl", "treatment")))
dds <- DESeqDataSetFromMatrix(count_table, colData, formula(~ condition))
results <- DESeq(dds, minReplicatesForReplace=Inf)

So probably I need to add extra parameters to the DESEQ2 analysis but for now I can't figure out how?

Thank you for helping

Wannes

DESEQ deseq2 deseq replicates biologicalreplicates • 5.5k views

ADD COMMENT • link 9.0 years ago wd ▴ 30

0

Entering edit mode

What is your basis for assuming that you need to add extra parameters to your DESeq2 pipeline?

ADD REPLY • link 9.0 years ago Ryan C. Thompson ★ 7.9k

score 0 · Answer 1 · 2015-12-02

0

Entering edit mode

Bernd Klaus ▴ 610

@bernd-klaus-6281

Last seen 6.1 years ago

Germany

Hi Wannes,

sounds like DESeq2 might underestimate the dispersions in your example. Can you produce a plot of the dispersion fit via

plotDispEsts(dds)

and a histogram of the p-values via

hist(results$pvalue, col = "tan2") ?

The DESeq is known to be a bit "conservative" but the difference is quite extreme ...

The plots should help to diagnose things ...

Best regards,

Bernd

ADD COMMENT • link 9.0 years ago Bernd Klaus ▴ 610

1

Entering edit mode

It's not necessarily underestimation, but that the previous code was certainly overestimating:

"blind - Ignore the sample labels and compute a gene’s empirical dispersion value as if all samples were replicates of a single condition. This can be done even if there are no biological replicates. This method can lead to loss of power; see the vignette for details. The single estimated dispersion condition is called "blind" and used for all samples."

more comparable (and recommended by Simon I think) would have been pooled or pooled-CR for DESeq.

ADD REPLY • link 9.0 years ago Michael Love 43k

1

Entering edit mode

another comment: this sounds like an experiment where control samples are very similar, and the treatment sample (no replicates) is very different. You could use plotPCA to investigate the inter-sample relationships.

The result from null hypothesis testing is not very interesting for experiments like this.

ADD REPLY • link 9.0 years ago Michael Love 43k

0

Entering edit mode

Dear Michael and Bernd

Thank you for your comments.

Please find attached the links to the requested plots for the DESEQ2 analysis.

histogram: http://pasteboard.co/2AastiqQ.png

dispersion fit: http://pasteboard.co/2AazY7KX.png

PCA: http://pasteboard.co/2AaJb1PI.png

I also ran the DESEQ analysis using the "pooled" method and this time also around 10000 (out of 18000) genes (similar to DESEQ2) had a padj < 0.05.

Question is which analysis is the most 'correct' one.

Wannes

ADD REPLY • link 9.0 years ago wd ▴ 30

0

Entering edit mode

The blind method is not appropriate. PC1 / 90% of the variance of the top genes is the difference between control and treatment. There are many, large differences between control and treatment.

ADD REPLY • link 9.0 years ago Michael Love 43k

0

Entering edit mode

Hi Mike,

sorry, I overlooked the "blind" option and agree: DESeq is most probably rather overestimating the dispersions and the blind option is not appropriate.

@Wannes: the DESeq2 results function has an option lfcThreshold, which is set to 0 by

default. It might be worthwile to look at the fold changes and test against a threshold higher than 0. After all, you have only one treatment sample, so you don't really know how variable the treatment samples are. You essentially assume that the variability in the control group is the same as in the treatment group. Thus it might be better to require a minim FC different from zero to be more "conservative".

Looking at the FC of genes that should not be DE according to the biology can be helpful to set such a threshold.

Bernd

ADD REPLY • link 9.0 years ago Bernd Klaus ▴ 610

0

Entering edit mode

Hi Mike,

sorry, I overlooked the "blind" option and agree: DESeq is most probably rather overestimating the dispersions and the blind option is not appropriate.

@Wannes: the DESeq2 results function has an option lfcThreshold, which is set to 0 by

default. It might be worthwile to look at the fold changes and test against a threshold higher than 0. After all, you have only one treatment sample, so you don't really know how variable the treatment samples are. You essentially assume that the variability in the control group is the same as in the treatment group. Thus it might be better to require a minim FC different from zero to be more "conservative".

Looking at the FC of genes that should not be DE according to the biology can be helpful to set such a threshold.

Bernd

ADD REPLY • link 9.0 years ago Bernd Klaus ▴ 610

0

Entering edit mode

Hi

Thank you for the advice! As a test, I ran the analysis with a lfcthreshold of 2 and padj < 0.01, and now I obtained 2000 DEGs out of 18000 genes, which seems more reasonable.

Wannes

ADD REPLY • link 9.0 years ago wd ▴ 30

0

Entering edit mode

I just have to add, "reasonable" is relative to your precise question. If you want to know, how many genes are not likely to have a fold change of 0, then 10,000+ is reasonable. If you want to know, which genes have very large fold change (>4), then you get a smaller number. In other words, you can make the number as small or large as you like.

ADD REPLY • link 9.0 years ago Michael Love 43k