Question

DESeq2 1.16.1 : Zero count values re-occurrence of issue 56191?

0

Entering edit mode

greg.deakin • 0

@gregdeakin-14758

Last seen 6.0 years ago

Hello - I have an issue with genes (or OTUs in my case) which have zero expression across a condition.

Seems to be the same issue as

A: DESeq2; FoldChange values with only 0's

Works as expected (well as I'd expect) in DESeq2 1.10.1

Below are a few of the details

model = ~ Block + Treatment

contrast = c("Treatment","D","Control")

Treatment is a factor with 5 levels (Block has multiple levels, but I suspect that's irrelevant anyway)

Example of dodgy "gene"

aggregate(t(counts(dds["XXX"],normalize=T)),list(dds$Treatment),sum)

Group.1 XXX
1 Control 0.0000
2 A 101.9302
3 B 130.5273
4 C 240.4031
5 D 0.0000

DESeq 1.16.1 (R 3.4)

res["XXX",]

log2 fold change (MAP): Treatment D vs Control
Wald test p-value: Treatment D vs Control
DataFrame with 1 row and 6 columns

baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
XXX 3.152404 -20.7867 4.592323 -4.526401 5.999657e-06 0.0003431804

I get the same results if I set betaPrior to True

-------------------------------------------------------------------------

DESeq2 1.10.1 (R 3.2)

res["XXX",]

log2 fold change (MAP): Treatment D vs Control
Wald test p-value: Treatment ND vs Control
DataFrame with 1 row and 6 columns

baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
XXX 3.152404 0 1.3954 0 1 1

deseq2 • 1.3k views

ADD COMMENT • link updated 7.2 years ago by Michael Love 43k • written 7.2 years ago by greg.deakin • 0

score 3 · Answer 1 · 2018-01-11

hi Greg,

I have this change documented in the NEWS file:

CHANGES IN VERSION 1.15.28
--------------------------

    o Remove some code that would "zero out" LFCs
      when both groups involved in a contrast had zero counts.
      This lead to inconsistency when similarly contrasts
      were performed by refactoring.

It presented more of a problem than it was worth, so I took out this code. The problems that we do see, occur when the data isn't well approximated by a NB, often it's not RNA-seq, and the Wald is more sensitive to this non-NB distribution than the LRT. I would recommend to use likelihood ratio tests, where you can provide a full matrix which is:

full <- model.matrix(design(dds), colData(dds))

And reduced can be constructed by removing a column (e.g. the D vs control difference).

You can do:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

Then:

dds <- nbinomLRT(dds, full=full, reduced=reduced)
res <- results(dds)

I've seen this to provide robustness to e.g. large count outliers that are inconsistent with the NB model, and give expected results.