DESeq2 1.16.1 : Zero count values re-occurrence of issue 56191?
1
0
Entering edit mode
@gregdeakin-14758
Last seen 5.7 years ago

Hello - I have an issue with genes (or OTUs in my case) which have zero expression across a condition.

Seems to be the same issue as

A: DESeq2; FoldChange values with only 0's

Works as expected (well as I'd expect) in DESeq2 1.10.1

Below are a few of the details

model = ~ Block + Treatment

contrast = c("Treatment","D","Control")

Treatment is a factor with 5 levels (Block has multiple levels, but I suspect that's irrelevant anyway)

Example of dodgy "gene"

aggregate(t(counts(dds["XXX"],normalize=T)),list(dds$Treatment),sum)

       Group.1   XXX
1      Control   0.0000
2       A           101.9302
3       B           130.5273
4       C           240.4031
5       D            0.0000

 

DESeq 1.16.1 (R 3.4) 

res["XXX",]

log2 fold change (MAP): Treatment D vs Control
Wald test p-value: Treatment D vs Control
DataFrame with 1 row and 6 columns

 

     baseMean log2FoldChange     lfcSE      stat       pvalue         padj
       <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
XXX 3.152404       -20.7867  4.592323 -4.526401 5.999657e-06 0.0003431804

 

I get the same results if I set betaPrior to True

-------------------------------------------------------------------------

DESeq2 1.10.1 (R 3.2)

res["XXX",]

log2 fold change (MAP): Treatment D vs Control
Wald test p-value: Treatment ND vs Control
DataFrame with 1 row and 6 columns

        baseMean log2FoldChange     lfcSE      stat    pvalue      padj
       <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
XXX  3.152404              0    1.3954         0         1         1

 

 

deseq2 • 1.2k views
ADD COMMENT
3
Entering edit mode
@mikelove
Last seen 1 day ago
United States

hi Greg,

I have this change documented in the NEWS file:

CHANGES IN VERSION 1.15.28
--------------------------

    o Remove some code that would "zero out" LFCs
      when both groups involved in a contrast had zero counts.
      This lead to inconsistency when similarly contrasts
      were performed by refactoring.

It presented more of a problem than it was worth, so I took out this code. The problems that we do see, occur when the data isn't well approximated by a NB, often it's not RNA-seq, and the Wald is more sensitive to this non-NB distribution than the LRT. I would recommend to use likelihood ratio tests, where you can provide a full matrix which is:

full <- model.matrix(design(dds), colData(dds))

And reduced can be constructed by removing a column (e.g. the D vs control difference).

You can do:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

Then:

dds <- nbinomLRT(dds, full=full, reduced=reduced)
res <- results(dds)

I've seen this to provide robustness to e.g. large count outliers that are inconsistent with the NB model, and give expected results.

ADD COMMENT
0
Entering edit mode

Ah, o.k. thanks Mike.

I guess it's pretty much an edge case, especially in RNA-seq, and not too difficult for me to correct for.

I'll have a look at using LRT. As you guess my data (mostly soil microbiom) might violate a few of the expectations of the DESeq model - it's a bit of an open question how to deal with data consisting mostly of zeros. Having said that, we've had pretty good results using the DESeq Wald test previously, but I can't remember having more than a two factor condition before.

Thanks for your help.

 

ADD REPLY

Login before adding your answer.

Traffic: 905 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6