DESeq2: qualitatively different results between deseq2 v1.14 and deseq2 v1.22 when run with shrinkage
1
0
Entering edit mode
vvi • 0
@vvi-19280
Last seen 6.0 years ago

Hi Michael / DESeq2 developers,

I’m trying to make sense of a comparison of DESeq v1.14 and DESeq v1.22 on the identical data set / condition. The results I found between versions with shrinkage “on” are qualitatively different - 14 hit genes from v1.14, 75 hit genes from v1.22, so I have two questions (see Question1 / Question2 below). All the files, code and results / pictures I refer to below are here: https://drive.google.com/drive/folders/1w1JipKziW0IPhs2RjEe7HSfuF9DUkP7H?usp=sharing

I’ve been running an old version of DESeq2 (v1.14) on my laptop for the last couple of years. I have analysed a data set with it following the default practice (shrinkage “on”, ie betaPrior=TRUE) in the old vignette. I also rank with shrinkage off (ie betaPrior = FALSE). Following the old vignette, the code for both cases looked like this:

DESeq2test1.14.R:

packageVersion("DESeq2")
# [1] ‘1.14.1’
deseqds<-DESeqDataSetFromMatrix(countData=test_counts, colData=test_conditions, design= ~ condition)
dds_no_shrinkage <- DESeq(deseqds, betaPrior = FALSE)
results_no_shrinkage <- as.data.frame(results(dds_no_shrinkage, contrast = c("condition","treatment","control")))
dds_shrinkage <- DESeq(deseqds)
results_shrinkage <- as.data.frame(results(dds_shrinkage, contrast = c("condition","treatment","control")))

The results are in “deseq1.14resultsnoshrinkage.txt” and “deseq1.14results_shrinkage.txt"

I then installed a new version of DESeq2 (v.1.22) inside a new docker (rocker) image, and re-analysed the same dataset with the same conditions matrix. For the new version, I switched ‘on’ shrinkage by running the lfcShrink() following the current vignette and using the ’normal’ shrinkage method. I also used the unshrunk results (ie without the lfcShrink function running). This code looked like this:

DESeq2test1.22.R:

packageVersion("DESeq2")
# [1] ‘1.22.1’
deseqds<-DESeqDataSetFromMatrix(countData=test_counts, colData=test_conditions, design= ~ condition)
dds_before_shinkage <- DESeq(deseqds)
results_no_shrinkage <- as.data.frame(results(dds_before_shrinkage, contrast = c("condition","treatment","control"))
results_shrinkage <- lfcShrink(dds_before_shinkage, coef="condition_treatment_vs_control",type="normal")
results_shrinkage_df <- as.data.frame(results_shrinkage)

The results are in “deseq1.22resultsnoshrinkage.txt” and “deseq1.22results_shrinkage.txt"

I was then able to make comparisons between the results of DESeq2 versions, with and without shrinkage. Here is what I saw:

No shrinkage: deseq2 v1.14 without shrinkage VS v.21 WITHOUT shrinkage. The ranking and padj’s of the genes from deseq2 v1.14 are identical to the genes from deseq2 v1.22. See picture “noshrinkagedeseq1.14deseq1.22rank_comparison.jpg"

With Shrinkage: deseq2 v1.14 with shrinkage VS v1.21 WITH shrinkage. The ranking and padj’s of the genes from deseq v1.14 are significantly changed: in my test dataset for deseq2 v1.22 there are 75 genes with padj < 0.1, compared to 14 genes with padj < 0.1 from deseq2 v 1.14. When comparing the ranks of the top-hit genes there doesn’t appear to be a pattern as to which genes are omitted and which are kept between the two versions. See picture “shrinkagedeseq1.14deseq1.22hitcomparison.jpg”.

So the questions I have are:

Question1. The old default method (with shrinkage) appears to produce far fewer hit genes and a qualitatively different result compared to the new method with shrinkage. What should I make of this? People have made decisions about validation, emphases on papers etc based on these results over the years. If the list of hits has increased like this, then possibly other decisions could have been taken. Have others noticed this kind of behaviour, and what is their response?

Question2. The new lfcShrink method (I’m running v1.22) appears to change the LFC’s without changing the pvalues / padj values produced by DESeq2. This is in contrast to the old DESeq(… betaPrior=FALSE/TRUE) method of switching off shrinkage, which appears to change BOTH LFCs and p-values. Is this expected?

I’m happy to provide/upload any more detail that I’ve inadvertently missed.

Thanks,

Vivek

deseq2 shrinkage • 1.4k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 days ago
United States

hi Vivek,

The difference between 1.14 and versions after and including 1.16 is that previously results() computed the Wald test using the shrunken coefficient, and after it computes the Wald test with the MLE coefficient, see here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#methods-changes-since-the-2014-deseq2-paper

lfcShrink() computes a shrunken coefficient but does not change the test statistic or p-values, so these are the standard test for the maximum likelihood estimate. This is described in the help page so yes it is expected.

I think that the original Normal prior was too aggressive at shrinking, and we have shown this in detail in the apeglm paper and it can be seen in the Mixology paper. This makes the Wald test based on the Normal-prior shrunken coefficient probably too conservative with respect to the MLE or compared to the alternative estimators we now recommend.

ADD COMMENT
0
Entering edit mode

Hi Michael, Thanks, this is clear! Vivek

ADD REPLY

Login before adding your answer.

Traffic: 620 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6