I've received a couple questions from users about why I moved the shrinkage of log2 fold changes from the DESeq() to the lfcShrink() function. Here's my answer I give in the vignette:
"In version 1.16, the log2 fold change shrinkage is no longer default for the DESeq and nbinomWaldTest functions, by setting the defaults of these to betaPrior=FALSE
, and by introducing a separate function lfcShrink, which performs log2 fold change shrinkage for visualization and ranking of genes. While for the majority of bulk RNA-seq experiments, the LFC shrinkage did not affect statistical testing, DESeq2 has become used as an inference engine by a wider community, and certain sequencing datasets show better performance with the testing separated from the use of the LFC prior. Also, the separation of LFC shrinkage to a separate function lfcShrink allows for easier methods development of alternative effect size estimators."
As shown in the new vignette, the following code produces moderated log2 fold changes:
dds <- DESeq(dds) resultsNames(dds) res <- lfcShrink(dds, coef=2) plotMA(res)
Note: If you have version 1.16 (no longer release), you need to include res=res
res <- results(dds) res <- lfcShrink(dds, coef=2, res=res)
The last reason given above was a big motivation, we have some nice new estimators with better performance than the one we proposed in the 2014 paper, and this new function will allow us to have finer control of these without cluttering the DESeq() function arguments.
The 2014 shrinkage estimator for fold change worked well for most bulk RNA-seq datasets, but not for all of the kinds of datasets in which DESeq2 is being used. In particular, one kind of dataset where combining shrinkage with testing was not good were those in which there were singular very large fold changes (e.g. abs(LFC) > 6), despite all of the other log2 fold changes being very close to 0. Here the shrinkage was too great. This case looks much better with our new estimators. I think the 2014 shrinkage estimator applied to zero inflated count data, without any adjustment of the estimator to account for the zero component, would likely produce too much shrinkage as well.
Thanks for the clarification re: new suggested workflow. I'm unsure as to why there's a difference between calling
lfcshrink()
usingcoef=2
orcontrast
for an A vs B comparison. I would have thought that bothres_A
andres_B
below would return exactly the samelog2FoldChange
. The difference is very small but I'm wondering why is this?Here's a post where I explain the difference. I wanted to provide 'contrast' to allow users to recreate previous LFCs exactly, although moving forward, we will have more features for the 'coef' argument:
A: DESeq2 lfcShrink() usage of coef vs. usage of contrast