Question

lfcShrink behavior with homogeneous expression dataset

1

Entering edit mode

mat.lesche ▴ 110

@matlesche-6835

Last seen 11 months ago

Germany

Hello,

I have a small dataset with 3 replicates per condition. One condition is a gene over-expression, the other is a control. I'm used to run DESeq2 with betaPrior = True to have a comparability with the 'old' DESeq2 behavior. I can't apply this old workflow because the expression differences between my conditions it quite small and one gene, namely the over-expressed gene, would stand out and with betaPrior = T it's shrunk to a very small log2 FoldChange. According to the manual this is to be expected which is why I used betaPrior = F and the several lfcShrink methods (normal, ashr and apeglm).

However I was wondering if I should apply a shrinkage or just use betaPrior = F? Secondly, I find the shrinkage quite drastically and not comparable to the manual (https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#alternative-shrinkage-estimators). I know you can't compare datasets but still it's good to use it as a reference. It's also explained here that imprecise fold changes are shrunk and large fold changes are kept (https://support.bioconductor.org/p/77461/). And that is valid for the data set. The DEGs have a similar log2Foldchange over the used methods.

Here is the data set with 3 replicates and the several methods. From the left to the right it is the following and I used coef in the lfcShrink function:

betaPrior = F and method = normal, betaPrior = F, betaPrior = F and method = ashr, betaPrior = F and method = apeglm

https://ibb.co/cwERg8

If I use two replicates, it's even more visible, especially for apeglm and I wonder what's happening here. Especially most of the not DE are reduced to log2Foldchange of < 0.001 and even some DEG have a log2Foldchange < 0.001

https://ibb.co/fjz6ET

Thanks

Mathias

deseq2 lfcshrink betaprior • 2.5k views

ADD COMMENT • link updated 6.8 years ago by Michael Love 43k • written 6.8 years ago by mat.lesche ▴ 110

score 3 · Answer 1 · 2018-07-31

3

Entering edit mode

Michael Love 43k

@mikelove

Last seen 9 days ago

United States

dear Mathias,

Thanks for posting these images. What you observe is consistent with what we see in testing on the benchmarking data and on simulation data. If you just compare method="normal" to method="apeglm" or "ashr", the differences you are likely to see is that normal will shrink large effects even if they have high precision (so shrinking too much) and allow small effects to float around 0, while apeglm/ashr will not shrink the precise, large effects much at all and the small effects which are indistinguishable from 0 will be shrunk to 0. By indistinguishable I mean that the SE(beta) >> |beta|. This kind of estimator is very good at ranking genes and it also has low error compared to the true LFC (which we know in the benchmarking datasets and the simulation datasets). Also all of these estimators converge to MLE as the sample size gets larger. It's up to you which you want to use for plots. For ranking genes by effect size, I'd recommend apeglm or ashr, as these perform the best in our hands.

One more note: the DEG according to the Wald test can be shrunk a lot by apeglm or ashr, they are different approaches statistically, and you can have a very small p-value for small effect size which nevertheless is not equal to 0. If you want to have more consistent error-bounded sets (FDR/FSR sets) with the shrunken effect sizes, you can use svalue=TRUE when you call lfcShrink(). Then the error-bounded set is consistent with the shrunken effect size, because the same posterior is being used for both.

ADD COMMENT • link 6.8 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Thanks, as many times before, for the feedback. After reading your tutorials and comments on threads, I understand the shrinkage. But what is a bit of a concern is providing a ma plot to a biologist. In the past the would a similar plots compared to picture 1 and 2 in https://ibb.co/fjz6ET (betaPrior = F, method = normal and betaPrior = F) and now one delivers the plot for method = apeglm which basically only shows the DEGs and everything else is around 0 and not even visible anymore.

For example here are some genes which are significant, padj < 0.1, in the two replicates per condition scenario

Gene

log2 apeglm

log2 betaPrior = F

Cond A

Cond B

Stat2

0.0013680983

0.78

894.8

1536.7

Rbm43

0.0010062328

0.91

181.3

341.4

Apobec1

0.0009898028

0.8

230.6

402.7

Zcchc24

0.0011278656

0.68

332.6

533.5

That's hard to explain to a biologist who looks at the mean counts per condition and wonders why the log2 foldchange is now almost 0. How would one approach this with a good explanation?

ADD REPLY • link 6.8 years ago mat.lesche ▴ 110

1

Entering edit mode

I forgot to mention a feature which might be useful for you. If I simulate a majority of null where LFC=0 with a few spiked in DE genes, you can see the behavior you are talking about, where the precise LFCs are preserved and many LFCs compatible with zero are shrunk close to zero. This is the estimator which gives very good rankings and MSE compared to truth.

If however, you want to use the Cauchy prior in apeglm but not have it adapt to the scale of the data, you can set apeAdapt=FALSE. This uses a non-adaptive Cauchy prior with the standard scale. This will tend to make MA plots that look like type="normal", but it will perform better than type="normal", in that precise, large effects will not be reduced. This is not the default method for apelgm, because the default is more accurate in producing FSR bounded sets which we show in the manuscript. I wouldn’t use the s-values with apeAdapt=FALSE. But this may be a useful in between for you, or using type="ashr". Here's some example code:

dds <- makeExampleDESeqDataSet(m=6,betaSD=rep(c(0,1),c(950,50))) 
dds <- DESeq(dds, quiet=TRUE) 
res <- lfcShrink(dds, coef=2, type="apeglm") 
res2 <- lfcShrink(dds, coef=2, type="apeglm", apeAdapt=FALSE) 
par(mfrow=c(1,2)) 
plotMA(res, ylim=c(-4,4)); plotMA(res2, ylim=c(-4,4))

ADD REPLY • link 6.8 years ago Michael Love 43k

0

Entering edit mode

I guess from this table we don't know about variability of counts or the MLE. What's the LFC SE?

Also, what does ashr report?

ADD REPLY • link 6.8 years ago Michael Love 43k