Question

Choosing an svalue threshold from apeglm shrunken results in DESeq2

i.sudbery ▴ 40

As I understand it, svalues on apeglm shrunken LFCs with an lfcThreshold identify genes where I am confident the change is at least a certain size, which seems like a good way to go for DE analysis. I note that when I apply lfcShrink(dds, type="apeglm", lfcThreshold = 0.5) and then print a summary of the resulting DESeqResults object, a threshold of svalue < 0.005 is used irrespective of what I have put in the results call. Is there a reason why svalue < 0.005 is used, rather than the more traditional level of 0.05 used with p values and FDRs? Is it just that 0.05 is an arbitrary value used for single comparisons by Fisher 100 years ago and doesn't make much sense for FDR calculations, or that 0.005 commonly provides about the same level of conservatism as an FDR of 0.05, or some other reason?

Is there a reason why my thinking should be different when picking an svalue threshold than it would be when picking an adjusted p-value threshold? Is thresholding even how svalues are supposed to be used?

apeglm DESeq2 • 2.2k views

updated 3.4 years ago by Michael Love 43k • written 3.4 years ago by i.sudbery ▴ 40

score 1 · Answer 1 · 2021-09-07

I don't see summary() disregarding the significance threshold alpha:

> dds <- makeExampleDESeqDataSet()
> dds <- DESeq(dds)
> res <- lfcShrink(dds, type="apeglm", lfcThreshold = 0.5, coef=2)
using 'apeglm' for LFC shrinkage. If used in published research, please cite:
    Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
    sequence count data: removing the noise and preserving large differences.
    Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
computing FSOS 'false sign or small' s-values (T=0.5)
> summary(res)

out of 998 with nonzero total read count
s-value < 0.005
LFC > 0.50 (up)    : 0, 0%
LFC < -0.50 (down) : 0, 0%

> summary(res, alpha=0.001)

out of 998 with nonzero total read count
s-value < 0.001
LFC > 0.50 (up)    : 0, 0%
LFC < -0.50 (down) : 0, 0%

"Is there a reason why svalue < 0.005 is used, rather than the more traditional level of 0.05 used with p values and FDRs?"

In the apeglm vignette we point out:

Note that p-values and FSR define different events, and are not on the same scale. An FSR of 0.5 means that the estimated sign is as bad as random guess.

The s-value was proposed by Stephens (2016), as a statistic giving the aggregate false sign rate for tests with equal or lower s-value than the one considered. We recommend using a lower threshold on s-values than typically used for adjusted p-values, for example one might be interested in sets with 0.01 or 0.005 aggregate FSR.
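To make the "aggregate false sign rate" wording concrete, here is a minimal base R sketch of how an s-value can be built from per-gene local false sign rates: sort the genes from most to least confident and take the running mean. The function name `svalue_sketch` and the toy `lfsr` vector are made up for illustration, and this is not claimed to be the code apeglm runs internally.

```r
# Sketch only: s-value as the running mean of sorted local false sign rates.
# `lfsr` is a hypothetical vector of per-gene local false sign rates (no NAs).
svalue_sketch <- function(lfsr) {
  ord <- order(lfsr)                                  # most confident genes first
  running_mean <- cumsum(lfsr[ord]) / seq_along(ord)  # aggregate FSR of each top-k set
  out <- numeric(length(lfsr))
  out[ord] <- running_mean                            # put values back in input order
  out
}

lfsr <- c(0.20, 0.001, 0.04, 0.002)  # invented values
sval <- svalue_sketch(lfsr)
sval
```

Each s-value is the average local false sign rate over that gene and every gene ranked ahead of it, so thresholding at some s-value selects a set whose estimated aggregate FSR is at that level.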

Basically, an FDR of 1 corresponds to all nulls, while an FSR of 0.5 corresponds to random guessing, which motivates looking at a lower threshold than is typically applied to adjusted p-values or q-values.
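A small simulation can illustrate why the two scales have different "worst cases". It is only a sketch: the effect sizes, the number of genes, and all variable names are invented, and it uses base R only.

```r
# Sketch only: worst-case values of FSR and FDR on simulated data.
set.seed(1)
n <- 10000

# FSR under random guessing: true effects are non-zero, signs are guessed by coin flip.
true_lfc <- rnorm(n)                                # hypothetical true effects
guessed_sign <- sample(c(-1, 1), n, replace = TRUE)
fsr_random <- mean(sign(true_lfc) != guessed_sign)  # close to 0.5

# FDR when every gene is null: any gene that gets called is a false discovery.
is_null <- rep(TRUE, n)
called <- seq_len(100)                              # an arbitrary set of calls
fdr_all_null <- mean(is_null[called])               # exactly 1

c(fsr_random = fsr_random, fdr_all_null = fdr_all_null)
```

Because a completely uninformative analysis already sits at an FSR of about 0.5 rather than 1, a given numeric cutoff is less stringent on the s-value scale than on the adjusted p-value scale.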

There is also a plot where we show, for simulated data, the adjusted p-values and the s-values: typical values of adj p = 0.05 or 0.1 fall in the range of an s-value of 0.005. This may differ by dataset, but it motivated our suggestion of a smaller threshold.

Yes, you should threshold the s-values, and you can then expect that rate of "false sign or smaller than LFC=0.5 in absolute value" in the results table.
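As a rough sketch of what that thresholding looks like, the snippet below filters a table on its `svalue` column. `res_toy` is an invented stand-in with made-up numbers so that the example runs without DESeq2; the assumption is that a real table from lfcShrink() with an lfcThreshold has the same `log2FoldChange` and `svalue` columns and can be subset in the same way.

```r
# Sketch only: toy stand-in for a shrunken results table; values are invented.
res_toy <- data.frame(
  log2FoldChange = c(1.8, -1.2, 0.3, -0.1, 0.9),
  svalue         = c(0.0004, 0.003, 0.40, 0.65, 0.02),
  row.names      = paste0("gene", 1:5)
)

alpha <- 0.005                          # s-value threshold
keep <- which(res_toy$svalue < alpha)   # which() also drops any NA s-values
sig <- res_toy[keep, ]

n_up <- sum(sig$log2FoldChange > 0)     # genes called up
n_down <- sum(sig$log2FoldChange < 0)   # genes called down
sig
```

In the selected set, the estimated proportion of genes whose sign is wrong or whose true |LFC| is below the threshold is at most about `alpha`.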