I don't see summary() disregarding the significance threshold alpha:
> dds <- makeExampleDESeqDataSet()
> dds <- DESeq(dds)
> res <- lfcShrink(dds, type="apeglm", lfcThreshold = 0.5, coef=2)
using 'apeglm' for LFC shrinkage. If used in published research, please cite:
Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
sequence count data: removing the noise and preserving large differences.
Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
computing FSOS 'false sign or small' s-values (T=0.5)
> summary(res)
out of 998 with nonzero total read count
s-value < 0.005
LFC > 0.50 (up) : 0, 0%
LFC < -0.50 (down) : 0, 0%
> summary(res, alpha=0.001)
out of 998 with nonzero total read count
s-value < 0.001
LFC > 0.50 (up) : 0, 0%
LFC < -0.50 (down) : 0, 0%
"Is there a reason why svalue < 0.005 is used, rather than the more traditional level of 0.05 used with p values and FDRs?"
In the apeglm vignette we point out:
Note that p-values and FSR define different events, and are not on the same scale. An FSR of 0.5 means that the estimated sign is as bad as random guess.
The s-value was proposed by Stephens (2016), as a statistic giving the aggregate false sign rate for tests with equal or lower s-value than the one considered. We recommend using a lower threshold on s-values than typically used for adjusted p-values, for example one might be interested in sets with 0.01 or 0.005 aggregate FSR.
Basically, an FDR of 1 corresponds to all nulls, while an FSR of 0.5 corresponds to random guessing, so that motivates us looking at a lower threshold than typically applied to adjusted p-value or q-values.
There is also a plot where we show for simulated data the adjusted p-values and the s-values, and typical values of adj p = 0.05 or 0.1 are in the range of s-value of 0.005. This may differ by dataset but it motivated our suggestion for a smaller threshold.
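To make the "aggregate false sign rate" definition concrete, here is a minimal base R sketch of how an s-value can be computed as the running mean of sorted local false sign rates. The lfsr values are made up for illustration, and this shows the definition only; it is not necessarily the exact code apeglm runs internally.

```r
# Hypothetical local false sign rates for six genes (made-up values).
lfsr <- c(0.20, 0.001, 0.05, 0.0005, 0.40, 0.01)

# Sort ascending, take the running mean, then map back to the original order.
# Each s-value is the average lfsr over all genes at least as confident.
ord <- order(lfsr)
svalue <- numeric(length(lfsr))
svalue[ord] <- cumsum(lfsr[ord]) / seq_along(lfsr)

# Compare per-gene lfsr with the aggregate s-value.
print(data.frame(lfsr = lfsr, svalue = round(svalue, 5)))

# Number of genes in the set with aggregate FSR below 0.005.
n_called <- sum(svalue < 0.005)
print(n_called)
```

Because the s-value averages over the whole set, a gene with an lfsr of 0.01 can still fall in a set whose aggregate rate is below 0.005.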
Yes, you should threshold the s-values and expect that rate of "false sign or smaller than LFC=0.5 in absolute value" in the results table.
Sorry, my bad with respect to the alpha level in summary. I was setting alpha in results, which I then passed to lfcShrink. I can see now that I can, and need to, set it directly in summary if I convert to s-values.
It's interesting what you say about s-values vs FDRs. Both an FDR of 0.5 and an s-value of 0.5 mean you are probably making the incorrect call in about 50% of cases (assuming real effects are symmetric about 0, which I guess is an assumption of the whole method), but I hadn't considered the fact that with an s-value you make the correct call in 50% of cases just by chance, whereas that's not the case when you use adjusted p-values. Presumably the same reasoning doesn't apply when using an lfcThreshold: you don't make a "false sign or small" call in 50% of cases just by chance? Or have I misunderstood something?
Oh I see, so alpha in results() is specific to the independent filtering routine (Bourgon et al 2010), which is not used in the lfcShrink() procedure.
In my mind, for most datasets, it's easier to guess the sign than to guess which LFC=0. So if we set the same aggregate rate for wrongly estimated sign of effect size and wrongly estimated null status, then we would be more permissive for the set defined by sign of effect size.
Right, when we use lfcThreshold, then the FSOS mistakes by chance are higher than 50%. It would depend on the true distribution of effect sizes, I think.
These are great questions though, let me know if you have more.
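For what it's worth, the point about chance rates can be checked with a small base R simulation. This is only a sketch under stated assumptions: the true LFCs are drawn from a standard normal (an arbitrary choice for illustration, not something estimated from data), the sign is called by coin flip, and the threshold of 0.5 matches the lfcThreshold used above.

```r
set.seed(1)
n <- 1e5

# Assumed true effect sizes, symmetric about 0 (illustrative choice only).
beta <- rnorm(n)

# Call the sign of every gene by a fair coin flip.
guess <- sample(c(-1, 1), n, replace = TRUE)

thr <- 0.5  # same value as lfcThreshold in the example above

# Plain false sign rate: the guessed sign disagrees with the true sign.
false_sign <- mean(sign(beta) != guess)

# "False sign or small": wrong sign, or true |LFC| below the threshold.
false_sign_or_small <- mean(sign(beta) != guess | abs(beta) < thr)

print(c(false_sign = false_sign, false_sign_or_small = false_sign_or_small))
```

Under these assumptions the random-guess false sign rate sits near 0.5, while the "false sign or small" rate is clearly higher, and how much higher depends on how much of the assumed effect size distribution falls inside the threshold.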