Hi everyone,
I noticed that applying the newly recommended apeglm method for log fold change shrinkage with the DESeq2 package resulted in a very different MA-plot compared to setting the lfcShrink method to normal; it seems to shrink the log fold changes much more for most genes, possibly yielding a better signal-to-noise ratio. Are differences this dramatic to be expected when changing the lfcShrink method?
Thank you!
Benjamin
Shrinkage method set to normal:

Shrinkage method set to apeglm:
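For reference, a minimal sketch of how the two shrinkage methods above could be compared side by side (assuming a DESeqDataSet dds already run through DESeq(); the coefficient name is a placeholder, see resultsNames(dds) for the actual one):

```r
library(DESeq2)

# Shrink the same coefficient with both methods.
res_normal <- lfcShrink(dds, coef = "condition_treated_vs_control", type = "normal")
res_apeglm <- lfcShrink(dds, coef = "condition_treated_vs_control", type = "apeglm")

# Draw the two MA-plots side by side for comparison.
par(mfrow = c(1, 2))
plotMA(res_normal, ylim = c(-4, 4), main = "normal")
plotMA(res_apeglm, ylim = c(-4, 4), main = "apeglm")
```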
Thanks for your reply, Michael!
I'm comparing 8 versus 5 biological replicates; the data were generated using Lexogen's 3' Quantseq kit, which generates just one fragment per transcript close to the 3' end.
Please find below the MA-plot using ashr-shrunken LFCs. lfcShrink with ashr gave this warning: "Due to absence of package REBayes, switching to EM algorithm" (I couldn't install REBayes because installation of the dependent package Rmosek failed). I'm impressed by the increased signal/noise of apeglm, but it seems unusual that apeglm shrinkage yields many genes with a padj < 0.1 but abs(LFC) < 0.01 (please see the volcano plot below). Is this expected?

Shrinkage method set to ashr:

Volcano plot with shrinkage method set to apeglm:
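For context, the ashr-shrunken LFCs above would come from a call along these lines (a sketch, assuming the same dds and placeholder coefficient as before; ashr only needs the estimated LFCs and their standard errors, so it can be fed an existing results table):

```r
library(DESeq2)

# Start from the unshrunken results for the coefficient of interest ...
res <- results(dds, name = "condition_treated_vs_control")

# ... and shrink with ashr (requires the ashr package to be installed).
res_ashr <- lfcShrink(dds, res = res, type = "ashr")

plotMA(res_ashr, ylim = c(-4, 4), main = "ashr")
```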
That's a good point. The reason for this is that padj comes from null hypothesis testing followed by BH correction, while the shrinkage comes from a Bayesian procedure designed to give a reliable effect size, and the shrinkage procedure tends to be more aggressive at shrinking potential noise than the testing procedure. One way to think about it is that 1 out of 10 genes in the FDR set could have LFC=0; the shrinkage procedure will tend to put genes that are compatible with LFC=0 at 0, and it's more aggressive at this than the padj-generating procedure.

If you want something like a padj that is in agreement with the shrunken estimate, we have followed the Stephens (2016) paper and added the capability to compute s-values and attach them to the results. You just need to set svalue=TRUE when you run lfcShrink, and then you have an aggregate posterior probability to work with. Instead of bounding the rate of LFC=0, it bounds the rate of false signs (e.g. the LFC is estimated as positive but is truly negative). I tend to use lower thresholds for s-values than for padj, so I might consider a 0.01 or 0.005 svalue cutoff (the maximum s-value is ~0.5, corresponding to just guessing the sign of the LFC).
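If it helps, a minimal sketch of requesting s-values (assuming the same dds and a placeholder coefficient name; check resultsNames(dds) for the actual one):

```r
library(DESeq2)

# Request s-values along with the apeglm-shrunken LFCs.
# With svalue=TRUE the results table carries an 'svalue' column in place of
# pvalue/padj.
res_sval <- lfcShrink(dds, coef = "condition_treated_vs_control",
                      type = "apeglm", svalue = TRUE)

# Use a stricter cutoff than for padj, e.g. 0.005 as suggested above.
sig <- res_sval[which(res_sval$svalue < 0.005), ]
nrow(sig)
```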
Thanks Michael, I'll read up on and try out the s-values!
I also noticed similar behavior of apeglm in my own RNA-seq data set (10 samples), and I was wondering whether in this case it is advisable to use the "normal" method for shrinking the values instead of the newer apeglm. Any advice would be greatly appreciated.
This means the counts for nearly all genes are compatible with LFC=0. Likely a small sample size? If you want unshrunken LFCs, just go with the output of results().
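For completeness, a minimal sketch (same assumed dds and placeholder coefficient name as above):

```r
library(DESeq2)

# The unshrunken (MLE) log2 fold changes are what results() reports by default;
# no call to lfcShrink() is needed.
res_mle <- results(dds, name = "condition_treated_vs_control")
head(res_mle[, c("log2FoldChange", "lfcSE", "padj")])
```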