I've seen voom normalization plots that have an S-shaped form, where the default lowess span in limma::voom
appears to be substantially too large. For instance, appying limma::voom
to the 48 replicate WT data from Schurch et al. (2016)
Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, Wrobel N, Gharbi K, Simpson GG, Owen-Hughes T, Blaxter M, Barton GJ. (2016) How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA
yields:
I have three questions:
1) What is the probable cause of the initial upward trending curve?
2) What is the expected impact on the results of limma+voom of using this over-smoothed mean-variance relationship?
3) Is there a better (automated?) mechanism to select an appropriate span?
FWIW, here is the voom plot with a span of 0.05, which looks to be a much better fit to the data:
For the low-abundance genes, yes. For other genes, the effect is harder to predict, as the estimate of the prior degrees of freedom (a measure of the variability of the variances around the trend) gets distorted when you don't fit the trend right. In fact, even when you do fit the trend correctly, the prior d.f. estimate will probably be a bit strange; you can see that the spread is a lot tighter at low abundances due to discreteness, whereas it's generally more consistent throughout the higher abundances.