Question

Voom normalization lowess span


Gregory Warnes

United States

I've seen voom normalization plots that have an S-shaped form, where the default lowess span in limma::voom appears to be substantially too large. For instance, applying limma::voom to the 48 WT replicates from Schurch et al. (2016)

Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, Wrobel N, Gharbi K, Simpson GG, Owen-Hughes T, Blaxter M, Barton GJ. (2016) How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA

yields:

I have three questions:

1) What is the probable cause of the initial upward trending curve?

2) What is the expected impact on the results of limma+voom of using this over-smoothed mean-variance relationship?

3) Is there a better (automated?) mechanism to select an appropriate span?

FWIW, here is the voom plot with a span of 0.05, which looks to be a much better fit to the data:


updated 8.6 years ago by Gordon Smyth • written 8.6 years ago by Gregory Warnes

score 2 · Answer 1 · 2016-04-01

That dip on the left is invariably a result of discreteness: the variance decreases because the majority of the counts are zero (and thus many of the log-CPMs are the same, or nearly the same after the continuity correction and division by library size). If you filter on abundance as recommended, the plot will be truncated on the left, which should remove this dip. In fact, removing strange trends at low abundances is one of the main reasons for filtering, so that they don't compromise the inferences for higher-abundance genes.
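To see the discreteness effect in isolation, here is a small Python simulation (Python only for illustration; nothing here calls limma). It assumes negative binomial counts with a common dispersion of 0.05, 48 samples that all have a library size of one million, and an intercept-only model, so each gene's residual SD is just its sample SD. The log-CPM uses voom's offsets (0.5 added to each count, 1 to the library size). For genes whose mean count is far below one, nearly every log-CPM has the same value, so sqrt(SD) drops; at moderate counts it peaks and then declines as counts grow.

```python
import numpy as np


def voom_log_cpm(counts, lib_size):
    """log2-CPM with voom's offsets: +0.5 on each count, +1 on the library size."""
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


def mean_sqrt_sd(mu, n_samples=48, n_genes=500, dispersion=0.05,
                 lib_size=1e6, seed=0):
    """Average sqrt(SD) of log-CPM for simulated NB genes with mean count mu.

    Assumptions: one shared library size and an intercept-only design, so
    each gene's residual SD is simply its sample SD across the samples.
    """
    rng = np.random.default_rng(seed)
    size = 1.0 / dispersion            # NB "size" parameter
    p = size / (size + mu)             # NumPy's NB parameterisation with mean mu
    counts = rng.negative_binomial(size, p, size=(n_genes, n_samples))
    y = voom_log_cpm(counts, lib_size)
    sd = y.std(axis=1, ddof=1)
    return float(np.mean(np.sqrt(sd)))


if __name__ == "__main__":
    for mu in (0.02, 0.2, 2, 20, 200, 2000):
        print(f"mean count {mu:>7}: mean sqrt(SD) = {mean_sqrt_sd(mu):.3f}")
```

The dispersion and the grid of means are arbitrary, so only the rise-then-fall shape is meaningful, not the particular values.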

Of course, if you're explicitly interested in low-abundance genes, then filtering would not be a solution. However, if that's the case, I'd argue that voom should not be used at all; normality isn't appropriate for low, discrete counts, and it's also difficult to model the mean-variance relationship when you're only focusing on a limited covariate range at low abundance. If you really need to get inferences for low-abundance genes, then I'd suggest switching to edgeR, as it better handles the count-based nature of the data.
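As a rough illustration of why a normal approximation on the log scale struggles at low counts: if a gene's count Y is negative binomial with mean μ and dispersion φ (the count model edgeR fits directly), the first-order delta method gives

```latex
\operatorname{Var}(Y) = \mu + \phi\mu^{2},
\qquad
\operatorname{Var}\!\left(\log_2 Y\right)
  \approx \frac{\operatorname{Var}(Y)}{(\mu \ln 2)^{2}}
  = \frac{1}{(\ln 2)^{2}}\left(\frac{1}{\mu} + \phi\right)
```

(ignoring voom's 0.5 offset and the library-size scaling, which is a constant when library sizes are equal). The approximation assumes Y is rarely near zero. When μ is far below one, most counts are exactly zero, so the log-CPM takes only a couple of distinct values and its variance falls toward zero instead of growing like 1/μ; that mismatch is the dip described above.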

score 1 · Answer 2 · 2016-04-01

As Aaron has said, the voom trend always has a J-shape at low counts because of discreteness effects. When you filter the low-count genes out, this J-shape will disappear; see, for example, my advice to this poster:

voom for spectral counts
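A common way to do this filtering in limma/edgeR workflows is to keep genes whose CPM exceeds some threshold in at least a minimum number of samples. The Python sketch below assumes a threshold of 1 CPM in at least 3 samples (both arbitrary, illustrative values, not voom defaults) and simulates Poisson counts for hypothetical genes whose means are spread log-uniformly from 0.01 to 10,000. The genes it drops are the mostly-zero ones responsible for the J-shape.

```python
import numpy as np


def cpm(counts):
    """Counts per million, using each sample's column total as its library size."""
    lib_size = counts.sum(axis=0)
    return counts / lib_size * 1e6


def keep_expressed(counts, min_cpm=1.0, min_samples=3):
    """Keep genes with CPM above min_cpm in at least min_samples samples.

    Both thresholds are illustrative assumptions, not limma or voom defaults.
    """
    return (cpm(counts) > min_cpm).sum(axis=1) >= min_samples


rng = np.random.default_rng(42)
n_genes, n_samples = 2000, 48
# Hypothetical gene means spread log-uniformly from 0.01 to 10,000 counts.
means = 10 ** rng.uniform(-2, 4, size=n_genes)
counts = rng.poisson(means[:, None], size=(n_genes, n_samples))

keep = keep_expressed(counts)
zero_frac_kept = np.mean(counts[keep] == 0)
zero_frac_dropped = np.mean(counts[~keep] == 0)
print(f"kept {keep.sum()} of {n_genes} genes")
print(f"fraction of zero counts: kept {zero_frac_kept:.3f}, dropped {zero_frac_dropped:.3f}")
```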

The voom lowess span is deliberately chosen to be large so that the curve will not follow artifacts like this. Voom is currently conservative for very low counts, which I'm reasonably happy with.
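The span trade-off can be sketched with a simplified lowess in Python: local linear fits with tricube weights and no robustness iterations, so it is not identical to R's lowess(), which voom uses for its trend. The data are a hypothetical sqrt(SD) trend with a dip below an average log-count of 2, plus a little noise. With span 0.5 (voom's default) the fit at the far left sits well above the dip; with span 0.05 (the value tried in the question) it tracks the dip closely. Whether tracking the dip is desirable is the point made above: at low counts the dip is a discreteness artifact rather than a trend worth modelling.

```python
import numpy as np


def simple_lowess(x, y, frac):
    """Local linear smoother with tricube weights (no robustness iterations).

    A simplified stand-in for R's lowess(): each fitted value uses the
    ceil(frac * n) nearest points in x.
    """
    n = len(x)
    k = max(int(np.ceil(frac * n)), 3)
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = max(np.partition(d, k - 1)[k - 1], 1e-12)    # distance to k-th nearest point
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3  # tricube weights
        sw = w.sum()
        xm = np.dot(w, x) / sw
        ym = np.dot(w, y) / sw
        sxx = np.dot(w, (x - xm) ** 2)
        slope = np.dot(w, (x - xm) * (y - ym)) / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (x[i] - xm)
    return fitted


def true_trend(x):
    """Hypothetical sqrt(SD) trend: a discreteness dip below x = 2, then a slow decline."""
    return np.where(x < 2.0, 0.5 + 0.3 * x, 1.1 - 0.03 * (x - 2.0))


rng = np.random.default_rng(7)
x = np.linspace(0.0, 15.0, 1500)
y = true_trend(x) + rng.normal(0.0, 0.05, size=x.size)

wide = simple_lowess(x, y, frac=0.5)     # voom's default span
narrow = simple_lowess(x, y, frac=0.05)  # span tried in the question
print(f"at x = 0: truth {true_trend(x)[0]:.2f}, "
      f"span 0.5 -> {wide[0]:.2f}, span 0.05 -> {narrow[0]:.2f}")
```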

score 0 · Answer 3 · 2016-04-01


Thanks Aaron.

I suppose that using the wider span would increase the estimated variability dramatically, which would have the side effect of forcing these low-significance genes to be non-significant.
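Under strong simplifying assumptions, the effect of an overstated trend on a limma moderated t-statistic can be written down directly. Assume every observation of a gene gets the same voom weight, the empirical Bayes prior variance s0^2 is 1 on the weighted scale, the gene's sample variance equals its true variance, and the trend overstates that variance by a factor r. The weighted residual variance is then 1/r, and the moderated t is multiplied by sqrt((d0 + d)/(d0*r + d)), with d the residual df and d0 the prior df. Because voom's weights are the inverse fourth power of the fitted sqrt(SD), a trend that is too high by a factor q on the sqrt(SD) scale gives r = q^4. The two-group design with 46 residual df is hypothetical.

```python
import math


def moderated_t_ratio(r, d0, d):
    """Ratio t_over / t_correct of limma-style moderated t-statistics.

    Simplifying assumptions (not limma code): uniform voom weights within the
    gene, prior variance s0^2 = 1 on the weighted scale, and a sample variance
    equal to the true variance sigma^2, which the trend overstates by r.
    """
    # correct trend:    s_tilde^2 = 1,                        t ~ 1 / sqrt(sigma^2)
    # overstated trend: s_tilde^2 = (d0 + d / r) / (d0 + d),  t ~ 1 / sqrt(r * sigma^2 * s_tilde^2)
    return math.sqrt((d0 + d) / (d0 * r + d))


if __name__ == "__main__":
    d = 46  # hypothetical: 48 samples in two groups
    for q in (1.0, 1.2, 1.5):          # factor by which the sqrt(SD) trend is too high
        r = q ** 4                     # corresponding factor on the variance scale
        for d0 in (2.0, 10.0, 50.0):
            print(f"q={q}, d0={d0}: t multiplied by {moderated_t_ratio(r, d0, d):.3f}")
```

With d0 = 0 the gene's own variance is used unchanged and the t-statistic is unaffected; the larger d0, the more an overstated trend pushes the gene toward non-significance.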

Gregory Warnes, 8.6 years ago


For the low-abundance genes, yes. For other genes, the effect is harder to predict, as the estimate of the prior degrees of freedom (a measure of the variability of the variances around the trend) gets distorted when you don't fit the trend right. In fact, even when you do fit the trend correctly, the prior d.f. estimate will probably be a bit strange; you can see that the spread is a lot tighter at low abundances due to discreteness, whereas it's generally more consistent throughout the higher abundances.
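The prior d.f. can be sketched with a method-of-moments estimate in the spirit of limma's empirical Bayes step: if sigma^2 follows a scaled inverse chi-square prior with d0 df and each s^2 has d residual df, then Var(log s^2) = trigamma(d/2) + trigamma(d0/2), so whatever spread in log s^2 exceeds trigamma(d/2) is attributed to the prior. The Python below is a simplified version (no trend, no robust option; not limma's fitFDist code) with hypothetical settings: 20,000 genes, d = 46, true d0 = 10. Squeezing the log-variances of an arbitrary 30% of genes halfway toward the centre, a crude stand-in for the tighter spread at low abundances, removes some of the excess spread and so inflates the d0 estimate that is then applied to every gene.

```python
import math

import numpy as np


def trigamma(x):
    """Trigamma function via upward recurrence and its asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    result += inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return result


def trigamma_inverse(target):
    """Solve trigamma(y) = target by bisection on a log scale (trigamma is decreasing)."""
    lo, hi = 1e-8, 1e8
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if trigamma(mid) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def prior_df(s2, d):
    """Moment estimate of the prior df d0 from residual variances s2, each on d df."""
    excess = np.var(np.log(s2), ddof=1) - trigamma(d / 2.0)
    if excess <= 0:
        return math.inf  # no spread left over for the prior
    return 2.0 * trigamma_inverse(excess)


rng = np.random.default_rng(1)
d, d0_true, n_genes = 46, 10.0, 20000
sigma2 = d0_true / rng.chisquare(d0_true, n_genes)  # prior with s0^2 = 1
s2 = sigma2 * rng.chisquare(d, n_genes) / d          # observed residual variances

# Squeeze a hypothetical 30% of genes halfway toward the centre of log s2.
log_s2 = np.log(s2)
centre = log_s2.mean()
n_tight = n_genes * 3 // 10
tight = log_s2.copy()
tight[:n_tight] = centre + 0.5 * (log_s2[:n_tight] - centre)

print(f"d0 estimate, untouched variances: {prior_df(s2, d):.1f}")
print(f"d0 estimate, squeezed subset:     {prior_df(np.exp(tight), d):.1f}")
```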

Aaron Lun, 8.6 years ago