Performance of different flavors of limma and edgeR as the sample size goes up
2
1
Entering edit mode
Nik Tuzov ▴ 90
@nik-tuzov-8783
Last seen 11 months ago
United States

Hello:

I performed some limma analysis using simulated data. For each gene, I fitted a “plain” linear model (lm() in R), limma with no trend, limma with trend, and voom.

I expected the results of all four models to converge to the same thing (in terms of p-values) as the sample size goes up. Indeed, it is the case for the first three models. However, voom stands apart. It looks like the weights computed in voom do not become equal to each other as the sample size goes to infinity.

edgeR has a similar problem. I fitted Negative Binomial regression separately for each gene, and the results are consistent with edgeR output for a large sample. However, edgeR_Robust stands apart, just like voom.

I find this queer because, if weighing is a way to dampen outliers, then for a very large sample size the sample is essentially representative of the population and there are no outliers to speak of. Hence, the observation weights should become equal to each other. Could you explain why it doesn't happen?

Regards,

Nik

limma edgeR • 2.0k views
ADD COMMENT
1
Entering edit mode
@ryan-c-thompson-5618
Last seen 12 weeks ago
Icahn School of Medicine at Mount Sinai…

The voom weights are intended to capture the magnitude of the Poisson noise, or any other heteroskedasticity in the data that is related to sequencing depth per sample, not the number of samples, so Increasing the number of samples while holding the sequencing depth per sample constant should not affect the weights very much. Regardless, as the sample size increases, I would expect voom to converge with the other methods in its estimation of p-values and log fold changes. The voom paper shows that voom and limma-trend are nearly equivalent when sample sequencing depth is consistent, and so voom is primarily necessary to account for variation in sequencing depth, which is of course a near inevitability in real data, hence the need for the method. You don't mention whether the methods resulted in similar p-values or fold changes.

As for edgeR_robust, I have less experience with this method. However, a large sample size doesn't necessarily eliminate outliers. If I recall correctly, the goal of edgeR_robust is to identify and down-weight samples that do not follow the distributional assumptions of edgeR. If there is some mechanism generating such outliers in the data, increasing the sample size will not make it disappear, and a method that successfully detects and compensates for these outliers will always perform better than the same method without outlier detection, no matter how many samples are used. The edgeR_robust paper claims that in the absence of outliers, edgeR_robust performs essentially the same as unweighted edgeR. Again, you don't mention how your comparisons performed at the level of p-values and fold changes, so I can't comment on any differences you may have seen there.

On a related note, my recent experience on real data has been that limma-voom and edgeR's quasi-likelihood method perform almost identically for most genes. When I plot their p-values against each other on a log-log plot, they cluster tightly around the identity line. The LRT (and DESeq2's Wald test), in contrast, is way too liberal with most genes: https://mneme.homenet.org/~ryan/resume/examples/Salomon/globin/pval-comparisons.pdf

ADD COMMENT
0
Entering edit mode

Thanks for replying. I mentioned that I looked at p-values. Fold changes for voom and edgeR_Robust stand apart just as well.

ADD REPLY
1
Entering edit mode

Could you elaborate in your question on how limma-voom and edgeR_robust differ for p-values and logFC? You should also elaborate on how you are simulating the data. These methods were developed to take specific features of the data into account, and their agreement with other methods depends strongly on whether these features are present in your simulation.

ADD REPLY
0
Entering edit mode

I compute the average (across features) distance between p-values or ratios.

ADD REPLY
1
Entering edit mode

Presumably your simulation had some features differentially expressed and other not? So rather than seeing which methods agree most closely with each other, you can determine which methods agree most closely with the truth.

ADD REPLY
0
Entering edit mode

Ok, I'll try that.

ADD REPLY
0
Entering edit mode
Nik Tuzov ▴ 90
@nik-tuzov-8783
Last seen 11 months ago
United States

It turns out that I didn't take into account that, unlike the other two versions, voom performs CPM normalization inside the call. After I neutralized the normalization, the results of all three methods look similar for a large sample size. 

ADD COMMENT

Login before adding your answer.

Traffic: 618 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6