Question

Volcanoplot with limma - RAW P-values or Adj.P-Values

tcalvo ▴ 100

@tcalvo-12466

Last seen 20 months ago

Brazil

I have noticed that limma's volcanoplot() function uses uncorrected p-values from the MArrayLM object. My question is: why?

I've seen an old post where G. Smyth mentioned that FDR-corrected p-values lose some information compared with the raw ones. Could someone elucidate this, please? Another reason pointed out by the author was that the same adj.p-value may correspond to several different p-values.

Thanks!

Thyago

volcanoplot limma fdr • 12k views

7.6 years ago • updated 4.3 years ago tcalvo ▴ 100

score 6 · Accepted Answer · 2017-07-26

There's another reason to support Gordon's view. There is a fundamental difference between p-values and FDR: p-values are per-hypothesis (i.e., per-gene) properties, whereas the FDR is a property of the whole set of rejected hypotheses, namely the expected fraction of false positives among them. That is, if you have a set of hypotheses (genes) rejected at a certain FDR $\alpha$, then the local fdr for some of these is less than $\alpha$, and for some, more than $\alpha$. The only thing you know is that the FDR overall is $\alpha$.

In general, there is no 1:1 relation between p-value and FDR. In the special case of the Benjamini-Hochberg method, such a 1:1 relation can be constructed (what's called the 'adjusted p-value'), but this assumes that the Benjamini-Hochberg method is used, with no modifications such as filtering, weighting, etc.

This assumption has seemed so natural that often it has not even been questioned (hence the popularity of the 'adjusted p-value' terminology), but in fact is not natural if there is heterogeneity between the tests, e.g., if we know that some tests have more power than others, or some have a higher prior probability of being null than others.

For these reasons, the p-value and not the adjusted p-value is the preferable quantity to use in a volcano plot.
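To make the point about adjusted p-values concrete, here is a minimal sketch of the plain Benjamini-Hochberg adjustment (no filtering or weighting), written in Python only for illustration. The function name `bh_adjust` and the five p-values are made up for this example; in R the equivalent calculation is `p.adjust(p, method = "BH")`.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (plain BH, illustrative sketch)."""
    m = len(pvalues)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, keeping a running
    # minimum of p * m / rank so the adjusted values are monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.020, 0.021, 0.022, 0.500]
adj = bh_adjust(raw)

for p, q in zip(raw, adj):
    print(f"p = {p:.3f}  adjusted = {q:.4f}")
```

In this toy example the three distinct p-values 0.020, 0.021 and 0.022 all receive the same adjusted value, so plotting adjusted p-values on the y-axis would stack those genes at one height, while the raw p-values keep them apart.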

score 5 · Accepted Answer · 2017-07-24

I'm not sure what I can tell you that I didn't already say in my earlier answer to a similar question: Volcano plot labeling troubles

You've already repeated in your question the reason why it is preferable to use the p-value as the y-axis rather than FDR. (Actually I like the B-statistic even better, but that's another story.) The p-values are the basic values from which FDR is computed, and it is typically better to plot basic data rather than derived quantities.

Why does that not convince you? Why would you want to force points with different p-values together on the y-axis? Or are you asking for more explanation of why different p-values can lead to the same FDR? I think that has been answered separately.

Note that there is always a p-value cutoff that corresponds to any FDR cutoff, so you can easily indicate an FDR cutoff on the plot even if the y-axis is p-value. So using FDR as the y-axis has no advantage that I can think of.
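As a rough sketch of how an FDR cutoff maps to a p-value cutoff under the plain Benjamini-Hochberg step-up rule, the snippet below finds the largest raw p-value that is still rejected at a given FDR level. It is in Python purely for illustration; `bh_pvalue_cutoff` and the example p-values are hypothetical, not part of limma.

```python
import math


def bh_pvalue_cutoff(pvalues, alpha):
    """Largest raw p-value rejected by plain BH at FDR level alpha, or None.

    Illustrative sketch: assumes the unmodified Benjamini-Hochberg procedure.
    """
    m = len(pvalues)
    cutoff = None
    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
    # Every p-value at or below p_(k) is then rejected.
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * rank / m:
            cutoff = p
    return cutoff


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.035, 0.039, 0.60]
cutoff = bh_pvalue_cutoff(raw, alpha=0.05)

if cutoff is not None:
    # Height of the horizontal line on a volcano plot with -log10(p) on the y-axis.
    line_height = -math.log10(cutoff)
    print(f"p-value cutoff = {cutoff}, draw line at y = {line_height:.3f}")
```

All genes with a raw p-value at or below the returned cutoff are the ones with adjusted p-value at or below `alpha`, so a single horizontal line at `-log10(cutoff)` marks the FDR threshold on a plot whose y-axis is still the raw p-value.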