[PWMEnrich] Enrichment p-values on human sequences
enricoferrero ▴ 660
@enricoferrero-6037
Last seen 3.1 years ago
Switzerland

Hi Robert,

Hope you are doing well!

I've noticed that when using PWMEnrich on human sequences, the p-value column in the report from groupReport() is actually "an average of log(P-values) of individual sequences" (from the vignette).

I have a few questions:

  1. Why not recompute the p-value after averaging the log(p-values)? The column is still labelled p.value, which is misleading. Also, -log10(p-value) is more commonly used than log(p-value).
  2. What is the base of the logarithm? I'm assuming it's base 10 (and not e=2.718282), but it's unclear from the vignette/documentation.
  3. How can I filter the MotifEnrichmentReport object to only show significant motifs (e.g. p-value < 0.05)?

Thank you!

Best,

@robert-stojnic-6136
Last seen 8.8 years ago
United Kingdom

Hi Enrico,

To answer your questions:

1. Log(p-values) are used to preserve the same sorting rules as for P-values (i.e. smaller value is better). I agree the column name is confusing, and I will change it to something more appropriate in the next release. The mean(log(p-value)) cannot easily be converted into a P-value as the individual P-values do not share the same null hypothesis (due to testing on sequences of variable length).
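As a rough illustration of the point above, here is a minimal base-R sketch with made-up per-sequence P-values for two hypothetical motifs (`p_motif_a` and `p_motif_b` are not PWMEnrich objects). It only shows that averaging log(P-values) keeps the "smaller is better" ordering; it is not the package's internal code.

```r
# Hypothetical per-sequence P-values for two motifs (made-up numbers)
p_motif_a <- c(0.001, 0.02, 0.5)
p_motif_b <- c(0.04, 0.03, 0.05)

# Score as described in the vignette: average of natural-log P-values
score_a <- mean(log(p_motif_a))
score_b <- mean(log(p_motif_b))

# More negative means stronger enrichment, so an ascending sort
# ranks motifs the same way an ascending sort of P-values would
scores <- sort(c(motif_a = score_a, motif_b = score_b))
print(scores)
```

Note that `exp(score_a)` is just the geometric mean of the individual P-values, which is a summary statistic and not itself a P-value.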

2. The logarithm is base e, i.e. it is R's log() function, which computes the natural logarithm by default.
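If you prefer the more familiar -log10 scale, the conversion is a simple rescaling. A small sketch in base R, assuming `mean_ln_p` holds the values from the p.value column (the numbers here are invented for illustration):

```r
# Hypothetical mean(log(P)) values, as found in the p.value column
mean_ln_p <- c(-12.3, -4.6, -0.7)

# Same quantity expressed as mean(-log10(P)):
# divide by log(10) to change base, then flip the sign
mean_neg_log10_p <- -mean_ln_p / log(10)

# Back-transforming gives the geometric mean of the per-sequence
# P-values (a summary, not a P-value in its own right)
geo_mean_p <- exp(mean_ln_p)

print(data.frame(mean_ln_p, mean_neg_log10_p, geo_mean_p))
```

The ranking is unchanged by this conversion, only reversed in direction: larger -log10 values correspond to stronger enrichment.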

3. At the moment, filtering by significance is only possible for non-human sequences. For human sequences you only get a ranked list of motifs, which I agree is not ideal. I would normally take the top 100 motifs for human as candidates for enrichment (i.e. the top 5% of the ranked list of motifs).
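A sketch of the top-100 selection in base R, using a simulated data frame as a stand-in for the report converted with as.data.frame(). The `id` and `p.value` columns and the 2000-motif size are assumptions for illustration (100 being 5% implies roughly 2000 motifs); check the column names in your own report.

```r
# Hypothetical stand-in for a report converted to a data frame
set.seed(1)
report_df <- data.frame(
  id = paste0("motif_", 1:2000),
  p.value = -rexp(2000, rate = 0.2),  # simulated mean log(P-values)
  stringsAsFactors = FALSE
)

# Rank by mean log(P-value): most negative (strongest) first
ranked <- report_df[order(report_df$p.value), ]

# Keep the top 100 motifs as enrichment candidates
top_n <- 100
candidates <- head(ranked, top_n)
print(head(candidates))
```

This is a rank cutoff rather than a significance threshold, so the choice of 100 is a rule of thumb and not a statistical test.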

Cheers, r.
