IHW multiple testing correction
1
0
Entering edit mode
@larahurban-11390
Last seen 6.1 years ago

Dear All, 

I currently try to use IHW for multiple testing correction of my association study that associates gene expression of various genes with a binary molecular signature. As covariate for adjusting for multiple testing, I take the variance of the expression of each gene across all samples, but as a result the adjusted p-values are the same as after normal Benjamini-Hochberg adjustment, and the weight does not change across strata: https://imgur.com/a/EtNSZm5.

My data does not look like the examples shown in the IHW vignette (see P-values over rank of the covariate: https://imgur.com/a/prGaWnL). However, I would still like to do the same adjustment as applied in the vignette, i.e. get a smaller p-value for those genes with high variance.

I am now not sure what the problem is: Can I not use the variance of gene expression as covariate here? Is there just no difference due to variance (what I would not expect)? Or do I do sth completely wrong? 

Here is my code, and you can find the pfins dataset here (every row is one gene): https://www.dropbox.com/s/bs8oehc17qwkbkr/pfins.tsv?dl=0

library("IHW")
pfin = read.table('pfins.tsv', sep='\t', header=TRUE, row.names = 1)
ihwRes <- ihw(p ~ var,  data = pfin, alpha = 0.1)
pfin$padj = adj_pvalues(ihwRes)
pfin$BH = p.adjust(pfin$p, method = "BH")
head(pfin)
plot(ihwRes)
ggplot(aes(x = rank(var), y = -log10(p)), data=pfin) + geom_hex(bins = 100) + ylab(expression(-log[10]~p))

I would appreciate any help a lot, many thanks!

Best regards,

Lara

 

IHW multiple testing correction covariate distribution • 1.5k views
ADD COMMENT
0
Entering edit mode

The second link should be https://imgur.com/a/prGaWnL - no right parenthesis.

ADD REPLY
1
Entering edit mode
@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…

Dear Lara

Thanks for the clear posting (reproducible script with small working example, post of the plots).

Indeed it looks like your variable var is not informative on power or prior probability of the tests that generate your p-values. One can glean this from the plot of -log10(p) over rank(var); another way is looking at the stratified histograms or ECDFs (https://bioconductor.org/packages/release/bioc/vignettes/IHW/inst/doc/introduction_to_ihw.html#stratified-p-value-histograms ). Please see here for more: http://rpubs.com/WolfgangHuber/440682

It looks like var is not a good covariate. If this is surprising, it might be worth to dig into the (pre)processing of the data to make sure it is desirable.

 

 

ADD COMMENT
0
Entering edit mode
---
title: "Lara Urban IHW question 2018-11-18"
author: "Wolfgang Huber"
output:
  BiocStyle::html_document:
    toc_float: TRUE
---

# Setup

```{r global_options, include = FALSE}
knitr::opts_chunk$set(cache = TRUE, autodep = TRUE)
options(error = recover)
```

```{r message = FALSE}
library("IHW")
library("ggplot2")
```

# Lara's post

```{r}
pfin <- read.table('pfins.tsv', sep='\t', header=TRUE, row.names = 1)
ihwRes <- ihw(p ~ var,  data = pfin, alpha = 0.1)
pfin$padj <- adj_pvalues(ihwRes)
pfin$BH <- p.adjust(pfin$p, method = "BH")
head(pfin)
plot(ihwRes)
ggplot(aes(x = rank(var), y = -log10(p)), data=pfin) + geom_hex(bins = 100) + ylab(expression(-log[10]~p))
```

# Histogram of p and Schweder-Spj&#xF8;tvoll plot

```{r}
ggplot(pfin, aes(x = p)) + geom_histogram(binwidth = 0.005, boundary = 0)
```

There seems to be a small enrichment of small p-values; we can also verify this using the Schweder-Spj&#xF8;tvoll plot, see the orange line.

```{r}
metap::schweder(pfin$p, drawline = c("ls", "bh"),
                ls.col = "red", ls.lty = 1, ls.control = list(frac = 1/3),
                bh.col = "blue")
```

Also the BH method indicates that there are a few discoveries to be made, but only with FDR of 40\% or higher.

```{r}
ggplot(dplyr::filter(pfin, BH < 1), aes(x = BH)) +
  geom_histogram(binwidth = 0.01, boundary = 0)
```

# Stratification by var

```{r}
ggplot(pfin, aes(x = var)) + geom_histogram(binwidth = 0.01)
```

```{r}
ggplot(pfin, aes(x = p)) + geom_histogram(binwidth = 0.040, boundary = 0) +
  facet_wrap(~ groups_by_filter(pfin$var, 8), nrow = 2)
```

There seems to be a weak trend that the enrichment of small p-values is strongest for small `var`.
Perhaps this could also be a normalization / data preprocessing artefact?

# Conclusion

`var` is not a strong covariate. You'll have to search for others.

# Session info

```{r sesssion, eval = TRUE}
devtools::session_info()
```

 
ADD REPLY

Login before adding your answer.

Traffic: 428 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6