Question

IHW and correlated covariates



Michael Love 43k

@mikelove


United States

I'm trying IHW for eQTL analysis, using genomic distance as the covariate.

I'm also interested in using the expression level of the gene as a covariate.

I could use a single summary, such as the average normalized count for the gene, but I'm also interested in trying several quantiles of the normalized count (e.g. 0.75, 0.9, 0.95, 0.99). Do you have any advice here, or have you thought about how IHW behaves when several of the covariates are correlated?

ihw • 1.7k views

8.4 years ago • updated 3.9 years ago Michael Love 43k

score 0 · Answer 1 · 2016-12-05

Hi Mike,

I am afraid this will largely be a non-answer, since right now the support for multivariate covariates is minimal: you have to manually bin the covariates into higher-dimensional rectangles, with the obvious repercussions in terms of the curse of dimensionality.
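To make the manual binning concrete, here is a minimal sketch in plain Python. It cuts each of two covariates at equal-frequency break points and combines the two bin indices into one stratum label per hypothesis. The function names and the toy `distance` and `expression` values are hypothetical, not part of IHW. The label could then serve as a single categorical covariate.

```python
from bisect import bisect_right
from collections import Counter

def quantile_breaks(values, n_bins):
    """Return n_bins - 1 interior break points at roughly equal-frequency quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]

def bin_index(value, breaks):
    """Index (0 .. len(breaks)) of the bin that value falls into."""
    return bisect_right(breaks, value)

def rectangle_strata(cov_a, cov_b, n_bins):
    """Combine two covariates into one rectangle label per hypothesis."""
    breaks_a = quantile_breaks(cov_a, n_bins)
    breaks_b = quantile_breaks(cov_b, n_bins)
    return [
        bin_index(a, breaks_a) * n_bins + bin_index(b, breaks_b)
        for a, b in zip(cov_a, cov_b)
    ]

# Toy data (made up): one distance and one expression summary per test.
distance = [1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45]
expression = [3.0, 0.5, 7.2, 1.1, 9.9, 4.4, 2.2, 8.1, 6.3, 0.9, 5.5, 10.4]

strata = rectangle_strata(distance, expression, n_bins=2)
cell_sizes = Counter(strata)  # how many hypotheses land in each rectangle
```

With `n_bins` bins per covariate and `d` covariates there are `n_bins ** d` rectangles, so the number of hypotheses per rectangle shrinks quickly as covariates are added. That is the curse of dimensionality mentioned above.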

One thing you could do is just follow the approach used for finding the independent filtering threshold (try different covariates, see which one works best). I think this will not bias FDR control in any practically relevant way.
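A rough sketch of that comparison loop, in plain Python, might look as follows. The `filter_then_bh` function is only a stand-in so the example runs on its own (it filters on the covariate and then applies Benjamini-Hochberg). In practice the `procedure` argument would be replaced by a call to the actual weighting method. All names and the toy numbers are hypothetical.

```python
def bh_rejections(pvalues, alpha):
    """Number of rejections made by the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    count = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * rank / m:
            count = rank  # largest rank passing the step-up threshold
    return count

def filter_then_bh(pvalues, covariate, alpha, keep_fraction=0.5):
    """Stand-in procedure: keep tests with the largest covariate values, then run BH."""
    n_keep = max(1, int(len(pvalues) * keep_fraction))
    order = sorted(range(len(pvalues)), key=lambda i: covariate[i], reverse=True)
    kept = [pvalues[i] for i in order[:n_keep]]
    return bh_rejections(kept, alpha)

def best_covariate(pvalues, candidates, alpha, procedure=filter_then_bh):
    """Run the procedure once per candidate covariate and compare rejection counts."""
    counts = {name: procedure(pvalues, cov, alpha) for name, cov in candidates.items()}
    return max(counts, key=counts.get), counts

# Toy data (made up): eight p-values and two candidate covariates.
pvalues = [0.001, 0.004, 0.03, 0.2, 0.5, 0.6, 0.8, 0.9]
candidates = {
    "mean_count": [9, 8, 7, 6, 1, 2, 3, 4],
    "q99_count": [1, 9, 2, 8, 3, 7, 4, 6],
}
winner, counts = best_covariate(pvalues, candidates, alpha=0.1)
```

The selection step only looks at the number of rejections per covariate, which is the same criterion used when picking an independent filtering threshold.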

From a more theoretical point of view, what are the difficulties in getting higher dimensional covariates to work (and how does this relate to correlated covariates)?

1) We need to estimate the conditional distribution of the p-values given the covariates. Right now we do this non-parametrically (assuming concavity of the conditional CDFs). Unfortunately, this does not extend well to higher dimensions (again, the curse of dimensionality), so some parametric approach will possibly be needed. Correlation among the covariates helps, however, since it reduces the effective dimensionality of the problem.
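For a single bin of hypotheses, one standard way to impose concavity on a CDF estimate is to take the least concave majorant of the empirical CDF (the Grenander-type estimator). The sketch below is plain Python under that assumption and is not taken from the IHW code. The function name and the toy p-values are hypothetical.

```python
def least_concave_majorant(pvalues):
    """Knots (p, F) of the least concave majorant of the empirical CDF on [0, 1]."""
    n = len(pvalues)
    # Upper corners of the ECDF steps, plus the two end points.
    points = [(0.0, 0.0)]
    points += [(p, (i + 1) / n) for i, p in enumerate(sorted(pvalues))]
    points.append((1.0, 1.0))
    hull = []
    for x, y in points:
        # Drop the last knot while it lies on or below the chord to the new point.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull

# Toy p-values (made up) for one covariate bin.
knots = least_concave_majorant([0.05, 0.2, 0.6, 0.9])
slopes = [
    (y2 - y1) / (x2 - x1)
    for (x1, y1), (x2, y2) in zip(knots, knots[1:])
]  # piecewise-constant density estimate, non-increasing by construction
```

Each bin needs enough p-values for this estimate to be stable, which is why the approach becomes hard once the bins are cells of a high-dimensional grid.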

2) Once we have the conditional CDF, we need to find the optimal weight function w*(x). One way to think about it is as a regression problem, and indeed the convex optimization framework of IHW could in principle accommodate all the usual suspects (smoothing splines, LASSO, etc.) instead of the regressogram-type estimator used now. Here the usual caveats for correlated variables in regression problems apply.

One possible way out of 1 and 2 is to assume an additive or multiplicative structure of the weights, but we have not pursued that.
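As a sketch of what such a structure could look like for two covariates x = (x_1, x_2), with component functions w_1 and w_2 introduced here purely for illustration:

```latex
% Multiplicative structure (assumed form, for illustration only)
w(x_1, x_2) = w_1(x_1)\, w_2(x_2)

% Additive structure (assumed form, for illustration only)
w(x_1, x_2) = w_1(x_1) + w_2(x_2)

% In both cases the usual weight budget over the m hypotheses still applies
w(x) \ge 0, \qquad \frac{1}{m} \sum_{i=1}^{m} w(x_i) = 1
```

With B bins per covariate and d covariates, either form would need on the order of dB component values rather than the B^d weights of a full grid of rectangles.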

Best,

Nikos