Question

How to control p-value inflation with limma

stephane.cauchi ▴ 10

Dear all,

Using the limma package, I compared mRNA expression profiles between small groups (N=5) of isogenic mice (BALB/c) under different conditions (Agilent one-color microarrays). I observed strong p-value inflation (as measured by lambda) in most comparisons despite the FDR correction. One factor known from genome-wide association (GWA) studies to cause p-value inflation is population stratification, such as relatedness among individuals. In all comparisons, however, the mice are supposed to have the same genetic background. Therefore, I have several questions:

1) Do I need to adapt specific limma functions to take this study design into account?

2) Because one-color microarrays were used for this experiment, I have been told that within-array normalization may not be feasible. So far, no normalization of this kind has been applied to the dataset. What do you think?

3) Do you think that additional packages such as BACON may be necessary to fix this issue?

Thank you very much for your help

limma pvalue inflation bacon • 2.6k views

updated 8.2 years ago by Aaron Lun ★ 28k • written 8.2 years ago by stephane.cauchi ▴ 10

score 1 · Answer 1 · 2017-03-08

The genomic inflation factor is not applicable here. GWAS analyses involve different models and different assumptions, and you can't just take a diagnostic from those analyses and expect it to be useful in limma. In particular, diagnostics based on goodness-of-fit statistics are not relevant to linear models, because any deviation of the observations from the fitted values will be modelled by an increase in the residual variance. This means that you won't be able to distinguish between an incorrect model that's missing some terms and a correct model fitted to highly variable data. Now, to answer your specific questions:

1) Just because your mice have the same genetic background doesn't mean that there aren't underlying correlations. Were some of the mice raised at the same time? Are they littermates? Even if all samples were collected at the same time, were some samples processed differently? These factors may affect expression and cause the residuals for some samples to be positively correlated; this is usually anticonservative as the amount of information in the data is overstated. If you have known factors of variation in your data set, you should block on them in the design matrix or via duplicateCorrelation.

2) Of course you can't do within-array normalization, that's for two-colour arrays. You need to do between-array normalisation.

3) What issue? I'm yet to be convinced that limma is not working.
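
For anyone landing here later, the two practical suggestions above (between-array normalisation, and blocking on a known factor either in the design matrix or via duplicateCorrelation) might look roughly like the sketch below. Everything about the data is an assumption made for illustration: the expression matrix is simulated, and `litter` is a hypothetical blocking factor standing in for whatever known source of correlation exists in the real experiment.

```r
library(limma)

set.seed(1)

# Simulated log2 intensities: 1000 probes x 10 one-colour arrays (illustrative only).
n_genes   <- 1000
condition <- factor(rep(c("control", "treated"), each = 5))
litter    <- factor(c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5))  # hypothetical known blocking factor

# Each litter gets its own gene-specific offset, so littermates are correlated.
litter_effect <- matrix(rnorm(n_genes * 5, sd = 0.5), n_genes, 5)
expr <- matrix(rnorm(n_genes * 10, mean = 8, sd = 0.3), n_genes, 10) +
  litter_effect[, as.integer(litter)]

# Between-array normalisation (quantile) for single-channel data.
expr_norm <- normalizeBetweenArrays(expr, method = "quantile")

# Option A: block on the known factor directly in the design matrix.
design_block <- model.matrix(~ litter + condition)
fit_block    <- eBayes(lmFit(expr_norm, design_block))
top_block    <- topTable(fit_block, coef = "conditiontreated", number = 5)

# Option B: treat the blocking factor as a random effect via duplicateCorrelation.
design <- model.matrix(~ condition)
corfit <- duplicateCorrelation(expr_norm, design, block = litter)
fit_dc <- lmFit(expr_norm, design, block = litter,
                correlation = corfit$consensus.correlation)
fit_dc <- eBayes(fit_dc)
top_dc <- topTable(fit_dc, coef = "conditiontreated", number = 5)
```

Option A spends degrees of freedom on the litter terms, which matters with only ten arrays. Option B estimates a single consensus correlation across genes instead. Which one suits a given experiment depends on the design, so treat this as a starting point rather than a recipe.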

score 0 · Answer 2 · 2017-03-08

Dear Stephane,

I wouldn't use bacon for this; it is really meant for association studies with sample size n > 100. I also wouldn't use the GWAS inflation factor, as it is really meant for GWAS data.

Did you check the limma user's guide? There is a whole chapter devoted to single-channel arrays: "Single-Channel Experimental Design".
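
As a rough idea of what a single-channel analysis with several conditions can look like, here is a minimal sketch using a group-means design and explicit contrasts. The data are simulated, and the group names `A`, `B`, `C` and the contrast names are placeholders, not anything from the original experiment.

```r
library(limma)

set.seed(42)

# Simulated log2 expression: 500 probes, three conditions with 5 arrays each.
group <- factor(rep(c("A", "B", "C"), each = 5))
expr  <- matrix(rnorm(500 * 15, mean = 8), nrow = 500, ncol = 15)

# Spike in a shift for the first 20 probes in condition B (illustrative only).
expr[1:20, group == "B"] <- expr[1:20, group == "B"] + 2

# Group-means parametrisation: one coefficient per condition.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit <- lmFit(expr, design)

# Pairwise comparisons of interest, expressed as contrasts of the group means.
contrast_matrix <- makeContrasts(BvsA = B - A, CvsA = C - A, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrast_matrix))

top_BvsA <- topTable(fit2, coef = "BvsA", number = 10)
```

With real Agilent data the matrix would come from reading and normalising the arrays rather than from `rnorm`, but the design, contrast and fitting steps would keep the same shape.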

If you think there might be unobserved confounding factors, you could look at sva, ComBat, RUV or cate. These are all methods for handling unobserved confounding factors and have an R/Bioconductor implementation.
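
To give a flavour of the surrogate variable approach, here is a minimal sketch with sva feeding into limma. The data are simulated with a hidden batch that is deliberately left out of the model, and `n.sv = 1` is fixed by hand as an assumption to keep the example short; in a real analysis the number of surrogate variables would normally be estimated.

```r
library(limma)
library(sva)

set.seed(7)

# Simulated log2 expression: 1000 probes x 10 arrays (illustrative only).
n_genes      <- 1000
condition    <- factor(rep(c("control", "treated"), each = 5))
hidden_batch <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1)  # unobserved in a real study

# The hidden batch shifts the first 300 probes by a probe-specific amount.
batch_effect <- c(rnorm(300, sd = 1), rep(0, n_genes - 300))
expr <- matrix(rnorm(n_genes * 10, mean = 8, sd = 0.3), n_genes, 10) +
  outer(batch_effect, hidden_batch)

# Full model (with the variable of interest) and null model (intercept only).
mod  <- model.matrix(~ condition)
mod0 <- model.matrix(~ 1, data = data.frame(condition))

# Estimate one surrogate variable (the number is an assumption for this sketch).
svobj <- sva(expr, mod, mod0, n.sv = 1)
sv    <- as.matrix(svobj$sv)
colnames(sv) <- paste0("SV", seq_len(ncol(sv)))

# Add the surrogate variable to the design and fit with limma as usual.
design_sv <- cbind(mod, sv)
fit       <- eBayes(lmFit(expr, design_sv))
top_sv    <- topTable(fit, coef = "conditiontreated", number = 5)
```

With only five arrays per group, each extra covariate costs a noticeable share of the residual degrees of freedom, so it is worth checking that a surrogate variable is capturing something real before keeping it in the model.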

Cheers,

Maarten