Question

Correcting for Batch Effects Prior to Differential Gene Expression Analysis with limma

adscheid3 • 0

Hello! I have a question about correcting for batch effects prior to differential gene expression analysis with limma. I've read that batch effect correction functions such as ComBat should not be used prior to differential expression analysis in limma, and that batch effects should be accounted for in the linear model instead. However, in my case the batch effect and the disease effect are one and the same, so if I account for the batch effect in the linear model, the differential expression analysis will no longer capture the influence of disease on gene expression. Therefore I'd like to re-run several healthy and disease samples, use those to calculate healthy and disease gene-wise normalization factors, and multiply by those factors to eliminate the batch effect while preserving the disease effect. Is it acceptable to do the normalization on reads-per-million data, back-calculate to raw counts using the library size of each sample, and then do differential expression with limma? Thanks, all the best!

Adam

limma batch effect rna-seq • 2.5k views

updated 7.6 years ago by Aaron Lun ★ 28k • written 7.6 years ago by adscheid3 • 0

score 1 · Answer 1 · 2017-06-09

To answer your specific question: no, it is not valid to perform normalization within conditions. TMM normalization is relative, so normalization factors computed for separate sets of counts are not comparable. In any case, normalization doesn't solve the problem of batch effects. Normalization only eliminates global scaling differences between libraries, i.e., differences that affect all genes in a library by the same factor. It cannot remove gene-wise differences between batches.
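To make the last point concrete, here is a minimal sketch in plain Python (not limma or edgeR code). The counts, the depth factor and the gene-wise batch shifts are all made-up numbers chosen for illustration. It shows that dividing a library by a single scaling factor leaves the gene-to-gene disagreement between batches exactly as it was.

```python
# Hypothetical toy counts for three genes (illustrative values, not real data).
batch_a = [100.0, 200.0, 300.0]

# Batch B is sequenced twice as deeply AND carries a different batch shift per gene.
depth_factor = 2.0                   # global scaling difference (assumed)
gene_batch_shift = [1.0, 1.5, 0.5]   # gene-wise batch effect (assumed)
batch_b = [a * depth_factor * g for a, g in zip(batch_a, gene_batch_shift)]


def ratio_spread(x, y):
    """Largest per-gene ratio y/x divided by the smallest one.

    A value of 1.0 would mean every gene differs by the same factor,
    which is the only situation a single scaling factor can fix.
    """
    ratios = [yi / xi for xi, yi in zip(x, y)]
    return max(ratios) / min(ratios)


spread_before = ratio_spread(batch_a, batch_b)

# Scaling normalization: one factor applied to every gene in the library.
scaled_b = [b / depth_factor for b in batch_b]
spread_after = ratio_spread(batch_a, scaled_b)

print(spread_before, spread_after)
```

The spread is the same before and after scaling, because a per-library factor cancels out of the ratio between genes. Only the global depth difference has been removed.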

More generally, you probably can't analyze this data set, because the experimental design is fundamentally broken. When the batch is confounded with your conditions of interest, any difference in expression between conditions may or may not be due to the batch effect - it's mathematically impossible to distinguish between these two possibilities. (The exception is if you have multiple batches nested within each condition, in which case you could use duplicateCorrelation to account for the within-batch correlations.)
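The confounding argument can be sketched with a toy example in plain Python (again, not limma code). The design of three healthy samples in one batch and three disease samples in another, and all effect sizes, are assumptions made up for illustration. Because the batch indicator is identical to the disease indicator, very different splits of the effect between "disease" and "batch" predict exactly the same data.

```python
# Hypothetical design: healthy samples all in batch 1, disease samples all in batch 2.
disease = [0, 0, 0, 1, 1, 1]
batch2 = [0, 0, 0, 1, 1, 1]   # identical to the disease indicator: fully confounded


def fitted(intercept, disease_effect, batch_effect):
    """Expected log-expression of one gene under an additive linear model."""
    return [intercept + disease_effect * d + batch_effect * b
            for d, b in zip(disease, batch2)]


# Three different explanations of the same 2-unit shift (illustrative numbers).
all_disease = fitted(5.0, 2.0, 0.0)   # the shift is entirely biological
all_batch = fitted(5.0, 0.0, 2.0)     # the shift is entirely technical
mixed = fitted(5.0, 0.5, 1.5)         # or anything in between

# True when the three explanations give identical fitted values.
indistinguishable = all_disease == all_batch == mixed
print(indistinguishable)
```

Only the sum of the two effects is determined by the data, so no amount of modelling can say how much of it is disease. Samples from both conditions within the same batch would break the tie.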