Removing batch effect with limma::removeBatchEffect() actually exacerbates the effect
drowsygoat ▴ 30
@lechkaczmarczyk-14172
Last seen 3.6 years ago
Poland

Hello,

I am attempting to remove batch effects from my data using limma::removeBatchEffect(). I have two batches of samples and four conditions. In the figures below, batches are color-coded. I'm wondering why the batch effect seems stronger after applying limma::removeBatchEffect().

The functions were run with default parameters, as follows:

      vst <- vst(dds)
      plotPCA(vst, "Sac")
      assay(vst) <- limma::removeBatchEffect(assay(vst), vst$Sac)
      plotPCA(vst, "Sac")

Before correction: [figure: PCA plot before limma batch correction]

After correction: [figure: PCA plot after limma batch correction]

limma deseq2 • 16k views
@gordon-smyth
Last seen 4 hours ago
WEHI, Melbourne, Australia

Two points.

First, your PCA plot does not suggest a substantial batch effect, so I wonder whether you need to worry about it.

Second, when you run removeBatchEffect you need to set the design argument so that the function knows what the four treatment conditions are. The batches are unbalanced with respect to conditions, and we only want to remove the batch effect within each condition level. For example:

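      # `condition` is assumed to be the colData column holding the four treatment conditions
      design0 <- model.matrix(~condition, data = as.data.frame(colData(vst)))
      assay(vst) <- limma::removeBatchEffect(assay(vst), batch = vst$Sac, design = design0)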

Without setting the design argument, the effect you have seen is to be expected.
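To see why the design argument matters when batches are unbalanced across conditions, here is a minimal simulated sketch. Everything in it is made up for illustration (the 4 x 2 layout, the effect sizes, the names `condition`, `batch`, `truth`); it is not the poster's data. It compares how far the corrected values are from the batch-free values with and without `design`.

```r
library(limma)

set.seed(1)

# Hypothetical unbalanced layout: 4 conditions x 2 batches, 16 samples.
# Conditions A and B sit mostly in batch 1, C and D mostly in batch 2.
condition <- factor(rep(c("A", "B", "C", "D"), each = 4))
batch <- factor(c(1, 1, 1, 2,  1, 1, 1, 2,  1, 2, 2, 2,  1, 2, 2, 2))

n_genes <- 200
# Simulated truth: strong condition effects, a modest batch shift, small noise
cond_effect <- matrix(rnorm(n_genes * 4, sd = 2), n_genes, 4)
batch_shift <- rnorm(n_genes, sd = 0.5)
truth <- cond_effect[, as.integer(condition)] +
  matrix(rnorm(n_genes * 16, sd = 0.3), n_genes, 16)
x <- truth + outer(batch_shift, as.numeric(batch == "2"))

# Correction without and with the treatment design
naive <- removeBatchEffect(x, batch = batch)
design0 <- model.matrix(~condition)
adjusted <- removeBatchEffect(x, batch = batch, design = design0)

# Mean squared error against the batch-free data, after centring each gene
centre <- function(m) m - rowMeans(m)
err_naive <- mean((centre(naive) - centre(truth))^2)
err_design <- mean((centre(adjusted) - centre(truth))^2)
c(without_design = err_naive, with_design = err_design)
```

In this setup the correction without `design` should land further from the batch-free data, because part of the condition difference gets attributed to batch and subtracted.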

SVA and RUV don't seem to me to be appropriate here, because they are intended to discover the batch factor whereas you already know what it is. If you do use those algorithms, then you will have the same issue that you have with removeBatchEffect. When you do the actual batch correction, the batch correction algorithm will need to know the treatment conditions as well as the batch factor or surrogate variables.

@mikelove
Last seen 2 days ago
United States

When you run the removeBatchEffect function, it removes shifts in the group means associated with the grouping factor you provide, per row of the matrix. It seems like the shift is not shared across the conditions. Are these really just two batches, or were the condition samples divided further?
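A small sketch of that behaviour, on a made-up matrix (the sizes and the name `batch` are arbitrary): with no design supplied, the two batch means come out equal in every row after correction, whatever else differs between the batches.

```r
library(limma)

set.seed(42)

# Toy matrix: 5 rows (genes) x 6 columns (samples), two batches of three
x <- matrix(rnorm(30), nrow = 5, ncol = 6)
batch <- factor(c(1, 1, 1, 2, 2, 2))

# No design supplied, so only an overall mean is protected per row
y <- removeBatchEffect(x, batch = batch)

# Per-row batch means after correction
mean_b1 <- rowMeans(y[, batch == "1"])
mean_b2 <- rowMeans(y[, batch == "2"])
all.equal(mean_b1, mean_b2)
```

If the conditions are spread unevenly over the batches, forcing the batch means to match like this also moves the condition groups around.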


Many thanks for your response, Michael, I appreciate it. This was RNA-seq of mouse brain-region- and cell-specific RNA immunoprecipitations. Groups denote the days the mice were sacrificed. Conditions were not divided further.

Since the outliers overlapped with the time points at which the specimens were sacrificed, I thought it was a sound approach to treat this as a batch effect (importantly, the mice sacrificed later were also born later, so it should not be related to age).

Of course it may be i) a coincidence or ii) a tissue preparation (experimental) artifact (e.g. lack of reproducibility in brain region dissection). If I understand correctly, the shift between those samples is inconsistent and therefore does not resemble a typical batch effect, hence the observed output of the removeBatchEffect function. Would it be a good use of time to try other tools to handle this?

If this is not a batch effect, I would hesitate between i) using the samples as they are for comparisons or ii) using only "red" ones, and tossing the "green" batch.


I might try SVA or RUV.

Another thing I would do is find a few batch-y genes (via an LRT that removes the batch variable in the reduced model) and look at plotCounts() for these genes to see if the batch effect is consistent. The important thing for DE analysis is what happens at the gene level, while the PCA is just a QC plot, to give an overview of the variation.
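A rough sketch of that check, run on a simulated dataset from `makeExampleDESeqDataSet()` rather than the real `dds`. The `Sac` column and its `day1`/`day2` levels are added here purely for illustration, and picking the top four genes is an arbitrary choice.

```r
library(DESeq2)

set.seed(1)

# Simulated stand-in for the real dds; it has a `condition` column (A/B)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
# Hypothetical batch column, alternating so both days occur in each condition
dds$Sac <- factor(rep(c("day1", "day2"), times = 6))

# Full model has condition + batch; the reduced model drops the batch term
design(dds) <- ~ condition + Sac
dds <- DESeq(dds, test = "LRT", reduced = ~ condition)
res <- results(dds)

# Genes with the smallest LRT p-values are the most "batch-y" ones
top <- rownames(res)[order(res$pvalue)][1:4]

# Counts split by condition and batch, one plot per gene
for (g in top) {
  plotCounts(dds, gene = g, intgroup = c("condition", "Sac"))
}
```

On real data, the thing to look for is whether the day-to-day shift has the same direction and similar size within every condition for these genes.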
