Question

Does MNN remove the same average batch vector from all cells, or does each cell have its own correction vector?

p.joshi ▴ 40

@pjoshi-22718

Last seen 2.9 years ago

Germany

Hi Aaron

I am using reducedMNN with NMF as input to perform batch correction. To better understand what is happening, I went through the fastMNN theory that you posted on GitHub, but I am confused about one point. In one place, it says that a correction vector is computed for each cell in the target batch by averaging the correction vectors from all MNN pairs between that cell and cells in the reference batch. The MNN pairs help capture local variation within subpopulations of the target batch. But then the batch vector, the component of the correction vector that is actually removed, is the same for all cells. Doesn't that mean that the locality of the correction is disregarded and the batch effect is assumed to be the same across all cells?

If it is indeed correct that the corrected effect is the same across all subpopulations, is there a way to modify the approach to correct subpopulations individually? I tried splitting the target batch into new batches based on clusters obtained in the target batch and then running reducedMNN with those clusters as independent batches. However, the clusters from the target batch end up overlapping each other, rather than each cluster integrating with its own population in the reference. I assume this happens because the subpopulations in the target batch share some similarity, so incorrect MNN pairs are identified between them in my approach, leading to an incorrect correction and the merging of what are distinct clusters.

My idea is that cells within a cluster share a more similar batch effect than the sample as a whole, so correcting each cluster separately would avoid over-correcting the entire cell population. This matters especially when the batch effect is both technical (platform/different experiment) and biological (sex/species).

Thanks for your response!

fastmnn batchelor BatchEffect • 1.4k views

3.6 years ago · p.joshi ▴ 40

score 2 · Accepted Answer · 2021-09-30

But then the batch vector, the component of the correction vector that is actually removed, is the same for all cells. Doesn't that mean that the locality of the correction is disregarded and the batch effect is assumed to be the same across all cells?

The average batch effect vector is only used for the orthogonalization, i.e., removing all variance along the batch vector. After that's done, the actual correction itself is done using cell-specific vectors. (Well, averaged over neighboring cells, but it should be more or less local to a subpopulation.)
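To make the two steps concrete, here is a toy 2-D sketch in plain Python. It is not the batchelor implementation: the data, the MNN pairs and the helper names (`center_along`, `local_vectors`) are made up for illustration, and the neighbor averaging is a plain unweighted mean over the `k` nearest paired cells, which is an assumption. Both populations share the same shift along the x axis but have opposite shifts along y, so a single average vector cannot fix both.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def scale(u, s):
    return [a * s for a in u]

def mean_vec(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def center_along(cells, direction):
    """Move every cell to the batch's mean position along the unit vector
    `direction`, which removes within-batch variance along that direction."""
    locs = [dot(c, direction) for c in cells]
    central = sum(locs) / len(locs)
    return [add(c, scale(direction, central - loc)) for c, loc in zip(cells, locs)]

def local_vectors(target, reference, pairs, k=2):
    """One correction vector per target cell: the mean of the pair vectors
    of the k nearest MNN-paired target cells (simplified, unweighted).
    Assumes at most one pair per target cell."""
    pair_vec = {t: sub(reference[r], target[t]) for t, r in pairs}
    out = []
    for cell in target:
        nearest = sorted(pair_vec, key=lambda t: math.dist(cell, target[t]))[:k]
        out.append(mean_vec([pair_vec[t] for t in nearest]))
    return out

# Hypothetical data: population A sits near y = 0, population B near y = 5.
reference = [[0.0, 0.0], [0.2, 0.0], [0.0, 5.0], [0.2, 5.0]]
target = [[2.0, 0.5], [2.2, 0.5], [2.0, 4.5], [2.2, 4.5]]
pairs = [(0, 0), (1, 1), (2, 2), (3, 3)]  # assumed (target, reference) MNN pairs

# Step 1: the average batch vector is used only for orthogonalization.
avg = mean_vec([sub(reference[r], target[t]) for t, r in pairs])
direction = scale(avg, 1.0 / math.sqrt(dot(avg, avg)))
ref_c = center_along(reference, direction)
tgt_c = center_along(target, direction)

# Step 2: the correction itself uses cell-specific (locally averaged) vectors.
vectors = local_vectors(tgt_c, ref_c, pairs)
corrected = [add(c, v) for c, v in zip(tgt_c, vectors)]

for cell, vec in zip(corrected, vectors):
    print([round(x, 3) for x in vec], "->", [round(x, 3) for x in cell])
```

In this toy case the vectors for population A and population B differ in their y component, which is the locality that the single average vector would miss.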

In theory, it should be possible to perform orthogonalization on a per-population basis, which would give even better corrections. But I haven't had time to test it out.

I tried splitting the target batch into new batches based on clusters obtained in the target batch and then running reducedMNN with those clusters as independent batches. However, the clusters from the target batch end up overlapping each other, rather than each cluster integrating with its own population in the reference.

I don't really understand what you mean here, but the MNN approach assumes that there is at least one common subpopulation across batches. So if you subset your various batches and pair the wrong subsets together, the algorithm will attempt to merge them (as will any correction algorithm, really) and that won't make a lot of sense.
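As a rough illustration of why mismatched subsets still get merged, the sketch below finds mutual nearest neighbors between two made-up batches that share no population. The function names and coordinates are hypothetical and this is a brute-force search, not how batchelor does it; the point is only that the MNN criterion compares relative distances, so pairs are found even when the two sets are far apart.

```python
import math

def knn(query, pool, k):
    """Indices of the k nearest points in `pool` for each point in `query`."""
    return [
        sorted(range(len(pool)), key=lambda j: math.dist(q, pool[j]))[:k]
        for q in query
    ]

def mutual_nearest_pairs(target, reference, k=1):
    """Return (target, reference) index pairs that are in each other's
    k-nearest-neighbor sets across the two batches."""
    t2r = knn(target, reference, k)
    r2t = knn(reference, target, k)
    return [(t, r) for t, nbrs in enumerate(t2r) for r in nbrs if t in r2t[r]]

# Hypothetical subsets with no shared population: the reference holds only
# "population A" near the origin, the target holds only "population B".
reference_a = [[0.0, 0.0], [1.0, 0.5], [0.0, 1.0]]
target_b = [[10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]

pairs = mutual_nearest_pairs(target_b, reference_a, k=1)
print(pairs)  # non-empty even though the two sets are biologically unrelated
```

Any correction built on such pairs would pull population B onto population A, which is the kind of spurious merging described above.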

My idea is that cells within a cluster share a more similar batch effect than the sample as a whole, so correcting each cluster separately would avoid over-correcting the entire cell population.

Again, I don't really understand what you mean.