Yes, you can use mnnCorrect, or its faster and usually nicer-looking sibling fastMNN. Some instructions on the latter are provided here. I don't see any obvious problem with using MNN in the situation you've mentioned, so you'll just have to try it and see if it works.
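For a rough idea of what a fastMNN call looks like, here is a minimal sketch on simulated data. The two matrices are made-up stand-ins for log-expression values from two batches (nothing here comes from your dataset), and d = 10 is an arbitrary choice for the toy example rather than a recommendation.

```r
library(batchelor)
library(SingleCellExperiment)

set.seed(100)

# Hypothetical toy data: two batches of log-expression values
# (200 genes x 100 cells each), shifted apart to mimic a batch effect.
b1 <- matrix(rnorm(20000, mean = -1), nrow = 200)
b2 <- matrix(rnorm(20000, mean = 1), nrow = 200)
rownames(b1) <- rownames(b2) <- paste0("gene", seq_len(200))

# fastMNN performs a PCA across both batches and then applies the
# MNN correction in that low-dimensional space.
merged <- fastMNN(b1, b2, d = 10)

# Corrected low-dimensional coordinates: one row per cell,
# one column per retained dimension.
corrected <- reducedDim(merged, "corrected")
dim(corrected)

# Batch of origin for each cell in the merged object.
table(merged$batch)
```

The "corrected" coordinates are what you would pass to clustering or visualization. With real data you would supply log-normalized SingleCellExperiment objects (or one object plus a batch= vector) restricted to a sensible set of genes.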
While we're on this topic: MNN will happily remove biological differences between samples. This is not a bug, but a feature. To give an example - the compareSingleCell workflow uses fastMNN to merge wild-type and knock-out samples together prior to downstream analysis, i.e., clustering and annotation of clusters. If we tried to preserve biological differences between samples, the wild-type and knock-out cells would never cluster together, as they would be separated in their expression profiles by the big effect of the knock-out. This is "biologically accurate" but would defeat the purpose of merging in the first place, because now we need to cluster and annotate each genotype separately. Differential abundance analyses would become silly - "why yes, the abundance of the knock-out mesoderm cluster increases in the knock-out mice" - and trying to match up clusters between genotypes is not a pleasant experience in developmental settings where the clusters are not distinct.
By getting rid of the differences between samples, we can establish a common annotation that allows us to more easily compare cell types/states between samples. Once this is established, you can then go back to the original expression values to do a pseudo-bulk DE analysis (see one of the later vignettes in the workflow) to recover the differences between samples. And of course, if you don't fully trust the batch correction, you can simply cluster each sample individually. In doing so, you can take advantage of the hard work that you did in setting up the common annotation to guide your per-sample annotation - then you don't have to re-annotate everything from scratch, you only have to worry about big discrepancies from the common annotation.
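To make that sequence concrete, here is a minimal sketch of "merge, cluster on the corrected coordinates, then go back to the original counts". It is not the workflow's actual code: the data are simulated with scuttle's mockSCE(), the "WT"/"KO" labels are invented for illustration, and the walktrap clustering and d = 10 are just convenient choices for a toy example.

```r
library(batchelor)
library(scran)
library(scuttle)
library(SingleCellExperiment)

set.seed(42)

# Hypothetical toy dataset with made-up genotype labels.
sce <- mockSCE(ncells = 200, ngenes = 500)
sce$sample <- rep(c("WT", "KO"), each = 100)
sce <- logNormCounts(sce)

# Step 1: merge the samples. With a single object and batch=,
# the output cells are in the same order as the input cells.
merged <- fastMNN(sce, batch = sce$sample, d = 10)

# Step 2: cluster on the corrected coordinates to obtain a
# common set of labels shared across samples.
g <- buildSNNGraph(merged, use.dimred = "corrected")
sce$cluster <- factor(igraph::cluster_walktrap(g)$membership)

# Cells per cluster and sample, e.g., as input for differential abundance.
table(sce$cluster, sce$sample)

# Step 3: go back to the ORIGINAL counts and sum them for each
# cluster/sample combination to form pseudo-bulk profiles.
pseudo <- aggregateAcrossCells(sce,
    ids = colData(sce)[, c("cluster", "sample")])
dim(pseudo)
```

Note that the corrected values are only used to define the clusters; the pseudo-bulk profiles are built from the uncorrected counts, so the differences between samples are still there to be tested. A real DE analysis would need replicate samples per genotype before handing the pseudo-bulk counts to something like edgeR.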
I've found that people get upset when I tell them to remove biological differences between samples. But the alternative is to have, e.g., all cells from each patient clustering separately, which makes the merge useless.
Thanks a lot, Aaron! I'm still confused about some things.
[...] ?fastMNN, I'm not going to repeat it here.
[...] fastMNN, you'll get low-dimensional coordinates as output, so there's no need for another PCA step. Read the documentation, etc.
([...] batchelor::rescaleBatches.)
Sincere thanks, Aaron. Best wishes.