I have snRNA-seq data from multiple samples and conditions. I used the MT genes as control features to estimate the ambient contamination and performed a full analysis of the data, including MNN integration. I now want to clean up the expression matrix to aid manual cell type annotation. I was going to use the controlAmbience function; however, this returns a count matrix that wouldn't reflect the transformations introduced by the MNN integration. I would essentially have two expression matrices: one generated by the integration and one generated by the ambient correction. In this fairly common scenario, with multiple samples and conditions, how should one proceed? My thoughts were:
1. Ignore the apparent disparity: use the MNN-integrated matrix for dimensionality reduction and clustering, and use the ambient-corrected matrix for visualizing gene expression and for interpretation.
2. Analyse each sample separately and produce an ambient-corrected count matrix for each at the end. Use the corrected matrices as input to the data integration stage and proceed with the usual downstream analyses (e.g. dimensionality reduction, clustering, marker detection).
I'm leaning toward the second option, but it requires some extra processing, and I'm unsure whether I have misunderstood how the controlAmbience function should be applied.