Question

How to integrate multiple samples from ADT-seq (CITE-seq) with fastMNN


sorjuelal

Hi, I want to integrate RNA with protein abundance, and I have multiple REAP-seq samples. My approach was to run fastMNN on the RNA data and cluster on the corrected values. I then thought I could also MNN-correct the ADT data and match the two clusterings, or perhaps do "subclustering" as described in the OSCA book. My question is whether I can actually integrate the ADT samples with fastMNN. This is what I have tried so far:

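    library(SingleCellExperiment)
    library(batchelor)
    library(BiocParallel)

    # d is a SingleCellExperiment with normalized ADT values in altExp(d);
    # batch is a vector of batch labels, one per cell
    altExp(d) <- multiBatchNorm(altExp(d), batch = batch)
    mnnadt.out <- fastMNN(altExp(d), batch = batch, d = 37, cos.norm = FALSE,
                          BSPARAM = BiocSingular::RandomParam(deferred = TRUE),
                          BPPARAM = MulticoreParam(3))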

But this does a PCA under the hood, right? Since I only have 37 markers, I don't need to reduce dimensionality. While writing this, I realized I could perhaps set d=NULL, as in buildSNNGraph?

Thank you, Stephany

fastMNN CITE-seq ADT-seq batchelor REAP-seq

link updated 4.9 years ago by Aaron Lun • written 4.9 years ago by sorjuelal

score 1 · Answer 1 · 2020-03-28

That's correct, you wouldn't need dimensionality reduction. Unfortunately, fastMNN was not designed with that use case in mind, so it performs the PCA regardless. If you don't want that, you could use reducedMNN directly, but this is a bit of a pain because you need to remember to transpose the matrices (this function expects low-dimensional inputs where the rows are the cells). If you find this effort intolerable, consider opening an issue at https://github.com/LTLA/batchelor and I will get around to adding a d=NA mode to fastMNN.
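A minimal sketch of the reducedMNN route, using simulated data so it runs on its own. The matrices `adt1` and `adt2` are hypothetical stand-ins for log-normalized ADT values from two batches, stored markers x cells as they would be in an `altExp()` assay; the point is the transpose, since reducedMNN wants cells in rows. No PCA happens here, so all 37 markers are used directly as the "dimensions".

```r
library(batchelor)

set.seed(1)
n.markers <- 37

# Hypothetical toy data: markers in rows, cells in columns (like an altExp assay).
adt1 <- matrix(rnorm(n.markers * 200), nrow = n.markers)
adt2 <- matrix(rnorm(n.markers * 150, mean = 1), nrow = n.markers)  # shifted batch
rownames(adt1) <- rownames(adt2) <- paste0("ADT", seq_len(n.markers))

# reducedMNN() expects cells in rows and dimensions in columns, so transpose.
# Each matrix passed in is treated as a separate batch.
mnn.out <- reducedMNN(t(adt1), t(adt2))

corrected <- mnn.out$corrected  # cells x markers, batch-corrected
dim(corrected)
```

With the objects from the question, the equivalent call would be along the lines of `reducedMNN(t(as.matrix(logcounts(altExp(d)))), batch = batch)`, assuming the normalized ADT values sit in the `logcounts` assay. Note that reducedMNN does not apply cosine normalization, which matches the `cos.norm = FALSE` used in the original fastMNN call.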

Now, an aside. Whenever we search for MNNs in low dimensions, another question arises: is the batch effect vector orthogonal to the biology? With many dimensions, we can pretend that this is a reasonable assumption, because a random batch effect vector is unlikely to align with biological differences. As the number of dimensions drops, this assumption becomes sketchier, and MNNs are more likely to be incorrectly found between different cell types. I don't think the situation is any worse for ADT data than for PCs, though this depends heavily on how many markers you have and how correlated they are. So it is worth keeping in mind: look at the variance discarded and check the diagnostics, etc.
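To get a feel for how marker correlation affects the effective dimensionality, here is a base-R sketch on hypothetical simulated data: five latent factors drive all 37 markers, so the markers are strongly correlated and a handful of PCs capture most of the variance. This only illustrates the general point about correlated markers; it is not the diagnostic used inside batchelor, and real ADT panels may be far less correlated than this toy example.

```r
set.seed(42)
n.cells <- 300
n.markers <- 37

# Hypothetical toy ADT data: 5 latent factors shared across all markers,
# plus independent noise, giving strongly correlated markers.
latent <- matrix(rnorm(n.cells * 5), nrow = n.cells)
loadings <- matrix(rnorm(5 * n.markers), nrow = 5)
adt <- latent %*% loadings +
  matrix(rnorm(n.cells * n.markers, sd = 0.5), nrow = n.cells)

# PCA on a cells x markers matrix; prcomp() centers each marker by default.
pca <- prcomp(adt)
prop.var <- pca$sdev^2 / sum(pca$sdev^2)

# Fraction of total variance discarded when keeping only the first d PCs.
discarded <- 1 - cumsum(prop.var)
round(discarded[c(2, 5, 10, 20)], 3)
```

If your own markers give a curve that stays high for many PCs, the data are effectively higher-dimensional; if it drops quickly, the orthogonality assumption discussed above has less room to hold.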

And yes, you can set d=NA in buildSNNGraph.
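A minimal sketch of graph-based clustering without the internal PCA, assuming a scran version that still exports `buildSNNGraph()`. The matrix `adt` is hypothetical simulated data laid out markers x cells (for example, corrected ADT values transposed back), and walktrap clustering from igraph is just one common choice for the downstream step.

```r
library(scran)
library(igraph)

set.seed(7)
# Hypothetical toy data: 37 markers (rows) x 300 cells (columns).
adt <- matrix(rnorm(37 * 300), nrow = 37)

# d = NA skips the PCA, so neighbors are found using all 37 markers directly.
g <- buildSNNGraph(adt, k = 10, d = NA)

# Cluster the shared-nearest-neighbor graph.
clusters <- cluster_walktrap(g)$membership
table(clusters)
```

Comparing such clusters against the batch labels (for example, `table(clusters, batch)`) is a simple way to check whether the correction has mixed the batches sensibly.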