The output is a numeric vector of doublet scores, one per cell, and I was wondering how to use it to annotate the cells. Do I have to set a hard score threshold, or should I apply an x-th percentile cut-off to discriminate singlets from doublets?
I would consider looking for outliers with `scater::isOutlier` and `type="higher"`. You'll have to choose a value for `nmads`, but that's no different from having to choose any other arbitrary threshold. I'd probably pick 3.
A hard score threshold would be very difficult to pick; the scores have little absolute meaning. A percentile cut-off might be passable if you have an idea of the doublet rate in your experiment. However, I suspect that rate is not just a function of the number of cells, but also of the relative stickiness of your population.
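If you do have a prior estimate of the doublet rate, the percentile cut-off is a one-liner with `quantile()`. This sketch assumes a hypothetical expected rate of 5% and uses toy scores; as noted above, the real rate may depend on more than the number of cells, so treat the estimate as a rough guide.

```r
# Hypothetical expected doublet rate; replace with an estimate
# for your own experiment
expected_rate <- 0.05

# Toy doublet scores (hypothetical) for 200 cells
set.seed(2)
scores <- rexp(200, rate = 2)

# Score above which the top `expected_rate` fraction of cells lies
cutoff <- quantile(scores, probs = 1 - expected_rate, names = FALSE)
is_doublet <- scores > cutoff

# About expected_rate * length(scores) cells are flagged
sum(is_doublet)
```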
I assume that normalisation is run, but I do not know where the information is stored.
I don't really know what you mean. Are you looking for the normalized values? Those are in `assays(sce)` under whatever name you gave them. Normally, if you run `logNormCounts`, the `sizeFactors` field is populated with library size-derived factors (assuming you didn't already have something there), and a `"logcounts"` assay is created. If you did your normalization some other way, then I wouldn't know what happened.
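For reference, a base-R sketch of what library size-derived factors and the `"logcounts"` assay amount to, assuming the usual convention of centring the factors to a mean of 1 and applying log2 with a pseudo-count of 1. The toy matrix is hypothetical; on a real object you would just call `logNormCounts()` and inspect the result.

```r
# Toy counts matrix (hypothetical): 5 genes x 4 cells
counts <- matrix(c(10, 0,  5,  3, 2,
                   20, 1,  8,  6, 5,
                    5, 0,  2,  1, 2,
                   40, 2, 20, 10, 8),
                 nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))

# Library size-derived factors, centred so that their mean is 1
lib_sizes <- colSums(counts)
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each cell (column) by its size factor, then log2 with pseudo-count 1
logcounts <- log2(sweep(counts, 2, size_factors, "/") + 1)

# On a SingleCellExperiment the equivalent would be:
# sce <- scater::logNormCounts(sce)
# sizeFactors(sce)   # library size-derived factors
# assayNames(sce)    # now includes "logcounts"
```

After scaling, every cell has the same total (the mean library size), which is what makes the log-values comparable across cells.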
In addition, does anyone have any experience with using the force-matching argument?
I wrote it, but I actually have little practical experience with it. You can have a look at my thoughts on the theoretical side here, if you haven't already. I don't put a lot of faith in the force matching, or indeed in the entire function, or indeed in the entire class of functions in the field that rely on simulated doublets. The fundamental problem is that we don't have a good idea of the relative total RNA content of each cell, which means that we're really just hoping that our simulated doublets are good-enough proxies for the real thing.
And I'm not even talking about situations where doublets are formed by sticky cells where the adhesion induces transcriptional changes (e.g., immune synapses). Though at least there's some interesting biology there.
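To make the RNA-content concern concrete, here is a minimal sketch of the basic doublet-simulation idea shared by this class of methods: add together the counts of two randomly chosen cells. It is an illustration under the simplest assumption (each cell contributes its full observed counts), not the exact procedure inside `doubletCells()`; the helper `simulate_doublets` and the toy data are hypothetical.

```r
# Hypothetical helper: simulate `n_sim` doublets by summing the raw counts
# of two randomly chosen, distinct cells from a genes x cells matrix.
simulate_doublets <- function(counts, n_sim) {
  n_cells <- ncol(counts)
  sims <- matrix(0, nrow = nrow(counts), ncol = n_sim,
                 dimnames = list(rownames(counts), paste0("sim", seq_len(n_sim))))
  for (i in seq_len(n_sim)) {
    pair <- sample.int(n_cells, 2)  # two different cells
    # Assumes each cell contributes its full observed counts; real doublets
    # of cell types with different total RNA content may not behave like this
    sims[, i] <- counts[, pair[1]] + counts[, pair[2]]
  }
  sims
}

# Toy counts (hypothetical): 6 genes x 10 cells
set.seed(3)
counts <- matrix(rpois(6 * 10, lambda = 5), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("cell", 1:10)))
sims <- simulate_doublets(counts, n_sim = 20)
dim(sims)
```

Any mismatch between this "sum of two cells" model and how real doublets capture RNA propagates directly into the scores, which is the worry raised above.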
Thanks a lot Aaron. For the second point, I meant that I have the raw and normalised counts in `assays(sce)`, saved from the Seurat object, but the `sizeFactors` field is empty. In the end, `doubletCells()` runs fine, but I am not sure whether I should run `logNormCounts` on top to populate the `sizeFactors` slot. Would it make a difference, or are the size factors calculated internally by `doubletCells()` anyway?

I'm pretty sure `doubletCells()` ignores the normalized values; it just uses the raw counts.
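Since the scores come from the raw counts, it may be worth checking that the assay imported from Seurat really holds raw counts rather than normalized values. A minimal base-R heuristic, using a hypothetical helper and toy matrices: raw UMI counts should be non-negative whole numbers, whereas normalized or log-transformed values generally are not.

```r
# Hypothetical helper: heuristic check that a matrix looks like raw counts
# (non-negative whole numbers). It cannot prove the values are raw; it only
# catches the case where normalized values ended up in the counts assay.
looks_like_raw_counts <- function(m) {
  m <- as.matrix(m)  # densifies sparse input, so check a subset on big data
  all(m >= 0) && all(m == round(m))
}

raw <- matrix(c(0, 3, 1, 7, 2, 0), nrow = 2)
# Per-cell scaling to 10,000 then log1p, as a stand-in for normalized data
normalized <- log1p(sweep(raw, 2, colSums(raw), "/") * 1e4)

looks_like_raw_counts(raw)         # TRUE
looks_like_raw_counts(normalized)  # FALSE

# On a real object (requires SingleCellExperiment), e.g. on the first cells:
# looks_like_raw_counts(counts(sce)[, 1:100])
```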