Sample 6 will never be picked up by any parametrization of hashedDrops(), which relies on the majority of HTOs not being present in a given cell. You'll have to use a cruder strategy, e.g., fit a two-component distribution to each HTO, call a cell positive for an HTO if it falls in that HTO's "high" mode, and then generate sample calls from the combinations of positive HTOs that you see. This approach is implemented in CiteFuse but I don't know whether they support wacky designs like this; you'll probably have to write the code yourself.
FYI, you can reach deep inside the package to get DropletUtils:::.get_lower_dist, which is the k-means-based two-component fitting code used by ambientProfileBimodal(). One could use this to get present/absent calls for each HTO in each cell, and then proceed from there.
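To make the cruder strategy concrete, here is a minimal sketch in plain Python. It is not the DropletUtils or CiteFuse code: the two-component fit is a simple 1D two-means split on log counts, and every name in it (two_means_threshold, call_htos, assign_samples, the HTO names, the counts and the design table) is made up for illustration.

```python
import math

def two_means_threshold(values, n_iter=100):
    """1D k-means with k=2; returns the midpoint between the two centers,
    or None if all values are identical (nothing to split)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return None
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if new_lo == lo and new_hi == hi:
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

def call_htos(counts):
    """counts: dict mapping HTO name -> list of counts, one per cell.
    Returns one frozenset of 'present' HTOs per cell."""
    n_cells = len(next(iter(counts.values())))
    present = [set() for _ in range(n_cells)]
    for hto, vals in counts.items():
        logged = [math.log1p(v) for v in vals]
        threshold = two_means_threshold(logged)
        if threshold is None:
            continue
        for i, v in enumerate(logged):
            if v > threshold:  # cell sits in the "high" mode of this HTO
                present[i].add(hto)
    return [frozenset(p) for p in present]

def assign_samples(calls, design):
    """design: dict mapping frozenset of HTO names -> sample label."""
    labels = []
    for c in calls:
        if not c:
            labels.append("negative")
        else:
            labels.append(design.get(c, "unassigned"))
    return labels

# Hypothetical toy data: 3 HTOs, 6 cells.
counts = {
    "HTO_A": [500, 450, 5, 8, 480, 3],
    "HTO_B": [4, 520, 490, 6, 510, 2],
    "HTO_C": [6, 3, 510, 495, 505, 4],
}
# Hypothetical design, including one sample labelled with every HTO.
design = {
    frozenset({"HTO_A"}): "sample1",
    frozenset({"HTO_A", "HTO_B"}): "sample2",
    frozenset({"HTO_B", "HTO_C"}): "sample3",
    frozenset({"HTO_C"}): "sample4",
    frozenset({"HTO_A", "HTO_B", "HTO_C"}): "sample5",
}
labels = assign_samples(call_htos(counts), design)
print(labels)
```

Note that this sketch makes no attempt to flag doublets: a doublet of two samples simply shows up as the union of their HTOs, which may coincide with another sample's combination.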
Yeah it's definitely wacky and not something I want to deal with again. I've got a set of labels from an ad-hoc approach that clusters the HTOs followed by manually assigning each cluster to a sample; not fun.
In general, when there are fewer HTOs available than samples, labelling each sample with a unique combination of, say, 2 HTOs would be compatible with hashedDrops() via combinations, right?
Yes, that would be the plan, provided that you have more than 4 HTOs. Otherwise, doublets containing all HTOs would be indistinguishable from deeply-sequenced empty droplets.
There is a way to rewrite the hashedDrops() algorithm to overcome this restriction, at the cost of (i) assuming that all HTOs exhibit a bimodal profile and (ii) increasing errors due to variation in sequencing depth across cells.
See the documentation on the constant.ambient=TRUE option in version 1.11.20 of DropletUtils, which supports situations where the total number of HTOs is less than or equal to twice the expected number per cell.
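As a rough illustration of the "more than 4 HTOs" condition for 2-HTO combinations, the following Python sketch enumerates the possible combinations and checks whether a doublet of two samples could carry every HTO in the experiment. The function names are hypothetical and the check is only the counting argument from this thread, not anything taken from DropletUtils.

```python
from itertools import combinations

def design_combinations(n_htos, per_sample=2):
    """All unique sets of `per_sample` HTOs out of `n_htos` (0-based indices)."""
    return list(combinations(range(n_htos), per_sample))

def doublet_can_cover_all(n_htos, per_sample=2):
    """A doublet of two samples carries at most 2 * per_sample distinct HTOs.
    If that reaches n_htos, a doublet can contain every HTO."""
    return 2 * per_sample >= n_htos

for n in (4, 5, 6):
    print(n, len(design_combinations(n)), doublet_can_cover_all(n))
```

With 4 HTOs there are 6 possible pairs but a doublet can contain all 4 HTOs; with 5 HTOs there are 10 pairs and at least one HTO is always absent from any doublet.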