Hi!
I'm using the emptyDrops() function from DropletUtils in a possibly atypical manner, in that I'm running Alevin with relaxed parameters and post-filtering with emptyDrops() (Alevin's own empty drop detection not being all that robust).
Unfortunately the emptyDrops() function (DropletUtils v1.4.2 tested) does not do well with non integer counts as produced by Alevin and friends:
empty <- emptyDrops(counts(single_cell_experiment))
There were 36 warnings (use warnings() to see them)
You get warnings like:
1: In optimize(LOGLIK, interval = interval, maximum = TRUE) :
NA/Inf replaced by maximum positive value
2: In optimize(LOGLIK, interval = interval, maximum = TRUE) :
NA/Inf replaced by maximum positive value
3: In optimize(LOGLIK, interval = interval, maximum = TRUE) :
NA/Inf replaced by maximum positive value
... which come from the .estimate_alpha() function due to the presence of NA values, which come from running edgeR::goodTuringProportions() on non-integer values. The errors mean that you don't get a good alpha, and you don't get sensible p values/ FDRs:
> head(empty)
DataFrame with 6 rows and 5 columns
Total LogProb PValue Limited FDR
<integer> <numeric> <numeric> <logical> <numeric>
1 204 NA 1 FALSE 1
2 281 NA 1 FALSE 1
3 390 NA 1 FALSE 1
4 547 NA 1 FALSE 1
5 283 NA 1 FALSE 1
6 3098 NA 1 FALSE 0
Rounding the values seems to address the issue:
testmat <- counts(single_cell_experiment)
testmat@x <- round(testmat@x)
empty <- emptyDrops(testmat)
... and the outputs then look more sensible:
> head(empty)
DataFrame with 6 rows and 5 columns
Total LogProb PValue Limited FDR
<integer> <numeric> <numeric> <logical> <numeric>
1 195 -929.398506649979 9.99900009999e-05 TRUE 0.000112840447556976
2 271 -1290.7519266567 9.99900009999e-05 TRUE 0.000112840447556976
3 377 -1700.71801792383 9.99900009999e-05 TRUE 0.000112840447556976
4 526 -2338.8659368385 9.99900009999e-05 TRUE 0.000112840447556976
5 266 -1159.62654569938 9.99900009999e-05 TRUE 0.000112840447556976
6 3007 -15306.8305754847 9.99900009999e-05 TRUE 0.000112840447556976
Could the integer check maybe be made an integral part of the testEmptyDrops() function? Or is there a more sensible workaround?
Thanks,
Jon
Thanks for that detailed response Aaron, and for the work you've done on this more generally.
I think that integer coercion inside emptyDrops() or testEmptyDrops() would be a good immediate solution. Right now, the 'clever' bits of the empty droplet detection fail without triggering an error (just those warnings), leaving you just with the non-empty results from knee detection or providing a 'retain'. I don't think it would be obvious to many users that 1) this has happened or 2) that it's because they supplied non-integer counts.
The Bioc-devel version (1.5.6) has been updated to round values at the very start of
emptyDrops()
. In the end, I decided that rounding everything at the start was the safest thing to do, as this would guarantee it would play nice (in practice and theory) with the remaining operations. I'm open to being convinced otherwise but it had better be good.Brilliant- thank you