I was playing around with the test.ambience
argument of DropletUtils::emptyDrops()
and was initially surprised to find that I was getting a different number of non-empty droplets returned by the function, as illustrated below.
suppressPackageStartupMessages(library(DropletUtils))
set.seed(0)
my.counts <- DropletUtils:::simCounts()
set.seed(666)
out_FALSE <- emptyDrops(my.counts)
length(which(out_FALSE$FDR <= 0.001))
#> [1] 942
set.seed(666)
out_TRUE <- emptyDrops(my.counts, test.ambient = TRUE)
length(which(out_TRUE$FDR <= 0.001))
#> [1] 842
After more carefully reading the documentation, I understand that's because a different number of non-NA P-values are being returned
sum(!is.na(out_FALSE$PValue))
#> [1] 1300
sum(!is.na(out_TRUE$PValue))
#> [1] 10727
and therefore the FDR values are different, so at a given FDR cutoff the results may be different.
But does this mean that the test.ambient = TRUE
results should only be used for diagnostic plots of the P-value histogram and not be used for the actual analysis?
Thanks!
Aside
I think this line in the documentation is should be test.ambient=TRUE
:
If
test.ambient=FALSE
, non-NA statistics will be reported for all barcodes.
It might also need clarification, as non-NA statistics will only be reported for all barcodes with non-zero counts:
out_FALSE[is.na(out_FALSE$FDR), ]
#> DataFrame with 9800 rows and 5 columns
#> Total LogProb PValue Limited FDR
#> <integer> <numeric> <numeric> <logical> <numeric>
#> 1 2 NA NA NA NA
#> 2 9 NA NA NA NA
#> 3 20 NA NA NA NA
#> 4 20 NA NA NA NA
#> 5 1 NA NA NA NA
#> ... ... ... ... ... ...
#> 9796 10 NA NA NA NA
#> 9797 6 NA NA NA NA
#> 9798 10 NA NA NA NA
#> 9799 15 NA NA NA NA
#> 9800 4 NA NA NA NA
out_FALSE[is.na(out_TRUE$FDR), ]
#> DataFrame with 373 rows and 5 columns
#> Total LogProb PValue Limited FDR
#> <integer> <numeric> <numeric> <logical> <numeric>
#> 1 0 NA NA NA NA
#> 2 0 NA NA NA NA
#> 3 0 NA NA NA NA
#> 4 0 NA NA NA NA
#> 5 0 NA NA NA NA
#> ... ... ... ... ... ...
#> 369 0 NA NA NA NA
#> 370 0 NA NA NA NA
#> 371 0 NA NA NA NA
#> 372 0 NA NA NA NA
#> 373 0 NA NA NA NA
Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.0 (2020-04-24)
#> os Ubuntu 18.04.5 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2021-01-23
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> beachmat 2.6.4 2020-12-20 [1] Bioconductor
#> Biobase * 2.50.0 2020-10-27 [1] Bioconductor
#> BiocGenerics * 0.36.0 2020-10-27 [1] Bioconductor
#> BiocParallel 1.24.1 2020-11-06 [1] Bioconductor
#> bitops 1.0-6 2013-08-17 [1] CRAN (R 4.0.0)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.0)
#> cli 2.2.0 2020-11-20 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> DelayedArray 0.16.0 2020-10-27 [1] Bioconductor
#> DelayedMatrixStats 1.12.2 2021-01-12 [1] Bioconductor
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.0)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.0)
#> dqrng 0.2.1 2019-05-17 [1] CRAN (R 4.0.0)
#> DropletUtils * 1.10.2 2020-12-20 [1] Bioconductor
#> edgeR 3.32.1 2021-01-14 [1] Bioconductor
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.0)
#> GenomeInfoDb * 1.26.2 2020-12-08 [1] Bioconductor
#> GenomeInfoDbData 1.2.4 2020-10-28 [1] Bioconductor
#> GenomicRanges * 1.42.0 2020-10-27 [1] Bioconductor
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.0)
#> HDF5Array 1.18.0 2020-10-27 [1] Bioconductor
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.5.1 2021-01-12 [1] CRAN (R 4.0.0)
#> IRanges * 2.24.1 2020-12-12 [1] Bioconductor
#> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.0)
#> lattice 0.20-41 2020-04-02 [4] CRAN (R 4.0.0)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
#> limma 3.46.0 2020-10-27 [1] Bioconductor
#> locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.0.0)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.0)
#> Matrix 1.3-2 2021-01-06 [4] CRAN (R 4.0.3)
#> MatrixGenerics * 1.2.0 2020-10-27 [1] Bioconductor
#> matrixStats * 0.57.0 2020-09-25 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.0)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.5 2020-11-30 [1] CRAN (R 4.0.0)
#> ps 1.5.0 2020-12-05 [1] CRAN (R 4.0.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.0)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.0)
#> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.0)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.0)
#> RCurl 1.98-1.2 2020-04-18 [1] CRAN (R 4.0.0)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.0)
#> rhdf5 2.34.0 2020-10-27 [1] Bioconductor
#> rhdf5filters 1.2.0 2020-10-27 [1] Bioconductor
#> Rhdf5lib 1.12.0 2020-10-27 [1] Bioconductor
#> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.0)
#> rmarkdown 2.6 2020-12-14 [1] CRAN (R 4.0.0)
#> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.0)
#> S4Vectors * 0.28.1 2020-12-09 [1] Bioconductor
#> scuttle 1.0.4 2020-12-17 [1] Bioconductor
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> SingleCellExperiment * 1.12.0 2020-10-27 [1] Bioconductor
#> sparseMatrixStats 1.2.0 2020-10-27 [1] Bioconductor
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> SummarizedExperiment * 1.20.0 2020-10-27 [1] Bioconductor
#> testthat 3.0.1 2020-12-17 [1] CRAN (R 4.0.0)
#> usethis 2.0.0 2020-12-10 [1] CRAN (R 4.0.0)
#> withr 2.4.0 2021-01-16 [1] CRAN (R 4.0.0)
#> xfun 0.20 2021-01-06 [1] CRAN (R 4.0.0)
#> XVector 0.30.0 2020-10-27 [1] Bioconductor
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#> zlibbioc 1.36.0 2020-10-27 [1] Bioconductor
#>
#> [1] /home/peter/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library