Hello! I wanted to ask how the edgeR::cpm()
function results in negative values from raw counts of 0.
I've looked at these posts from several years ago (Differences between limma voom E values and edgeR cpm values? and How does edgeR cpm function calculate log(CPM) values?). I also saw that the functions have been updated less than a year ago, so I wanted to make sure the information was up to date. I looked through the function calls, but the edgeR::cpm() function uses C code, so I couldn't exactly trace how a logcounts value of ~ -0.9 can be computed with a pseudo-bulk count of 0.
How exactly is the prior count scaled by library size?
Also tagging Leonardo Collado Torres
Thanks, Kinnary Shah
library(spatialLIBD)
spe <- spatialLIBD::fetch_data(type = "spe")
spe_pseudo <- registration_pseudobulk(spe, var_registration = "spatialLIBD", var_sample_id = "sample_id" )
# this function uses the below code to create the logcounts assay variable
# logcounts(sce_pseudo) <-
# edgeR::cpm(edgeR::calcNormFactors(sce_pseudo),
# log = TRUE,
# prior.count = 1
# )
edgeR::calcNormFactors(spe_pseudo)$samples["151671_WM",]
# group lib.size norm.factors
# 151671_WM 1 443047 1.013545
counts(spe_pseudo)["ENSG00000186827", "151671_WM"]
# 0
logcounts(spe_pseudo)["ENSG00000186827", "151671_WM"]
# -0.9157698
2^(-0.9157698) ## Undoing the log2()
# 0.530061
2^(-0.9157698) - 1 ## Undoing adding the +1 prior.count
# -0.469939 ## Different from the 0 pseudo-bulked count
genes_with_neg <- rownames(logcounts(spe_pseudo))[rowSums(logcounts(spe_pseudo) < 0) > 0]
head(genes_with_neg) # 7673 genes
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.6.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] spatialLIBD_1.19.6 SpatialExperiment_1.15.1 SingleCellExperiment_1.27.2
[4] SummarizedExperiment_1.35.3 Biobase_2.65.1 GenomicRanges_1.57.1
[7] GenomeInfoDb_1.41.2 IRanges_2.39.2 S4Vectors_0.43.2
[10] BiocGenerics_0.51.3 MatrixGenerics_1.17.0 matrixStats_1.4.1
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.16.0 jsonlite_1.8.9 shape_1.4.6.1
[5] magrittr_2.0.3 ggbeeswarm_0.7.2 magick_2.8.5 GlobalOptions_0.1.2
[9] BiocIO_1.16.0 zlibbioc_1.51.1 vctrs_0.6.5 memoise_2.0.1
[13] config_0.3.2 Rsamtools_2.22.0 paletteer_1.6.0 RCurl_1.98-1.16
[17] benchmarkme_1.0.8 htmltools_0.5.8.1 S4Arrays_1.5.10 AnnotationHub_3.13.3
[21] curl_5.2.3 BiocNeighbors_1.99.2 SparseArray_1.5.44 sass_0.4.9
[25] bslib_0.8.0 htmlwidgets_1.6.4 plotly_4.10.4 cachem_1.1.0
[29] GenomicAlignments_1.42.0 mime_0.12 lifecycle_1.0.4 iterators_1.0.14
[33] pkgconfig_2.0.3 rsvd_1.0.5 Matrix_1.7-0 R6_2.5.1
[37] fastmap_1.2.0 GenomeInfoDbData_1.2.13 shiny_1.9.1 clue_0.3-65
[41] digest_0.6.37 colorspace_2.1-1 rematch2_2.1.2 AnnotationDbi_1.67.0
[45] scater_1.34.0 irlba_2.3.5.1 ExperimentHub_2.13.1 RSQLite_2.3.7
[49] beachmat_2.21.6 filelock_1.0.3 fansi_1.0.6 httr_1.4.7
[53] abind_1.4-8 compiler_4.4.1 withr_3.0.1 bit64_4.5.2
[57] doParallel_1.0.17 attempt_0.3.1 BiocParallel_1.39.0 viridis_0.6.5
[61] DBI_1.2.3 sessioninfo_1.2.2 rappdirs_0.3.3 DelayedArray_0.31.14
[65] rjson_0.2.23 tools_4.4.1 vipor_0.4.7 beeswarm_0.4.0
[69] httpuv_1.6.15 glue_1.8.0 restfulr_0.0.15 promises_1.3.0
[73] grid_4.4.1 cluster_2.1.6 generics_0.1.3 gtable_0.3.5
[77] tidyr_1.3.1 data.table_1.16.0 ScaledMatrix_1.13.0 BiocSingular_1.21.4
[81] utf8_1.2.4 XVector_0.45.0 stringr_1.5.1 ggrepel_0.9.6
[85] BiocVersion_3.20.0 foreach_1.5.2 pillar_1.9.0 limma_3.61.12
[89] later_1.3.2 circlize_0.4.16 benchmarkmeData_1.0.4 dplyr_1.1.4
[93] BiocFileCache_2.13.0 lattice_0.22-6 rtracklayer_1.66.0 bit_4.5.0
[97] tidyselect_1.2.1 ComplexHeatmap_2.21.1 locfit_1.5-9.10 Biostrings_2.73.2
[101] scuttle_1.15.4 gridExtra_2.3 edgeR_4.3.16 statmod_1.5.0
[105] DT_0.33 stringi_1.8.4 UCSC.utils_1.1.0 lazyeval_0.2.2
[109] yaml_2.3.10 shinyWidgets_0.8.7 codetools_0.2-20 tibble_3.2.1
[113] BiocManager_1.30.25 cli_3.6.3 xtable_1.8-4 jquerylib_0.1.4
[117] munsell_0.5.1 golem_0.5.1 Rcpp_1.0.13 dbplyr_2.5.0
[121] png_0.1-8 XML_3.99-0.17 parallel_4.4.1 ggplot2_3.5.1
[125] blob_1.2.4 bitops_1.0-9 viridisLite_0.4.2 scales_1.3.0
[129] purrr_1.0.2 crayon_1.5.3 GetoptLong_1.0.5 rlang_1.1.4
[133] cowplot_1.1.3 KEGGREST_1.45.1
Thanks for posting this Kinnary. See also https://github.com/LieberInstitute/spatialLIBD/issues/106 for more details on the background.