Entering edit mode
I am retrieving the normalized counts of reads in peaks (scores) from DiffBind, as explained in this Bioconductor post :
dba.peakset(dbObj_norm, bRetrieve=TRUE ) %>% as.data.frame() %>% head()
seqnames start end width strand Samp1 Samp2 Samp3 Samp4
1 chr1 905219 905619 401 * 30.96286 26.57474 43.41694 39.82934
2 chr1 959509 959909 401 * 46.86271 36.36544 45.02497 45.13992
3 chr1 966465 966865 401 * 66.94673 70.63287 36.98480 31.86347
4 chr1 1062705 1063105 401 * 51.88371 44.75746 41.80890 37.17405
5 chr1 1231475 1231875 401 * 36.82070 37.06477 28.94462 45.13992
6 chr1 1890661 1891061 401 * 65.27306 65.73752 30.55266 31.86347
For some reason, the results are very different from the values used for plotting by dba.plotHeatmap
z <- dba.plotHeatmap(dbObj_norm, correlations = F, score = DBA_SCORE_NORMALIZED, maxSites = 10000)
> z %>% as.data.frame() %>% arrange(seqnames, start, end) %>% head()
seqnames start end width strand Samp4 Samp3 Samp1 Samp2
1 chr1 905219 905619 401 * 5.315760 5.440186 4.952467 4.731984
2 chr1 959509 959909 401 * 5.496332 5.492653 5.550368 5.184496
3 chr1 966465 966865 401 * 4.993831 5.208860 6.064942 6.142268
4 chr1 1062705 1063105 401 * 5.216224 5.385738 5.697210 5.484056
5 chr1 1231475 1231875 401 * 5.496332 4.855224 5.202445 5.211977
6 chr1 1890661 1891061 401 * 4.993831 4.933226 6.028416 6.03864
-------
seqinfo: 23 sequences from an unspecified genome; no seqlengths
The order of the columns has changed due to clustering of course, but my point is that the values themselves are different by an order of magnitude. What is the cause of the difference? Are not the values of normalized scores (as in the first code block) the correct ones to be used for heatmap plotting?
> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_IL.UTF-8 LC_NUMERIC=C LC_TIME=en_IL.UTF-8 LC_COLLATE=en_IL.UTF-8 LC_MONETARY=en_IL.UTF-8
[6] LC_MESSAGES=en_IL.UTF-8 LC_PAPER=en_IL.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IL.UTF-8 LC_IDENTIFICATION=C
time zone: Asia/Jerusalem
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-3 dplyr_1.1.2 pheatmap_1.0.12 DiffBind_3.10.1 rgl_1.2.1
[6] limma_3.56.2 DESeq2_1.40.1 SummarizedExperiment_1.30.2 Biobase_2.60.0 MatrixGenerics_1.12.2
[11] matrixStats_1.0.0 GenomicRanges_1.52.0 GenomeInfoDb_1.36.0 IRanges_2.34.0 S4Vectors_0.38.1
[16] BiocGenerics_0.46.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 deldir_1.0-9 rlang_1.1.1 magrittr_2.0.3 compiler_4.3.0 png_0.1-8
[7] vctrs_0.6.3 stringr_1.5.0 pkgconfig_2.0.3 crayon_1.5.2 fastmap_1.1.1 XVector_0.40.0
[13] caTools_1.18.2 utf8_1.2.3 Rsamtools_2.16.0 rmarkdown_2.22 xfun_0.39 cachem_1.0.8
[19] zlibbioc_1.46.0 jsonlite_1.8.5 DelayedArray_0.26.3 BiocParallel_1.34.2 jpeg_0.1-10 irlba_2.3.5.1
[25] parallel_4.3.0 R6_2.5.1 bslib_0.5.0 stringi_1.7.12 SQUAREM_2021.1 rtracklayer_1.60.0
[31] jquerylib_0.1.4 numDeriv_2016.8-1.1 Rcpp_1.0.10 knitr_1.43 base64enc_0.1-3 Matrix_1.5-4.1
[37] tidyselect_1.2.0 rstudioapi_0.14 yaml_2.3.7 gplots_3.1.3 codetools_0.2-19 hwriter_1.3.2.1
[43] lattice_0.21-8 tibble_3.2.1 plyr_1.8.8 withr_2.5.0 ShortRead_1.58.0 evaluate_0.21
[49] coda_0.19-4 Biostrings_2.68.1 pillar_1.9.0 KernSmooth_2.23-21 generics_0.1.3 invgamma_1.1
[55] RCurl_1.98-1.12 truncnorm_1.0-9 emdbook_1.3.12 ggplot2_3.4.2 munsell_0.5.0 scales_1.2.1
[61] ashr_2.2-54 gtools_3.9.4 glue_1.6.2 tools_4.3.0 apeglm_1.22.1 interp_1.1-4
[67] BiocIO_1.10.0 BSgenome_1.68.0 locfit_1.5-9.8 GenomicAlignments_1.36.0 systemPipeR_2.6.1 mvtnorm_1.2-2
[73] XML_3.99-0.14 grid_4.3.0 bbmle_1.0.25 amap_0.8-19 bdsmatrix_1.3-6 latticeExtra_0.6-30
[79] colorspace_2.1-0 GenomeInfoDbData_1.2.10 restfulr_0.0.15 cli_3.6.1 GreyListChIP_1.32.0 fansi_1.0.4
[85] mixsqp_0.3-48 S4Arrays_1.0.4 gtable_0.3.3 sass_0.4.6 digest_0.6.31 ggrepel_0.9.3
[91] rjson_0.2.21 htmlwidgets_1.6.2 htmltools_0.5.5 lifecycle_1.0.3 MASS_7.3-60
For heatmaps one usually uses standardized (aka Z-scored, which this almost certainly is) values which represent the deviation from the mean per gene. If you cluster expression values directly, even on log2 scale, then your heatmap will simply cluster by expression value, since expression level differences between genes are usually much larger than differential expression between samples.
That makes sense to use standardized values for heatmaps.
In this specific example, the values that are returned from dba.plotHeatmap are not the z-score values. It can be seen from the fact that they are all positive. It cannot be the case that all values are higher than the mean of these very values.
I am still interested to know what are the values that dba.plotHeatmap returns.