Hello,
I have a question about the `cpm` function from edgeR. When I call it with `log = T`, I get different results than when I call it with `log = F` and apply the log2 transformation afterwards. What am I missing?
Edit: Does this have to do with the scaling of the prior count? If so, what is the benefit? Why is that better than simply adding a count of 0.5?
```
> CPM <- cpm(DGE1, log = T, prior.count = 0.5, normalized.lib.sizes = F)
> tail(CPM)
                        DC07      DC08      DC09      DC10      DC11      DC12
ENSMUSG00000099399 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000095134 -5.935507 -5.935507 -5.935507 -3.647512 -5.935507 -5.935507
ENSMUSG00000095366 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000096768 -4.385629 -4.434476 -5.935507 -4.378766 -5.935507 -5.935507
ENSMUSG00000099871 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000096850 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
> CPM_F <- cpm(DGE1, log = F, normalized.lib.sizes = F)
> tail(CPM_F)
                       DC07       DC08 DC09       DC10 DC11 DC12
ENSMUSG00000099399 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000095134 0.000000 0.00000000    0 0.06345822    0    0
ENSMUSG00000095366 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000096768 0.031501 0.02990833    0 0.03172911    0    0
ENSMUSG00000099871 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000096850 0.000000 0.00000000    0 0.00000000    0    0
> log2CPM <- log2(CPM_F + 0.5)
> tail(log2CPM)
                         DC07       DC08 DC09       DC10 DC11 DC12
ENSMUSG00000099399 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000095134 -1.0000000 -1.0000000   -1 -0.8276194   -1   -1
ENSMUSG00000095366 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000096768 -0.9118557 -0.9161853   -1 -0.9112366   -1   -1
ENSMUSG00000099871 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000096850 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
```

```
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] gplots_3.0.1 edgeR_3.20.9 limma_3.34.9

loaded via a namespace (and not attached):
 [1] compiler_3.4.3     Rcpp_0.12.15       KernSmooth_2.23-15 splines_3.4.3
 [5] gdata_2.18.0       grid_3.4.3         locfit_1.5-9.1     caTools_1.17.1
 [9] bitops_1.0-6       gtools_3.5.0       lattice_0.20-35
```
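A likely source of the difference, sketched under assumptions: based on my reading of the edgeR 3.20 source (not verified against the official documentation), `cpm(log = TRUE)` adds the prior to the raw counts, i.e. on the count scale, scales it by each sample's library size relative to the mean library size, and enlarges each library size by twice the scaled prior. Judging from the `CPM_F` values above, the library sizes here are roughly 30 million reads, so a prior of 0.5 counts is only about 0.016 CPM; that is why zero counts end up near -5.94 instead of at log2(0.5) = -1. The toy example below uses hypothetical counts and library sizes, and `log_cpm_sketch` is a hypothetical helper, not an edgeR function.

```r
# Sketch of the log-CPM calculation as I understand edgeR's cpm(log = TRUE).
# ASSUMPTION: based on reading the edgeR 3.20 source; not the package's own code.
# log_cpm_sketch is a hypothetical helper; counts and library sizes are toy values.
log_cpm_sketch <- function(counts, lib.size, prior.count = 0.5) {
  # Prior count scaled by each library size relative to the mean library size
  prior.scaled <- prior.count * lib.size / mean(lib.size)
  # Library sizes enlarged by twice the scaled prior
  lib.adj <- lib.size + 2 * prior.scaled
  # Prior added to the raw counts (count scale), then divided by library size in millions
  log2(t((t(counts) + prior.scaled) / (lib.adj * 1e-6)))
}

counts <- matrix(c(0, 0, 0,
                   1, 2, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3")))
lib.size <- c(S1 = 2e7, S2 = 4e7, S3 = 3e7)

# Prior of 0.5 added on the count scale (what cpm(log = TRUE) appears to do)
via_counts <- log_cpm_sketch(counts, lib.size, prior.count = 0.5)

# 0.5 added on the CPM scale (what log2(CPM_F + 0.5) does)
via_cpm <- log2(t(t(counts) / (lib.size * 1e-6)) + 0.5)

print(round(via_counts, 4))
print(round(via_cpm, 4))
```

In this sketch the all-zero gene gets the same value in every sample, log2(0.5e6 / (mean(lib.size) + 1)), which mirrors the constant -5.935507 in the output above, while adding 0.5 on the CPM scale gives exactly -1. If edgeR is installed, comparing `log_cpm_sketch()` with `cpm(..., log = TRUE)` on real data would be a quick way to check the assumption; other edgeR versions may compute this differently.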
Thanks for the link (with Gordon's answer) and the explanation. So the important point is that, with scaling, a fold change of 1, for example, stays a fold change of 1.
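The fold-change point can be checked with a toy example. This is a minimal sketch with hypothetical numbers, assuming (from my reading of edgeR's `cpm` help page and source) that the prior is scaled in proportion to library size and that library sizes are increased by twice the scaled prior. The gene's counts are exactly proportional to library size, so its true CPM, and hence its fold change, is the same in every sample.

```r
# Hypothetical gene with counts proportional to library size (same CPM everywhere).
# ASSUMPTION: the "scaled" formula follows my reading of edgeR, not official code.
lib.size <- c(S1 = 1e7, S2 = 2e7, S3 = 4e7)
counts <- c(S1 = 5, S2 = 10, S3 = 20)
prior.count <- 0.5

# Unscaled: the same 0.5 counts added to every sample
unscaled <- log2((counts + prior.count) / (lib.size * 1e-6))

# Scaled: prior proportional to library size, library sizes enlarged by 2 * prior
prior.scaled <- prior.count * lib.size / mean(lib.size)
scaled <- log2((counts + prior.scaled) / ((lib.size + 2 * prior.scaled) * 1e-6))

print(round(unscaled - unscaled[["S1"]], 4))  # non-zero: spurious log-fold-changes
print(round(scaled - scaled[["S1"]], 4))      # zero up to floating-point rounding
```

With the unscaled prior the smaller library is pulled up more, so equal expression shows up as small spurious log-fold-changes; with the scaled prior the numerator and denominator both grow in proportion to library size, so the log-CPM values stay identical across samples.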