edgeR cpm() function with and without log2
b.nota ▴ 370
@bnota-7379
Netherlands

Hello,

I have a question about the cpm function from edgeR. When I call it with log = TRUE, I get different results than when I call it with log = FALSE and apply a log2 transformation afterwards. What am I missing here?

Edit: Does this have to do with the scaling of the prior count? If so, what is the benefit of scaling? Why is it better than just adding 0.5 to each read count?

> CPM <- cpm(DGE1, log = TRUE, prior.count = 0.5, normalized.lib.sizes = FALSE)
> tail(CPM)
                        DC07      DC08      DC09      DC10      DC11      DC12
ENSMUSG00000099399 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000095134 -5.935507 -5.935507 -5.935507 -3.647512 -5.935507 -5.935507
ENSMUSG00000095366 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000096768 -4.385629 -4.434476 -5.935507 -4.378766 -5.935507 -5.935507
ENSMUSG00000099871 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000096850 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507

> CPM_F <- cpm(DGE1, log = FALSE, normalized.lib.sizes = FALSE)
> tail(CPM_F)
                       DC07       DC08 DC09       DC10 DC11 DC12
ENSMUSG00000099399 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000095134 0.000000 0.00000000    0 0.06345822    0    0
ENSMUSG00000095366 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000096768 0.031501 0.02990833    0 0.03172911    0    0
ENSMUSG00000099871 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000096850 0.000000 0.00000000    0 0.00000000    0    0

> log2CPM <- log2(CPM_F + 0.5)
> tail(log2CPM)
                         DC07       DC08 DC09       DC10 DC11 DC12
ENSMUSG00000099399 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000095134 -1.0000000 -1.0000000   -1 -0.8276194   -1   -1
ENSMUSG00000095366 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000096768 -0.9118557 -0.9161853   -1 -0.9112366   -1   -1
ENSMUSG00000099871 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000096850 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gplots_3.0.1 edgeR_3.20.9 limma_3.34.9

loaded via a namespace (and not attached):
 [1] compiler_3.4.3     Rcpp_0.12.15       KernSmooth_2.23-15 splines_3.4.3     
 [5] gdata_2.18.0       grid_3.4.3         locfit_1.5-9.1     caTools_1.17.1    
 [9] bitops_1.0-6       gtools_3.5.0       lattice_0.20-35  
Aaron Lun ★ 28k
@alun
The city by the bay

As you may have already noticed, it is because cpm adds a prior count to the counts for each library when log=TRUE. This avoids undefined values from counts of zero, and it also stabilizes the differences in log-expression values between libraries, i.e., it squeezes the log-fold changes towards zero, especially for low counts where there would be little evidence for large fold changes anyway.
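
To illustrate the shrinkage with made-up numbers, here is a minimal sketch in plain R (no edgeR calls). The helper `log_fc` is hypothetical, and the two libraries are assumed to have equal size so that the library sizes cancel out of the ratio:

```r
# Hypothetical helper: log2 fold change between two counts from libraries
# of equal size, with an optional prior count added to both.
log_fc <- function(a, b, prior = 0) log2((b + prior) / (a + prior))

# A low-count gene and a high-count gene, both with a 4-fold difference.
fc_low_raw   <- log_fc(1, 4)                   # exactly 2
fc_low_prior <- log_fc(1, 4, prior = 0.5)      # log2(3), about 1.58
fc_high      <- log_fc(100, 400, prior = 0.5)  # about 1.99, barely changed

# A zero count is no longer a problem once a prior is added.
fc_zero_raw   <- log_fc(0, 4)                  # Inf
fc_zero_prior <- log_fc(0, 4, prior = 0.5)     # log2(9), finite
```

The low-count fold change is pulled noticeably towards zero while the high-count one is almost untouched, which is the behaviour described above.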

Scaling ensures that the relative effect of the added prior count is the same in each library, regardless of sequencing depth. Simply adding 0.5 to each count would effectively result in a larger value being added to counts in small libraries, once you divide by the library size to compute the CPM. This would result in spurious non-zero log-fold changes; see the thread "Differences between limma voom E values and edgeR cpm values?" for details.
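
A rough sketch of the difference, again in plain R with hypothetical numbers. The "scaled" formula below reflects my understanding of what cpm does when log=TRUE (prior count scaled by library size relative to the mean library size, with twice the scaled prior added to the library size), so treat it as an assumption rather than a copy of the edgeR source:

```r
# Hypothetical data: one gene with the same true abundance in two libraries,
# the second sequenced ten times deeper than the first.
lib_size <- c(small = 1e6, large = 1e7)
counts   <- c(small = 2,   large = 20)   # CPM of 2 in both libraries
prior    <- 0.5

# Unscaled: add the same 0.5 to every count, then convert to log2-CPM.
unscaled <- log2((counts + prior) / lib_size * 1e6)

# Scaled: prior proportional to library size relative to the mean
# (assumed to mirror cpm(log = TRUE); not taken from the edgeR source).
prior_scaled <- prior * lib_size / mean(lib_size)
scaled <- log2((counts + prior_scaled) / (lib_size + 2 * prior_scaled) * 1e6)

lfc_unscaled <- diff(unscaled)  # about -0.29: a spurious log-fold change
lfc_scaled   <- diff(scaled)    # zero, up to floating point error
```

Note also that log2(CPM_F + 0.5) in the question adds 0.5 on the CPM scale rather than to the counts, so it would differ from the log=TRUE output even without any scaling.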

Thanks for the link (with Gordon's answer) and the explanation. So the important point is that, with scaling, a fold change of 1 between libraries stays a fold change of 1.
