Question

edgeR: AveLogCPM after estimateGLMCommonDisp

0

Entering edit mode

David R ▴ 90

@david-rengel-6321

Last seen 5 months ago

European Union

Hi,

I understand that, under edgeR, the dge$AveLogCPM after estimateGLMCommonDisp gives the same result as doing aveLogCPM(data, lib.size = dge$samples$lib.size*dge$samples$norm.factors). Hence my question would be: if the aveLogCPM function implements in this case the effective library size, as opposed to the default colSums(data), why then it is not implemented the dge$common.dispersion as well, instead of the 0.05 default value?

Thanks in advance for your reply,

David Rengel

rnaseq edger • 2.6k views

ADD COMMENT • link updated 8.5 years ago by Gordon Smyth 52k • written 8.5 years ago by David R ▴ 90

0

Entering edit mode

Hi Aaron,

Thanks for your reply, but I am not sure thiat is the case cause if I run dge$AveLogCPM, it gives me exactly the same result as aveLogCPM(data, lib.size = dge$samples$lib.size*dge$samples$norm.factors), but different from AveLogCPM(data, lib.size = dge$samples$lib.size*dge$samples$norm.factors, dispersion = dge$common.dispersion).

Am I missing something here?

Thanks again.

ADD REPLY • link 8.5 years ago David R ▴ 90

0

Entering edit mode

Technicalities aside (you can't run dge$AveLogCPM, it's not a function), the DGEList method for estimateGLMCommonDisp first calculates the average abundances with the default dispersion of 0.05 because it doesn't yet have the common dispersion to use in the calculations. These average abundances are only used for subsetting to improve efficiency, and then they get re-calculated with the common dispersion for output. I'm not sure how you managed to get average abundances from estimateGLMCommonDisp that don't use the common dispersion estimate.

Edit: After some thought, it seems that you're probably using an old version of edgeR, in which the S3 method for estimateGLMCommonDisp hadn't been updated to re-calculate the average log-CPM with the common dispersion. So if you update to the latest version of edgeR, all these discrepancies should be resolved.

ADD REPLY • link 8.5 years ago Aaron Lun ★ 28k

score 1 · Answer 1 · 2016-10-13

What you're missing with aveLogCPM() is that you're not passing the correct data object to it.

If you compute

A <- aveLogCPM(dge)

and 'dge' is a DGEList object, then the computation will use both the effective library sizes and the common dispersion. In other words, the function does exactly what you expect.

If however you compute

A <- aveLogCPM(data)

and 'data' is just a matrix of counts, how would you expect the function to know what the common dispersion is?

Your original question though relates to the behaviour of the edgeR dispersion estimation functions such as estimateGLMCommonDisp(). None of the edgeR dispersion estimation functions update the dge$aveLogCPM vector as the dispersions are updated. This is a deliberate decision. Instead we compute the dge$aveLogCPM vector right at the beginning of the pipelines, before the common dispersion has been estimated, and then keep it fixed. It would be just too confusing for this vector to keep changing.

The dge$aveLogCPM vector is used for computing trends, for MD plots and so on. By using the default dispersion=0.05, edgeR is already finessing this measure more than other packages do. In principle we could recompute the aveLogCPM component of the DGEList object every time new dispersions are estimated. We could do that for trended and tagwise dispersions as well as when a new common dispersion is estimated. But the potential gain that might be had from updating the aveLogCPM component to match each new estimated dispersion is too small to justify the extra complication and possible confusion that would arise from continually changing it.

Gordon Smyth · Answer 2 · 2016-10-13

0

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 2 hours ago

The city by the bay

Actually, if a common dispersion is available in the DGEList object, then that value will be used by aveLogCPM.DGEList - check out the source code. Admittedly, the documentation could have described this better, but it seemed to be a level of technical detail that most users wouldn't care about.

ADD COMMENT • link 8.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi,

I am sorry to come back to this. I have had a look to the source codes and I see what you meant, when I go through the .DEGlist source codes, it looks OK to me. Nevertheless, I still see an issue in how the $AveLogCPM values are implemented into the dge object. I've tried to be more specific with the lines here below. I'd be very grateful if you could provide me with any help.

Best,

David

dge <- DGEList(counts=mydata, group=conds)
dge <- calcNormFactors(dge)
design=model.matrix(~0+conds)
dge <- estimateGLMCommonDisp(dge,design)

head(dge$AveLogCPM)
[1]  4.2177176  4.1126002  1.7762333  6.2465661 -0.8546385  6.1809900
head(aveLogCPM(mydata, lib.size = dge$samples$lib.size*dge$samples$norm.factors))
[1]  4.2177176  4.1126002  1.7762333  6.2465661 -0.8546385  6.1809900

head(aveLogCPM(mydata, lib.size = dge$samples$lib.size*dge$samples$norm.factors, dispersion = dge$common.dispersion))
[1]  4.1910219  4.1143934  1.7938440  6.2459912 -0.8398819  6.1842129
head(aveLogCPM(dge, lib.size = dge$samples$lib.size*dge$samples$norm.factors))
[1]  4.1910219  4.1143934  1.7938440  6.2459912 -0.8398819  6.1842129
head(aveLogCPM(dge, lib.size = dge$samples$lib.size*dge$samples$norm.factors, dispersion = dge$common.dispersion))
[1]  4.1910219  4.1143934  1.7938440  6.2459912 -0.8398819  6.1842129

ADD REPLY • link updated 8.5 years ago by Gordon Smyth 52k • written 8.5 years ago by David R ▴ 90

0

Entering edit mode

Are you sure you're running the latest version of edgeR? For me, if I mock up something like this:

mydata <- matrix(rnbinom(1000, mu=100, size=10), ncol=10)
conds <- rep(LETTERS[1:2], each=5)

... and then run your code:

library(edgeR)
dge <- DGEList(counts=mydata, group=conds)
dge <- calcNormFactors(dge)
design=model.matrix(~0+conds)
dge <- estimateGLMCommonDisp(dge,design)

... and then have a look at the aveLogCPM output:

head(dge$AveLogCPM)
# [1] 13.35735 13.24709 13.16178 13.34324 13.38505 13.44232
head(aveLogCPM(mydata, lib.size = dge$samples$lib.size*dge$samples$norm.factors))
# [1] 13.35707 13.24721 13.16139 13.34278 13.38515 13.44159
head(aveLogCPM(mydata, lib.size = dge$samples$lib.size*dge$samples$norm.factors,
               dispersion = dge$common.dispersion))
# [1] 13.35735 13.24709 13.16178 13.34324 13.38505 13.44232

So you can see that dge$AveLogCPM is, in fact, calculated using the common dispersion. Anyway, session info:

R version 3.3.1 Patched (2016-08-05 r71041)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.14.0  limma_3.28.21

ADD REPLY • link 8.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron

Thanks for your reply. I run update.packages last Thursday, hence it should be fine.. I'll have a look when I get back to my desk and my R session and I'll let you know. Thanks again.

David

ADD REPLY • link 8.5 years ago David R ▴ 90

0

Entering edit mode

biocLite() should be used for version management, see http://bioconductor.org/install/#why-biocLite.

ADD REPLY • link 8.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron,

You were so right. I had a configuration problem with my new computer. Inherited packages from my old laptop, including edgeR, were in one place but updates were being installed somewhere else. Hence, even if I was updating packages, RStudio kept loading the old versions. I've sorted it out and now everything looks fine. I thank you and I apologize of rthe unnecessary hassle.

Best,

David

ADD REPLY • link 8.5 years ago David R ▴ 90