bug in estimateDisp() or mistake in documentation?
1
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 16 hours ago
United States
Hello, I've found either a bug in estimateDisp() or a mistake in the documentation about when it uses the classic qCML vs. the GLM CR dispersion estimates. The help page for estimateDisp() says "If there is no design matrix, it calculates the quantile conditional likelihood for each gene (tag) and then maximize it. The method is same as in the function estimateCommonDisp and estimateTagwiseDisp". However, when you call estimateDisp() without specifying a design matrix but with a DGEList object with more than one level in samples$group, it internally calculates a design matrix and gives you common/trended/tagwise dispersion estimates. Actually, even when samples$group has only one level, estimateDisp still internally creates a unit vector design matrix and outputs common/trended/tagwise dispersion estimates. So is it a bug in the code or a mistake in the documentation and estimateDisp() can't be used to get the same results as estimateCommonDisp() and estimateTagwiseDisp()? (example below) It was my understanding that for a single factor experiment (2 or more groups), the classic qCML was preferred over the GLM dispersion. However, if you have more than 2 groups and you want to calculate a oneway ANOVA or specialized contrasts, it seems you have to use the GLM approach. Is the GLM approach therefore becoming the best method for all experimental designs, even those with only a single factor? Thanks, Jenny #example - data as indicated in edgeRUsersGuide() section 4.3 > library(edgeR) Loading required package: limma > shell.exec(system.file("doc","edgeRUsersGuide.pdf",package="edgeR")) > > targets <- readTargets() > targets Lane Treatment Label Con1 1 Control Con1 Con2 2 Control Con2 Con3 2 Control Con3 Con4 4 Control Con4 DHT1 5 DHT DHT1 DHT2 6 DHT DHT2 DHT3 8 DHT DHT3 > x <- read.delim("pnas_expression.txt", row.names=1, stringsAsFactors=FALSE) > y <- DGEList(counts=x[,1:7], group=targets$Treatment, genes=data.frame(Length=x[,8])) > colnames(y) <- targets$Label > keep <- rowSums(cpm(y)>1) >= 3 > y <- y[keep,] > y$samples$lib.size <- colSums(y$counts) > y <- calcNormFactors(y) > y$samples group lib.size norm.factors Con1 Control 976847 1.0296636 Con2 Control 1154746 1.0372521 Con3 Control 1439393 1.0362662 Con4 Control 1482652 1.0378383 DHT1 DHT 1820628 0.9537095 DHT2 DHT 1831553 0.9525624 DHT3 DHT 680798 0.9583181 > > y2 <- y > > y <- estimateCommonDisp(y, verbose=TRUE) Disp = 0.02002 , BCV = 0.1415 > y <- estimateTagwiseDisp(y) > y2 <- estimateDisp(y2) Loading required package: splines > > names(y) [1] "counts" "samples" "genes" [4] "common.dispersion" "pseudo.counts" "pseudo.lib.size" [7] "AveLogCPM" "prior.n" "tagwise.dispersion" > names(y2) [1] "counts" "samples" "genes" [4] "common.dispersion" "trended.dispersion" "trend.method" [7] "AveLogCPM" "span" "tagwise.dispersion" [10] "prior.df" "prior.n" > > all.equal(y$tagwise.dispersion, y2$tagwise.dispersion) [1] "Mean relative difference: 0.1156227" > > et <- exactTest(y) > summary(decideTestsDGE(et)) [,1] -1 2096 0 12059 1 2339 > > et2 <- exactTest(y2) > summary(decideTestsDGE(et2)) [,1] -1 2078 0 12084 1 2332 > > > sessionInfo() R version 3.1.1 (2014-07-10) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] locfit_1.5-9.1 edgeR_3.6.6 limma_3.20.8 [4] BiocInstaller_1.14.2 loaded via a namespace (and not attached): [1] grid_3.1.1 lattice_0.20-29 tools_3.1.1 Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist High Performance Biological Computing Program and The Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign NOTE NEW OFFICE LOCATION 2112 IGB 1206 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-300-6543 fax: 217-265-5066 e-mail: drnevich@illinois.edu<mailto:drnevich@illinois.edu> [[alternative HTML version deleted]]
• 2.5k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 44 minutes ago
WEHI, Melbourne, Australia

Hi Jenny,

The estimateDisp() function is correct. It does in fact use the qCML dispersion estimates when there is no design matrix, and it does this even though a design matrix is computed internally. However it doesn't give exactly the same estimates as estimateCommonDisp and estimateTagwiseDisp because it is a complete rewrite of the approach with somewhat different internal grid parameter settings. I agree that this last point is potentially confusing and is not adequately explained in the documentation. Our intention actually is to consolidate the functions so that estimateDisp() is always the engine even when estimateTagwiseDisp() is called.

Best wishes
Gordon

ADD COMMENT
0
Entering edit mode

Hi Gordon,

Thanks for the clarification. Yes, the documentation should be reworded - I did understand from it that the GLM dispersion estimates were slightly different in this function because the help page says it is "similar to" the trio of GLM dispersion functions, but the qCML method says it "is the same as". Therefore, I thought it was a deliberate distinction between the two.

Jenny

ADD REPLY
0
Entering edit mode

Sorry to post on such an old question, but I am facing the same situation. I am trying to use the classic method, and I get different results when using estimateDisp   vs   estimateCommonDisp + estimateTagwiseDisp.

Am I correct to assume that estimateDisp is the best option?

Thanks
Mau

 

ADD REPLY
0
Entering edit mode

estimateDisp() estimates the prior.df whereas estimateTagwiseDisp() just puts in a preset value (prior.df=10), so they won't give the same results.

We recommend estimateDisp() is all our documentation, see for example the edgeR workflow:

https://bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html

ADD REPLY

Login before adding your answer.

Traffic: 856 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6