strange results with edgeR::goodTuring
@gordon-smyth
WEHI, Melbourne, Australia
Dear Francois,

Thanks to new C code from Aaron Lun, I have now committed a fixed version of goodTuring() to edgeR on the BioC devel repository.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Tue, 28 Aug 2012, Francois Pepin wrote:

> Thanks for checking it out.
>
> I'll see if I can find the bug or just work around it.
>
> I'm not actually using edgeR in this case other than this function. I
> was only looking for an existing implementation of the Good-Turing
> method and found it in edgeR.
>
> François
>
> On Aug 28, 2012, at 3:32 , Gordon K Smyth wrote:
>
>> Hi Francois,
>>
>> Well, looks like your data example has exposed a bug in my R
>> implementation of the Good-Turing algorithm. I just ran the original C
>> code for the algorithm on your data, and it gives the following output:
>>
>>   0        0
>>   312      0.0001363
>>   14491    0.006316
>>   16401    0.007149
>>   65124    0.02839
>>   129797   0.05657
>>   323321   0.1409
>>   366051   0.1595
>>   368599   0.1607
>>   405261   0.1766
>>   604962   0.2637
>>
>> I'll have to think about what to do about this. I don't really have time
>> to track down the bug. We could bring C code into edgeR instead, but the
>> original C code would need some porting. The R code gives identical
>> results to the C for longer vectors with a more typical pile-up of
>> frequencies.
>>
>> I wonder what you mean when you say you want to estimate what kind of
>> pseudo counts to use. In edgeR terminology, the pseudo counts are
>> computed internally, and the user doesn't get to choose them.
>>
>> Best wishes
>> Gordon
>>
>>> Date: Mon, 27 Aug 2012 12:00:19 -0700
>>> From: "Francois Pepin" <francois.pepin at sequentainc.com>
>>> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
>>> Subject: [BioC] strange results with edgeR::goodTuring
>>>
>>> Hi everyone,
>>>
>>> I'm trying to use the goodTuring function in edgeR to estimate what kind
>>> of pseudocounts to use and I'm getting strange results with small number
>>> of categories:
>>>
>>> x<-c(312,14491,16401,65124,129797,323321,366051,368599,405261,604962)
>>> y<- goodTuring(x)
>>> y
>>> $count
>>> [1]    312  14491  16401  65124 129797 323321 366051 368599 405261 604962
>>>
>>> $proportion
>>> [1] 0 0 0 0 0 0 0 0 0 1
>>>
>>> $P0
>>> [1] 0
>>>
>>> $n0
>>> [1] 0
>>>
>>>
>>> If I'm understanding this properly, y$proportion is telling me that I
>>> should expect all my counts to fall under the last category, which does
>>> not make sense. I would expect something pretty close to x/sum(x)
>>> instead.
>>>
>>> This is a bit of a toy example and I'm mostly interested in cases where
>>> I have more categories but it would be nice if this could work in all
>>> cases.
>>>
>>> sessionInfo()
>>> R version 2.15.1 (2012-06-22)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] edgeR_2.6.9   limma_3.12.1  dataframe_2.5
>>>
>>>
>>> Thanks,
>>>
>>> François
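
For anyone who wants to check the numbers in this thread, here is a minimal base-R sketch. It does not call edgeR. It compares the naive proportions x/sum(x) that the original post expected with the proportions printed by the original C code, copied from the reply above. The helper is_degenerate() is a hypothetical name introduced here for illustration and is not part of edgeR; it only flags an output in which one category takes essentially all of the probability mass, as in the reported y$proportion.

```r
# Counts from the original post.
x <- c(312, 14491, 16401, 65124, 129797, 323321, 366051, 368599, 405261, 604962)

# Proportions printed by the original C code, copied from the reply in this thread.
c_prop <- c(0.0001363, 0.006316, 0.007149, 0.02839, 0.05657,
            0.1409, 0.1595, 0.1607, 0.1766, 0.2637)

# Naive empirical proportions, the baseline the poster expected to be close.
naive <- x / sum(x)

# Side-by-side view of the two sets of proportions.
comparison <- data.frame(count = x, naive = signif(naive, 4), c_code = c_prop)
print(comparison)

# Hypothetical helper (an assumption for illustration, not an edgeR function):
# TRUE when a single category holds essentially all of the probability mass.
is_degenerate <- function(proportion, tol = 1e-8) {
  length(proportion) > 1 && any(proportion > 1 - tol)
}

buggy <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)  # y$proportion as reported in the post
is_degenerate(buggy)   # TRUE
is_degenerate(c_prop)  # FALSE
```

On these ten counts the C-code proportions agree with x/sum(x) to within rounding of the printed digits for all but the smallest count, which is what the original post anticipated. A check like is_degenerate() is only a rough guard for toy inputs of this kind, not a test of the Good-Turing estimate itself.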
Category: edgeR