Drastic change in edgeR p-value calculation?

0

Entering edit mode

Lionel Lee Brooks 3rd ▴ 10

@lionel-lee-brooks-3rd-5660

Last seen 10.2 years ago

Hello Gentlepeople, I have been using the edgeR package to identify differentially abundant tags in an RNA-seq experiment. I was happy when the edgeR package version 2.4.6 called 41 differentially abundant tags. However, now I am unhappy because the same analysis with edgeR package version 3.0.2 does not call any differentially abundant tags. I've compared the output of the two versions and I can see that both versions calculate the same fold changes - however - the p-value calculation is different. I feel that the optimal solution for me is to determine the command arguments that I can use to emulate the previous default behavior. Can anyone help me figure that out? I would be extremely grateful for some guidance. This way, I could keep my bioconductor packages up to date. If you think you can help then please see below, for the code and session Info. Thank you for your time. sincerely, Lee Brooks ---- HERE IS THE CODE PART Here is the procedure that I have been using, where x is the data frame object containing my data: y <- DGEList(counts=x,group=group) y <- calcNormFactors(y) y <- estimateCommonDisp(y) y <- estimateTagwiseDisp(y) et <- exactTest(y) topTags(et) de <- decideTestsDGE(et, p=0.05, adjust="BH") detags <- et[as.logical(de)] detags <- topTags(detags, n = length(detags)) ---- HERE IS AN EXAMPLE OUTPUT FROM EACH VERSION: Tables Tables logFC logCPM PValue FDR fooTag_edgeR_2.4.6 5.12318 12.6577 2.9E-30 8.9E-26 fooTag_edgeR_3.0.2 5.12318 12.6612 0.00717 1 ----- HERE IS THE sessionInfo() PART... THE CONFIGURATION THAT WORKED FOR ME: R version 2.14.1 (2011-12-22) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_2.4.6 limma_3.10.3 THE CONFIGURATION THAT DOES NOT WORK FOR ME: R version 2.15.2 (2012-10-26) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.0.2 limma_3.14.1 [[alternative HTML version deleted]]

edgeR edgeR • 1.5k views

ADD COMMENT • link updated 11.9 years ago by Gordon Smyth 51k • written 11.9 years ago by Lionel Lee Brooks 3rd ▴ 10

0

Entering edit mode

KJ Lim ▴ 420

@kj-lim-5288

Last seen 4.2 years ago

Finland

Dear Lee, Hopefully this archive thread is useful to your case: Differences in results between two different versions of edgeR https://stat.ethz.ch/pipermail/bioconductor/2012-December/049644.html Best regards, KJ Lim On 18 December 2012 20:35, Lionel (Lee) Brooks 3rd < Lionel.Brooks.GR@dartmouth.edu> wrote: > Hello Gentlepeople, > > I have been using the edgeR package to identify differentially abundant > tags in an RNA-seq experiment. > > I was happy when the edgeR package version 2.4.6 called 41 differentially > abundant tags. > > However, now I am unhappy because the same analysis with edgeR package > version 3.0.2 does not call any differentially abundant tags. > > I've compared the output of the two versions and I can see that both > versions > calculate the same fold changes - however - the p-value calculation is > different. > > I feel that the optimal solution for me is to determine the command > arguments > that I can use to emulate the previous default behavior. > > Can anyone help me figure that out? > I would be extremely grateful for some guidance. > This way, I could keep my bioconductor packages up to date. > > If you think you can help then please see below, for the code and > session Info. > > Thank you for your time. > > sincerely, > Lee Brooks > > ---- > HERE IS THE CODE PART > > Here is the procedure that I have been using, > where x is the data frame object containing my data: > y <- DGEList(counts=x,group=group) > y <- calcNormFactors(y) > y <- estimateCommonDisp(y) > y <- estimateTagwiseDisp(y) > et <- exactTest(y) > topTags(et) > de <- decideTestsDGE(et, p=0.05, adjust="BH") > detags <- et[as.logical(de)] > detags <- topTags(detags, n = length(detags)) > > ---- > HERE IS AN EXAMPLE OUTPUT FROM EACH VERSION: > > Tables Tables > > logFC logCPM PValue FDR > fooTag_edgeR_2.4.6 5.12318 12.6577 2.9E-30 > 8.9E-26 > fooTag_edgeR_3.0.2 5.12318 12.6612 0.00717 1 > > > ----- > HERE IS THE sessionInfo() PART... > > THE CONFIGURATION THAT WORKED FOR ME: > > R version 2.14.1 (2011-12-22) > Platform: i686-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_2.4.6 limma_3.10.3 > > THE CONFIGURATION THAT DOES NOT WORK FOR ME: > > R version 2.15.2 (2012-10-26) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.0.2 limma_3.14.1 > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 11.9 years ago KJ Lim ▴ 420

0

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 10 minutes ago

WEHI, Melbourne, Australia

Dear Lee Brooks, The current version will behave almost the same as the older version if you set the prior.df to the old default value: estimateTagwiseDisp(y, prior.df=20) It seems that you have some genes with large fold changes but also large variability between replicates. With a large prior.df, the dispersions for these genes are forced down towards the global trend, and this is apparently enough to allow them to be statistically significant. With a smaller prior.df, the tagwise dispersions remain closer to individual dispersion values, so the genes are penalized more for their large variability, and they become non-significant. Best wishes Gordon > Date: Tue, 18 Dec 2012 13:35:11 -0500 > From: "Lionel (Lee) Brooks 3rd" <lionel.brooks.gr at="" dartmouth.edu=""> > To: bioconductor at r-project.org > Subject: [BioC] Drastic change in edgeR p-value calculation? > > Hello Gentlepeople, > > I have been using the edgeR package to identify differentially abundant > tags in an RNA-seq experiment. > > I was happy when the edgeR package version 2.4.6 called 41 > differentially abundant tags. > > However, now I am unhappy because the same analysis with edgeR package > version 3.0.2 does not call any differentially abundant tags. > > I've compared the output of the two versions and I can see that both > versions calculate the same fold changes - however - the p-value > calculation is different. > > I feel that the optimal solution for me is to determine the command > arguments that I can use to emulate the previous default behavior. > > Can anyone help me figure that out? > I would be extremely grateful for some guidance. > This way, I could keep my bioconductor packages up to date. > > If you think you can help then please see below, for the code and > session Info. > > Thank you for your time. > > sincerely, > Lee Brooks > > ---- > HERE IS THE CODE PART > > Here is the procedure that I have been using, > where x is the data frame object containing my data: > y <- DGEList(counts=x,group=group) > y <- calcNormFactors(y) > y <- estimateCommonDisp(y) > y <- estimateTagwiseDisp(y) > et <- exactTest(y) > topTags(et) > de <- decideTestsDGE(et, p=0.05, adjust="BH") > detags <- et[as.logical(de)] > detags <- topTags(detags, n = length(detags)) > > ---- > HERE IS AN EXAMPLE OUTPUT FROM EACH VERSION: > > Tables Tables > > logFC logCPM PValue FDR > fooTag_edgeR_2.4.6 5.12318 12.6577 2.9E-30 8.9E-26 > fooTag_edgeR_3.0.2 5.12318 12.6612 0.00717 1 > > > ----- > HERE IS THE sessionInfo() PART... > > THE CONFIGURATION THAT WORKED FOR ME: > > R version 2.14.1 (2011-12-22) > Platform: i686-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_2.4.6 limma_3.10.3 > > THE CONFIGURATION THAT DOES NOT WORK FOR ME: > > R version 2.15.2 (2012-10-26) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.0.2 limma_3.14.1 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 11.9 years ago Gordon Smyth 51k

Login before adding your answer.