ANOVA p value calculation
1
0
Entering edit mode
@john-antonydas-gaspar-3144
Last seen 10.2 years ago
Dear Sir/Madam, I wish to do ANOVA p value computation with Bioconductor. I have RMA normalized dataset in tab delimited text. Data<-read.table("RMA- normalized.txt",row.names=1,sep="\t",header=TRUE,dec = ".",as.is =TRUE,na.strings = "NA", colClasses = NA,check.names = FALSE,strip.white = FALSE, blank.lines.skip = TRUE, allowEscapes = FALSE, flush = FALSE,encoding = "unknown") dim(Data) [1] 31528 51 17 * 3 = 51 There are 17 different cell line type samples with 3 biological replicates. I wish to do ANOVA - p value calculation for finding out the differentially expressed genes. Kindly help me out to use the right Bioconductor package. Using multtest package I tried to do F stat but could not. When I try the R console window goes off. Could be due to many number of couloumns?. Please help me out to find a solution. Thanking you in advance, With kind regards, antony
• 2.1k views
ADD COMMENT
0
Entering edit mode
@sean-davis-490
Last seen 3 months ago
United States
On Wed, Mar 11, 2009 at 1:19 PM, Antony <gasparj@uni-koeln.de> wrote: > Dear Sir/Madam, > > I wish to do ANOVA p value computation with Bioconductor. > You might want to look at the limma package, though there are a number of others. > > I have RMA normalized dataset in tab delimited text. > > Data<-read.table("RMA- normalized.txt",row.names=1,sep="\t",header=TRUE,dec > = ".",as.is =TRUE,na.strings = "NA", > colClasses = NA,check.names = FALSE,strip.white = FALSE, > blank.lines.skip = TRUE, > allowEscapes = FALSE, flush = FALSE,encoding = "unknown") > > dim(Data) > [1] 31528 51 > > 17 * 3 = 51 There are 17 different cell line type samples with 3 biological > replicates. > > I wish to do ANOVA - p value calculation for finding out the differentially > expressed genes. > > Kindly help me out to use the right Bioconductor package. Using multtest > package I tried to do F stat but could not. When I try > the R console window goes off. Could be due to many number of couloumns?. > You'll probably need to read the posting guide to get a bit better help. In particular, you would need to provide at least your code and the output, error messages, and sessionInfo(). > > Please help me out to find a solution. > > Thanking you in advance, > > > With kind regards, > antony > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Assumin that columns 1-3 represents the first cell line, columns 4-6 represents second cell line etc, then you can try something like cellline <- factor( paste("type_", rep(1:17, each=3), sep="") ) apply( Data, 2, function(gene) anova(lm(gene ~ cellline))$"Pr(>F)"[1] ) However, in the context of microarray experiements, it is far better to use the functions in limma package. Regards, Adai Sean Davis wrote: > On Wed, Mar 11, 2009 at 1:19 PM, Antony <gasparj at="" uni-koeln.de=""> wrote: > >> Dear Sir/Madam, >> >> I wish to do ANOVA p value computation with Bioconductor. >> > > You might want to look at the limma package, though there are a number of > others. > > >> I have RMA normalized dataset in tab delimited text. >> >> Data<-read.table("RMA- normalized.txt",row.names=1,sep="\t",header=TRUE,dec >> = ".",as.is =TRUE,na.strings = "NA", >> colClasses = NA,check.names = FALSE,strip.white = FALSE, >> blank.lines.skip = TRUE, >> allowEscapes = FALSE, flush = FALSE,encoding = "unknown") >> >> dim(Data) >> [1] 31528 51 >> >> 17 * 3 = 51 There are 17 different cell line type samples with 3 biological >> replicates. >> >> I wish to do ANOVA - p value calculation for finding out the differentially >> expressed genes. >> >> Kindly help me out to use the right Bioconductor package. Using multtest >> package I tried to do F stat but could not. When I try >> the R console window goes off. Could be due to many number of couloumns?. >> > > You'll probably need to read the posting guide to get a bit better help. In > particular, you would need to provide at least your code and the output, > error messages, and sessionInfo(). > > >> Please help me out to find a solution. >> >> Thanking you in advance, >> >> >> With kind regards, >> antony >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Hi, I would like to get a better understanding of exactly how the calculations in GOstats are performed. Does the package use a standard hypergeometric test? If I have the following values: geneIds = 266 universeGeneIds = 1803 and a particular GO term has: size = 124 count = 55 then the associated p-value, odds ratio and expected count are 1.519928e-16, 5.660429, and 18.81197 respectively. However, I would have thought the expected count would be 266 * 124 / 1803 = 18.29. In addition, the p-value obtained from the hypergeometric test using the following website http://keisan.casio.com/has10/SpecExec.cgi is different. Are there any steps performed by the GOstats package that make it different from a standard hypergeo test? What is the reason for these differences? thanks in advance, Sebastien
ADD REPLY
0
Entering edit mode
Hi Sebastien, It is expected that you do a little bit of the homework before posting. Some things to try: 1) please read the posting guide, you need to give us information on your particular version of R and Bioconductor, so that we can attempt to reproduce the results you have. 2) you need to give a reproducible example, so we can test and make sure we are getting the same answer as you do. In this case you did not show us even the call you used, so we have no way of knowing what was computed, and hence cannot do more than speculate - which is a waste of everyone's time. 3) you need to read the documentation, there are manual pages and a vignette. These are sometimes unclear, and/or incomplete, and letting us know what is not clear helps us to improve them. 4) you should check the mailing list archive to see if the topic has already been discussed (this one has come up very many times), and then you can readily obtain your answer. So, some speculation, I suspect that your test is conditional. And if I understand your example, this is easily checked by a direct call to fisher.test, which is different from the parts that you did report, again suggesting that your testing is conditional. > xm [,1] [,2] [1,] 55 211 [2,] 69 1468 > fisher.test(xm) Fisher's Exact Test for Count Data data: xm p-value < 2.2e-16 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 3.701153 8.260393 sample estimates: odds ratio 5.537531 best wishes Robert Sebastien Gerega wrote: > Hi, > I would like to get a better understanding of exactly how the > calculations in GOstats are performed. > Does the package use a standard hypergeometric test? > If I have the following values: > geneIds = 266 > universeGeneIds = 1803 > > and a particular GO term has: > size = 124 > count = 55 > > then the associated p-value, odds ratio and expected count are > 1.519928e-16, 5.660429, and 18.81197 respectively. > > However, I would have thought the expected count would be 266 * 124 / > 1803 = 18.29. In addition, the p-value obtained from the hypergeometric > test using the following website http://keisan.casio.com/has10/SpecExec.cgi > is different. > > Are there any steps performed by the GOstats package that make it > different from a standard hypergeo test? What is the reason for these > differences? > thanks in advance, > Sebastien > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Robert Gentleman, PhD Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-7700 rgentlem at fhcrc.org
ADD REPLY
0
Entering edit mode
Hi Robert, sorry about not having included the sessionInfo(). I have included all the required information this time - my sessionInfo() and links to the input data can be found at the end of the email. Here is the code that I am using: entrezUni = read.table("entrezUni.tsv", sep="\t")[,1] sigEntrez = read.table("sigEntrez.tsv", sep="\t")[,1] params = new("GOHyperGParams", geneIds=as.integer(sigEntrez), universeGeneIds = as.integer(entrezUni), annotation="mouse4302.db", pvalueCutoff=0.05, testDirection="over", ontology = "BP", conditional=FALSE) hgOver = hyperGTest(params) report = summary(hgOver) length(entrezUni) length(sigEntrez) report[1:3,] and the output: > length(entrezUni) [1] 1803 > length(sigEntrez) [1] 266 > report[1:3,] GOBPID Pvalue OddsRatio ExpCount Count Size Term 1 GO:0006952 1.519928e-16 5.660429 18.81197 55 124 defense response 2 GO:0050896 9.662799e-16 3.570590 50.36752 99 332 response to stimulus 3 GO:0002376 9.810622e-16 4.164349 31.70726 74 209 immune system process So I am not using a conditional test and I have searched through the mailing list and documentation for information about how ExpCount is calculated but have not found an answer. I would have expected it to be a simple case of Size/universeGeneIds*geneIds However, this is not the case: > man = report[1:3,]$Size/length(entrezUni)*length(sigEntrez) > report[1:3,]$ExpCount - man [1] 0.5180113 1.3869335 0.8730997 I understand that I may be omitting some details or steps in the process but I have searched and been unable to find any information regarding this. Your help is very much appreciated. regards, Sebastien http://sydneybioinformatics.org/download/sigEntrez.tsv http://sydneybioinformatics.org/download/entrezUni.tsv > sessionInfo() R version 2.8.1 (2008-12-22) i386-pc-mingw32 locale: LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_M ONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia. 1252 attached base packages: [1] splines tools stats graphics grDevices utils datasets methods base other attached packages: [1] GOstats_2.8.0 Category_2.8.2 genefilter_1.22.0 survival_2.34-1 RBGL_1.18.0 annotate_1.20.1 xtable_1.5-4 [8] GO.db_2.2.5 graph_1.20.0 mouse4302.db_2.2.5 RSQLite_0.7-1 DBI_0.2-4 AnnotationDbi_1.4.2 Biobase_2.2.1 loaded via a namespace (and not attached): [1] cluster_1.11.12 GSEABase_1.4.0 XML_1.99-0 Robert Gentleman wrote: > Hi Sebastien, > It is expected that you do a little bit of the homework before posting. > Some things to try: > > 1) please read the posting guide, you need to give us information on your > particular version of R and Bioconductor, so that we can attempt to reproduce > the results you have. > 2) you need to give a reproducible example, so we can test and make sure we are > getting the same answer as you do. In this case you did not show us even the > call you used, so we have no way of knowing what was computed, and hence cannot > do more than speculate - which is a waste of everyone's time. > 3) you need to read the documentation, there are manual pages and a vignette. > These are sometimes unclear, and/or incomplete, and letting us know what is not > clear helps us to improve them. > 4) you should check the mailing list archive to see if the topic has already > been discussed (this one has come up very many times), and then you can readily > obtain your answer. > > So, some speculation, I suspect that your test is conditional. > And if I understand your example, this is easily checked by a direct call to > fisher.test, which is different from the parts that you did report, again > suggesting that your testing is conditional. > > >> xm >> > [,1] [,2] > [1,] 55 211 > [2,] 69 1468 > >> fisher.test(xm) >> > > Fisher's Exact Test for Count Data > > data: xm > p-value < 2.2e-16 > alternative hypothesis: true odds ratio is not equal to 1 > 95 percent confidence interval: > 3.701153 8.260393 > sample estimates: > odds ratio > 5.537531 > > best wishes > Robert > > Sebastien Gerega wrote: > >> Hi, >> I would like to get a better understanding of exactly how the >> calculations in GOstats are performed. >> Does the package use a standard hypergeometric test? >> If I have the following values: >> geneIds = 266 >> universeGeneIds = 1803 >> >> and a particular GO term has: >> size = 124 >> count = 55 >> >> then the associated p-value, odds ratio and expected count are >> 1.519928e-16, 5.660429, and 18.81197 respectively. >> >> However, I would have thought the expected count would be 266 * 124 / >> 1803 = 18.29. In addition, the p-value obtained from the hypergeometric >> test using the following website http://keisan.casio.com/has10/SpecExec.cgi >> is different. >> >> Are there any steps performed by the GOstats package that make it >> different from a standard hypergeo test? What is the reason for these >> differences? >> thanks in advance, >> Sebastien >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > >
ADD REPLY
0
Entering edit mode
Hi Sebastien, The source of your confusion on this is the fact that not every gene is tagged with a GO term. Thus, the size of both your gene universe and your gene samples are not what you think they are. They are actually both smaller because only those genes which have GO annotation in some form can reasonably be included. After GOstats gets finished paring down the gene universe it becomes 1404 long instead of 1803, and then it has to only include things in your sample that are actually contained within that "pared down" universe which results in your sample being only 213 long instead of 266. That is why the expected count is actually 213*124/1404 = 18.81197 instead of 18.29. Actually, you could have even seen that this paring down had occurred by looking at the hgOver object that your code produces: > hgOver Gene to GO BP test for over-representation 1244 GO BP ids tested (131 have p < 0.05) Selected gene set size: 213 Gene universe size: 1404 Annotation package: mouse4302 Hope this clarifies things! Marc Sebastien Gerega wrote: > Hi Robert, > sorry about not having included the sessionInfo(). I have included all > the required information this time - my sessionInfo() and links to the > input data can be found at the end of the email. > > > Here is the code that I am using: > entrezUni = read.table("entrezUni.tsv", sep="\t")[,1] > sigEntrez = read.table("sigEntrez.tsv", sep="\t")[,1] > params = new("GOHyperGParams", geneIds=as.integer(sigEntrez), > universeGeneIds = as.integer(entrezUni), > annotation="mouse4302.db", pvalueCutoff=0.05, testDirection="over", > ontology = "BP", conditional=FALSE) > hgOver = hyperGTest(params) > report = summary(hgOver) > length(entrezUni) > length(sigEntrez) > report[1:3,] > > > and the output: > > length(entrezUni) > [1] 1803 > > length(sigEntrez) > [1] 266 > > report[1:3,] > GOBPID Pvalue OddsRatio ExpCount Count > Size Term > 1 GO:0006952 1.519928e-16 5.660429 18.81197 55 124 defense > response > 2 GO:0050896 9.662799e-16 3.570590 50.36752 99 332 response to > stimulus > 3 GO:0002376 9.810622e-16 4.164349 31.70726 74 209 immune system > process > > > So I am not using a conditional test and I have searched through the > mailing list and documentation for information about how ExpCount is > calculated but have not found an answer. > I would have expected it to be a simple case of > > Size/universeGeneIds*geneIds > > However, this is not the case: > > man = report[1:3,]$Size/length(entrezUni)*length(sigEntrez) > > report[1:3,]$ExpCount - man > [1] 0.5180113 1.3869335 0.8730997 > > > I understand that I may be omitting some details or steps in the > process but I have searched and been unable to find any information > regarding this. > Your help is very much appreciated. > regards, > Sebastien > > http://sydneybioinformatics.org/download/sigEntrez.tsv > http://sydneybioinformatics.org/download/entrezUni.tsv > > > sessionInfo() > R version 2.8.1 (2008-12-22) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC _MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australi a.1252 > > > attached base packages: > [1] splines tools stats graphics grDevices utils > datasets methods base > other attached packages: > [1] GOstats_2.8.0 Category_2.8.2 genefilter_1.22.0 > survival_2.34-1 RBGL_1.18.0 annotate_1.20.1 > xtable_1.5-4 [8] GO.db_2.2.5 graph_1.20.0 > mouse4302.db_2.2.5 RSQLite_0.7-1 DBI_0.2-4 > AnnotationDbi_1.4.2 Biobase_2.2.1 > loaded via a namespace (and not attached): > [1] cluster_1.11.12 GSEABase_1.4.0 XML_1.99-0 > > > Robert Gentleman wrote: >> Hi Sebastien, >> It is expected that you do a little bit of the homework before >> posting. >> Some things to try: >> >> 1) please read the posting guide, you need to give us information on >> your >> particular version of R and Bioconductor, so that we can attempt to >> reproduce >> the results you have. >> 2) you need to give a reproducible example, so we can test and make >> sure we are >> getting the same answer as you do. In this case you did not show us >> even the >> call you used, so we have no way of knowing what was computed, and >> hence cannot >> do more than speculate - which is a waste of everyone's time. >> 3) you need to read the documentation, there are manual pages and a >> vignette. >> These are sometimes unclear, and/or incomplete, and letting us know >> what is not >> clear helps us to improve them. >> 4) you should check the mailing list archive to see if the topic has >> already >> been discussed (this one has come up very many times), and then you >> can readily >> obtain your answer. >> >> So, some speculation, I suspect that your test is conditional. >> And if I understand your example, this is easily checked by a direct >> call to >> fisher.test, which is different from the parts that you did report, >> again >> suggesting that your testing is conditional. >> >> >>> xm >>> >> [,1] [,2] >> [1,] 55 211 >> [2,] 69 1468 >> >>> fisher.test(xm) >>> >> >> Fisher's Exact Test for Count Data >> >> data: xm >> p-value < 2.2e-16 >> alternative hypothesis: true odds ratio is not equal to 1 >> 95 percent confidence interval: >> 3.701153 8.260393 >> sample estimates: >> odds ratio >> 5.537531 >> >> best wishes >> Robert >> >> Sebastien Gerega wrote: >> >>> Hi, >>> I would like to get a better understanding of exactly how the >>> calculations in GOstats are performed. >>> Does the package use a standard hypergeometric test? >>> If I have the following values: >>> geneIds = 266 >>> universeGeneIds = 1803 >>> >>> and a particular GO term has: >>> size = 124 >>> count = 55 >>> >>> then the associated p-value, odds ratio and expected count are >>> 1.519928e-16, 5.660429, and 18.81197 respectively. >>> >>> However, I would have thought the expected count would be 266 * 124 / >>> 1803 = 18.29. In addition, the p-value obtained from the hypergeometric >>> test using the following website >>> http://keisan.casio.com/has10/SpecExec.cgi >>> is different. >>> >>> Are there any steps performed by the GOstats package that make it >>> different from a standard hypergeo test? What is the reason for these >>> differences? >>> thanks in advance, >>> Sebastien >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
Hi Marc, thanks for your help. I think my confusion came from the fact that I had already performed non-specific filtering to only include genes with GO annotations. So I assumed that all the genes were being included in the test but I now realise the non-specific filtering only ensures that the genes have annotation of at least 1 of the GO trees.... thanks again, Sebastien Marc Carlson wrote: > Hi Sebastien, > > The source of your confusion on this is the fact that not every gene is > tagged with a GO term. Thus, the size of both your gene universe and > your gene samples are not what you think they are. They are actually > both smaller because only those genes which have GO annotation in some > form can reasonably be included. After GOstats gets finished paring > down the gene universe it becomes 1404 long instead of 1803, and then it > has to only include things in your sample that are actually contained > within that "pared down" universe which results in your sample being > only 213 long instead of 266. That is why the expected count is actually > > 213*124/1404 = 18.81197 instead of 18.29. > > Actually, you could have even seen that this paring down had occurred by > looking at the hgOver object that your code produces: > > >> hgOver >> > Gene to GO BP test for over-representation > 1244 GO BP ids tested (131 have p < 0.05) > Selected gene set size: 213 > Gene universe size: 1404 > Annotation package: mouse4302 > > Hope this clarifies things! > > > Marc > > > > > Sebastien Gerega wrote: > >> Hi Robert, >> sorry about not having included the sessionInfo(). I have included all >> the required information this time - my sessionInfo() and links to the >> input data can be found at the end of the email. >> >> >> Here is the code that I am using: >> entrezUni = read.table("entrezUni.tsv", sep="\t")[,1] >> sigEntrez = read.table("sigEntrez.tsv", sep="\t")[,1] >> params = new("GOHyperGParams", geneIds=as.integer(sigEntrez), >> universeGeneIds = as.integer(entrezUni), >> annotation="mouse4302.db", pvalueCutoff=0.05, testDirection="over", >> ontology = "BP", conditional=FALSE) >> hgOver = hyperGTest(params) >> report = summary(hgOver) >> length(entrezUni) >> length(sigEntrez) >> report[1:3,] >> >> >> and the output: >> >>> length(entrezUni) >>> >> [1] 1803 >> >>> length(sigEntrez) >>> >> [1] 266 >> >>> report[1:3,] >>> >> GOBPID Pvalue OddsRatio ExpCount Count >> Size Term >> 1 GO:0006952 1.519928e-16 5.660429 18.81197 55 124 defense >> response >> 2 GO:0050896 9.662799e-16 3.570590 50.36752 99 332 response to >> stimulus >> 3 GO:0002376 9.810622e-16 4.164349 31.70726 74 209 immune system >> process >> >> >> So I am not using a conditional test and I have searched through the >> mailing list and documentation for information about how ExpCount is >> calculated but have not found an answer. >> I would have expected it to be a simple case of >> >> Size/universeGeneIds*geneIds >> >> However, this is not the case: >> >>> man = report[1:3,]$Size/length(entrezUni)*length(sigEntrez) >>> report[1:3,]$ExpCount - man >>> >> [1] 0.5180113 1.3869335 0.8730997 >> >> >> I understand that I may be omitting some details or steps in the >> process but I have searched and been unable to find any information >> regarding this. >> Your help is very much appreciated. >> regards, >> Sebastien >> >> http://sydneybioinformatics.org/download/sigEntrez.tsv >> http://sydneybioinformatics.org/download/entrezUni.tsv >> >> >>> sessionInfo() >>> >> R version 2.8.1 (2008-12-22) >> i386-pc-mingw32 >> >> locale: >> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;L C_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Austral ia.1252 >> >> >> attached base packages: >> [1] splines tools stats graphics grDevices utils >> datasets methods base >> other attached packages: >> [1] GOstats_2.8.0 Category_2.8.2 genefilter_1.22.0 >> survival_2.34-1 RBGL_1.18.0 annotate_1.20.1 >> xtable_1.5-4 [8] GO.db_2.2.5 graph_1.20.0 >> mouse4302.db_2.2.5 RSQLite_0.7-1 DBI_0.2-4 >> AnnotationDbi_1.4.2 Biobase_2.2.1 >> loaded via a namespace (and not attached): >> [1] cluster_1.11.12 GSEABase_1.4.0 XML_1.99-0 >> >> >> Robert Gentleman wrote: >> >>> Hi Sebastien, >>> It is expected that you do a little bit of the homework before >>> posting. >>> Some things to try: >>> >>> 1) please read the posting guide, you need to give us information on >>> your >>> particular version of R and Bioconductor, so that we can attempt to >>> reproduce >>> the results you have. >>> 2) you need to give a reproducible example, so we can test and make >>> sure we are >>> getting the same answer as you do. In this case you did not show us >>> even the >>> call you used, so we have no way of knowing what was computed, and >>> hence cannot >>> do more than speculate - which is a waste of everyone's time. >>> 3) you need to read the documentation, there are manual pages and a >>> vignette. >>> These are sometimes unclear, and/or incomplete, and letting us know >>> what is not >>> clear helps us to improve them. >>> 4) you should check the mailing list archive to see if the topic has >>> already >>> been discussed (this one has come up very many times), and then you >>> can readily >>> obtain your answer. >>> >>> So, some speculation, I suspect that your test is conditional. >>> And if I understand your example, this is easily checked by a direct >>> call to >>> fisher.test, which is different from the parts that you did report, >>> again >>> suggesting that your testing is conditional. >>> >>> >>> >>>> xm >>>> >>>> >>> [,1] [,2] >>> [1,] 55 211 >>> [2,] 69 1468 >>> >>> >>>> fisher.test(xm) >>>> >>>> >>> Fisher's Exact Test for Count Data >>> >>> data: xm >>> p-value < 2.2e-16 >>> alternative hypothesis: true odds ratio is not equal to 1 >>> 95 percent confidence interval: >>> 3.701153 8.260393 >>> sample estimates: >>> odds ratio >>> 5.537531 >>> >>> best wishes >>> Robert >>> >>> Sebastien Gerega wrote: >>> >>> >>>> Hi, >>>> I would like to get a better understanding of exactly how the >>>> calculations in GOstats are performed. >>>> Does the package use a standard hypergeometric test? >>>> If I have the following values: >>>> geneIds = 266 >>>> universeGeneIds = 1803 >>>> >>>> and a particular GO term has: >>>> size = 124 >>>> count = 55 >>>> >>>> then the associated p-value, odds ratio and expected count are >>>> 1.519928e-16, 5.660429, and 18.81197 respectively. >>>> >>>> However, I would have thought the expected count would be 266 * 124 / >>>> 1803 = 18.29. In addition, the p-value obtained from the hypergeometric >>>> test using the following website >>>> http://keisan.casio.com/has10/SpecExec.cgi >>>> is different. >>>> >>>> Are there any steps performed by the GOstats package that make it >>>> different from a standard hypergeo test? What is the reason for these >>>> differences? >>>> thanks in advance, >>>> Sebastien >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>>> >>> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > > >
ADD REPLY

Login before adding your answer.

Traffic: 688 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6