GOstats hyperGTest question
@ivanborozanutorontoca-704
Last seen 10.2 years ago
Hi all,

I got the following results using hyperGTest(params) with a given list of genes:

    > summary(hgOver)
          GOBPID       Pvalue   OddsRatio   ExpCount Count Size
    1 GO:0030185 0.000000e+00  -73.314685 0.02692165     2    1
    2 GO:0006067 0.000000e+00 -110.746479 0.05384330     3    2
    3 GO:0006069 0.000000e+00 -110.746479 0.05384330     3    2
    etc. ...

If, for example, I look at the genes that are associated with the first GO term (i.e. GO:0030185), I get:

    > probeSetSummary(hgOver)[[1]]
      EntrezID ProbeSetID selected
    1     3043     144221        0
    2     3043     148425        0
    3     3043    3108408        0
    4     3043    5708746        0

My question is: how are the Counts (in this case Count = 2) in the above summary(hgOver) table obtained? Looking at probeSetSummary(hgOver)[[1]] I can see one EntrezID (EntrezID = 3043) and 4 ProbeSetIDs associated with this particular node (i.e. GO:0030185).

    R version 2.4.0 (2006-10-03)
    i686-redhat-linux-gnu

    locale:
    LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] "splines"   "tools"     "methods"   "stats"     "graphics"  "grDevices"
    [7] "utils"     "datasets"  "base"

    other attached packages:
                 xtable Hu19K8Build17102006             GOstats            Category
                "1.4-3"             "1.1.0"             "2.0.4"             "2.0.3"
             genefilter                KEGG                RBGL                  GO
               "1.12.0"            "1.14.1"            "1.10.0"            "1.14.0"
                  graph            multtest            survival                 sma
               "1.12.0"            "1.12.0"              "2.29"            "0.5.15"
               annotate             Biobase   human19K601042005
               "1.12.0"            "1.12.1"             "1.1.0"

all the best,
Ivan
Biobase annotate genefilter multtest graph
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 10.2 years ago
Hi Ivan,

ivan.borozan at utoronto.ca writes:
> I got following results using hyperGTest(params) with a given list of genes
>
> > summary(hgOver)
>       GOBPID       Pvalue   OddsRatio   ExpCount Count Size
> 1 GO:0030185 0.000000e+00  -73.314685 0.02692165     2    1
> 2 GO:0006067 0.000000e+00 -110.746479 0.05384330     3    2
> 3 GO:0006069 0.000000e+00 -110.746479 0.05384330     3    2

Hmm, that is a suspect result. One would expect Size >= Count. In the current devel version of Category and GOstats, I have added code to verify that the selected gene list (geneIds) and the gene universe do not contain any duplicates. Could you verify that your input does not contain duplicate IDs either in the selected list or the universe?

> If for example I look at genes that are associated with the first GO
> term (i.e GO:0030185) I get:
>
> > probeSetSummary(hgOver)[[1]]
>   EntrezID ProbeSetID selected
> 1     3043     144221        0
> 2     3043     148425        0
> 3     3043    3108408        0
> 4     3043    5708746        0

This is, of course, also surprising, but it is difficult to assess what is going on without knowing more details of what data you used as input. Are you sure that all Entrez IDs in geneIds(params) are represented by at least one probe set on the chip?

> My question is how are Counts (in this case Count = 2) in the above
> summary(hgOver) table obtained ?

The details are in the code, but the intention is that Count is the intersection of the selected gene list with the Entrez IDs annotated at the given GO term.

> Looking at probeSetSummary(hgOver)[[1]] I can see one EntrezID
> (EntrezID = 3043) and 4 ProbeSetID associated with this particular
> node (i.e GO:0030185).

That just tells you that there are 4 probesets that interrogate Entrez ID 3043. The count in the hyperGTest result tells you that 2 Entrez IDs from the selected gene list are in the list of genes annotated at GO:0030185.

I have added a considerable amount of detail to the GOstats vignette in the current devel repository and I would suggest reading over it:

http://www.bioconductor.org/packages/1.9/bioc/html/GOstats.html

+ seth
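To make the distinction between Count, Size, and the number of probe sets concrete, here is a minimal base-R sketch on made-up data. It is not the GOstats implementation. The `universe`, `selected`, and `term_genes` vectors and the probe IDs `p5` and `p6` are hypothetical, and the definitions of Size (term genes present in the universe) and of the expected count are the usual hypergeometric ones, assumed here rather than taken from the package source.

```r
# Toy illustration with made-up inputs; only base R is used.
universe   <- c("3043", "3045", "3046", "1000", "2000", "3000")  # all genes tested
selected   <- c("3043", "3045", "1000")                          # genes of interest
term_genes <- c("3043", "3045", "3046", "9999")                  # hypothetical annotation of one GO term

# Hypothetical probe-to-gene map: four probe sets interrogate Entrez ID 3043.
probe_map <- data.frame(
  ProbeSetID = c("144221", "148425", "3108408", "5708746", "p5", "p6"),
  EntrezID   = c("3043", "3043", "3043", "3043", "3045", "3046"),
  stringsAsFactors = FALSE
)

# Size (assumed definition): genes annotated at the term that are in the universe.
size <- length(intersect(term_genes, universe))

# Count: genes annotated at the term that are also in the selected list.
count <- length(intersect(term_genes, selected))

# The number of probe sets per gene plays no role in Count or Size.
n_probes_3043 <- sum(probe_map$EntrezID == "3043")

# Expected count and over-representation p-value under the hypergeometric model.
exp_count <- size * length(selected) / length(universe)
p_value   <- phyper(count - 1, size, length(universe) - size,
                    length(selected), lower.tail = FALSE)
```

With unique IDs, Count can never exceed Size, because the selected list is a subset of the universe. A Count larger than Size, as in the table above, is therefore a hint that the inputs are not what the test assumes.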
Hi Seth,

Thanks for your reply. I actually had duplicates in my gene universe. Running hyperGTest now (without duplicates) gives meaningful results.

all the best,
Ivan
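For anyone hitting the same problem, a quick way to check the inputs before running the test might look like the sketch below. It uses only base R. The ID vectors are made up, and the variable names (`universe`, `selected`, `universe_clean`, `selected_clean`) are hypothetical, not part of GOstats.

```r
# Made-up Entrez IDs; "3045" is deliberately repeated in the universe
# and "3043" in the selected list.
universe <- c("3043", "3045", "3045", "3046", "1000", "2000")
selected <- c("3043", "3045", "3043")

# anyDuplicated() returns 0 when every element is unique,
# otherwise the index of the first repeated element.
has_dups_universe <- anyDuplicated(universe) > 0
has_dups_selected <- anyDuplicated(selected) > 0

# Which IDs are repeated in the universe?
dup_ids <- unique(universe[duplicated(universe)])

# Drop the repeats, keeping the first occurrence of each ID.
universe_clean <- unique(universe)
selected_clean <- unique(selected)

# Sanity check: every selected gene should also be in the universe.
stopifnot(all(selected_clean %in% universe_clean))
```

The cleaned vectors would then be the ones supplied as the selected list (geneIds) and the gene universe when building `params`.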
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 10.2 years ago
ivan.borozan at utoronto.ca writes:
> Hi Seth,
> Thanks for your reply. I actually had duplicates in my gene universe.
> Running hyperGTest now (without duplicates) gives meaningful
> results.

Glad to hear it. As I mentioned, the next release will include features to bring such duplicate IDs to your attention sooner rather than later :-)

+ seth