Question

GOstats hyperGTest question


ivan.borozan@utoronto.ca ▴ 80


Hi all,

I got the following results using hyperGTest(params) with a given list of genes:

    > summary(hgOver)
          GOBPID       Pvalue   OddsRatio   ExpCount Count Size
    1 GO:0030185 0.000000e+00  -73.314685 0.02692165     2    1
    2 GO:0006067 0.000000e+00 -110.746479 0.05384330     3    2
    3 GO:0006069 0.000000e+00 -110.746479 0.05384330     3    2
    etc. ...

If, for example, I look at the genes associated with the first GO term (i.e. GO:0030185), I get:

    > probeSetSummary(hgOver)[[1]]
      EntrezID ProbeSetID selected
    1     3043     144221        0
    2     3043     148425        0
    3     3043    3108408        0
    4     3043    5708746        0

My question is: how are the Counts (in this case Count = 2) in the above summary(hgOver) table obtained? Looking at probeSetSummary(hgOver)[[1]], I can see only one EntrezID (EntrezID = 3043) and 4 ProbeSetIDs associated with this particular node (i.e. GO:0030185).

    R version 2.4.0 (2006-10-03)
    i686-redhat-linux-gnu

    locale:
    LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] "splines"   "tools"     "methods"   "stats"     "graphics"  "grDevices"
    [7] "utils"     "datasets"  "base"

    other attached packages:
                 xtable Hu19K8Build17102006             GOstats            Category
                "1.4-3"             "1.1.0"             "2.0.4"             "2.0.3"
             genefilter                KEGG                RBGL                  GO
               "1.12.0"            "1.14.1"            "1.10.0"            "1.14.0"
                  graph            multtest            survival                 sma
               "1.12.0"            "1.12.0"              "2.29"            "0.5.15"
               annotate             Biobase   human19K601042005
               "1.12.0"            "1.12.1"             "1.1.0"

All the best,
Ivan

Biobase annotate genefilter multtest graph

Updated 18.2 years ago by Seth Falcon • written 18.2 years ago by ivan.borozan@utoronto.ca

score 0 · Answer 1 · 2007-01-25

Hi Ivan,

ivan.borozan at utoronto.ca writes:
> I got the following results using hyperGTest(params) with a given list of genes
>
> > summary(hgOver)
>       GOBPID       Pvalue   OddsRatio   ExpCount Count Size
> 1 GO:0030185 0.000000e+00  -73.314685 0.02692165     2    1
> 2 GO:0006067 0.000000e+00 -110.746479 0.05384330     3    2
> 3 GO:0006069 0.000000e+00 -110.746479 0.05384330     3    2

Hmm, that is a suspect result. One would expect Size >= Count, since Size is the number of genes in the universe annotated at the term and Count can only include genes from that set.

In the current devel version of Category and GOstats, I have added code to verify that the selected gene list (geneIds) and the gene universe do not contain any duplicates. Could you verify that your input does not contain duplicate IDs in either the selected list or the universe?

> If for example I look at genes that are associated with the first GO
> term (i.e. GO:0030185) I get:
>
> > probeSetSummary(hgOver)[[1]]
>   EntrezID ProbeSetID selected
> 1     3043     144221        0
> 2     3043     148425        0
> 3     3043    3108408        0
> 4     3043    5708746        0

This is, of course, also surprising, but it is difficult to assess what is going on without knowing more about the data you used as input. Are you sure that all Entrez IDs in geneIds(params) are represented by at least one probe set on the chip?

> My question is how are Counts (in this case Count = 2) in the above
> summary(hgOver) table obtained ?

The details are in the code, but the intention is that Count is the size of the intersection of the selected gene list with the Entrez IDs annotated at the given GO term.

> Looking at probeSetSummary(hgOver)[[1]] I can see one EntrezID
> (EntrezID = 3043) and 4 ProbeSetID associated with this particular
> node (i.e. GO:0030185).

That just tells you that there are 4 probe sets that interrogate Entrez ID 3043. The Count in the hyperGTest result tells you that 2 Entrez IDs from the selected gene list are in the list of genes annotated at GO:0030185.

I have added a considerable amount of detail to the GOstats vignette in the current devel repository, and I would suggest reading over it:

http://www.bioconductor.org/packages/1.9/bioc/html/GOstats.html

+ seth
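To make the Count and Size definitions above concrete, here is a minimal base-R sketch of the intended arithmetic for a single GO term. It is not the GOstats implementation: the ID vectors are made-up toy data, the variable names (universe, selected, annotated) are hypothetical, and treating ExpCount as the hypergeometric mean is an assumption, although it is consistent with the table in the question.

```r
# Toy data only: every ID below is invented for illustration.
universe  <- c("3043", "2147", "5340", "7450", "1234", "5678")  # gene universe (unique Entrez IDs)
selected  <- c("3043", "2147", "1234")                          # selected gene list
annotated <- c("3043", "2147", "7450", "9999")                  # IDs annotated at one GO term

# Restrict both lists to the universe before counting.
selected_u  <- intersect(selected, universe)
annotated_u <- intersect(annotated, universe)

# Size:  genes in the universe annotated at the term.
# Count: selected genes that are also annotated at the term.
size  <- length(annotated_u)
count <- length(intersect(selected_u, annotated_u))

# Count is a subset of the annotated set, so it can never exceed Size.
stopifnot(count <= size)

# Expected count under the hypergeometric model (assumed to match ExpCount).
exp_count <- length(selected_u) * size / length(universe)

# Over-representation p-value: P(X >= count).
pval <- phyper(count - 1, size, length(universe) - size,
               length(selected_u), lower.tail = FALSE)
```

With unique IDs the check `count <= size` always holds, which is why a table showing Count = 2 against Size = 1 points at a problem in the input rather than in the test itself.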

score 0 · Answer 2 · 2007-01-26

ivan.borozan at utoronto.ca writes:
> Hi Seth,
> Thanks for your reply. I actually had duplicates in my gene universe.
> Running hyperGTest now (without duplicates) gives meaningful
> results.

Glad to hear it. As I mentioned, the next release will include features to bring such duplicate IDs to your attention sooner rather than later :-)

+ seth
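For readers on a release without the built-in check, a quick way to screen the inputs by hand is sketched below in base R. The vectors universe_raw and selected_raw are hypothetical stand-ins for whatever is passed as the gene universe and the selected gene list. Duplicates typically arise when a universe is built by mapping probe sets to Entrez IDs, since several probe sets can map to the same gene (as with the 4 probe sets for Entrez ID 3043 above).

```r
# Made-up inputs: replace with the vectors you pass as the universe and gene list.
universe_raw <- c("3043", "2147", "3043", "5340", "2147")
selected_raw <- c("3043", "3043")

# Report which IDs occur more than once in the universe.
dup_universe <- unique(universe_raw[duplicated(universe_raw)])
if (length(dup_universe) > 0) {
  message("Duplicate IDs in universe: ", paste(dup_universe, collapse = ", "))
}

# Deduplicate both inputs before building the parameter object.
universe <- unique(universe_raw)
selected <- unique(selected_raw)

# Every selected ID should also be present in the universe.
not_in_universe <- setdiff(selected, universe)
```

The deduplicated `universe` and `selected` vectors are what should then be supplied to the test.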