Hi,
I have a question about the GOstat package.
I'm working with a list of probesets, and i'm interested by testing
the
over representation of the GO terms in my list.
So, I changed the probesets IDs (of my lists and my Universe) into
ENTREZ IDs and the hyperGTest performed well.
For instance i have some result as :
GO:0005635
Pvalue = 0.04
OddsRatio = 0.04
ExpCount = 11
Count = 17
Size = 45
But, when i did the opposite :
test<-mget("GO:0005635",hgu133plus2GO2ALLPROBES)
entrez <-
unique(unlist(mget(as.vector(unlist(test)),hgu133plus2ENTREZID)))
length(entrez)
[1] 126
I don't understand why I find 126 entrez IDs in the Universe, and no
45
as expected (SIZE) ...
mylistentrez<-unique(intersect(entrez,mylist))
length(mylistentrez)
[1] 50
In the same way, I find 50 entrez IDs in my list, and no 17 as
expected
(COUNT)
Thanks for your explanations,
Bests,
Nicolas
--
Nicolas Servant
Equipe Bioinformatique
Institut Curie
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE
Email: Nicolas.Servant at curie.fr
Tel: 01 53 10 70 55
http://bioinfo.curie.fr/
Hi Nicolas,
Perhaps you could peruse the posting guide and provide the
information it asks you to. Only from that could one hope to give you
a
reasonable answer (ie, what commands did you run, what what the
output,
and your sessionInfo, at a bare minimum). And the Bioconductor
package
is GOstats, not GOstat - that is something else (not Bioconductor) and
if you want help with it you should ask those folks directly.
Robert
nicolas servant wrote:
> Hi,
>
> I have a question about the GOstat package.
> I'm working with a list of probesets, and i'm interested by testing
the
> over representation of the GO terms in my list.
> So, I changed the probesets IDs (of my lists and my Universe) into
> ENTREZ IDs and the hyperGTest performed well.
> For instance i have some result as :
>
> GO:0005635
> Pvalue = 0.04
> OddsRatio = 0.04
> ExpCount = 11
> Count = 17
> Size = 45
>
> But, when i did the opposite :
>
> test<-mget("GO:0005635",hgu133plus2GO2ALLPROBES)
> entrez <-
unique(unlist(mget(as.vector(unlist(test)),hgu133plus2ENTREZID)))
> length(entrez)
> [1] 126
>
> I don't understand why I find 126 entrez IDs in the Universe, and no
45
> as expected (SIZE) ...
>
> mylistentrez<-unique(intersect(entrez,mylist))
> length(mylistentrez)
> [1] 50
>
> In the same way, I find 50 entrez IDs in my list, and no 17 as
expected
> (COUNT)
>
> Thanks for your explanations,
> Bests,
>
> Nicolas
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
Sorry. I use the BioC GOStats package (2.2.6).
In connection with my last mail, I have a other question :
##universe
designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
entrezUniverse <-
na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
##myList
selectedEntrezIds <-
na.omit(unique(unlist(mget(mylist,hgu133plus2ENTREZID))))
params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,univers
eGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pvalueC
utoff=0.05,conditional=TRUE,testDirection="over")
params.MF.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,univers
eGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="MF",pvalueC
utoff=0.05,conditional=TRUE,testDirection="over")
params.CC.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,univers
eGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="CC",pvalueC
utoff=0.05,conditional=TRUE,testDirection="over")
hgOver.BP<-hyperGTest(params.BP.over)
hgOver.MF<-hyperGTest(params.MF.over)
hgOver.CC<-hyperGTest(params.CC.over)
length(selectedEntrezIds)
[1] 3761
length(hgOver.BP at geneIds)
[1] 3122
I guess the hyperGTest has removed these 639 missing genes. But i
don't
understand why ?
Nicolas
Robert Gentleman a ?crit :
> Hi Nicolas,
>
> Perhaps you could peruse the posting guide and provide the
> information it asks you to. Only from that could one hope to give
you a
> reasonable answer (ie, what commands did you run, what what the
output,
> and your sessionInfo, at a bare minimum). And the Bioconductor
package
> is GOstats, not GOstat - that is something else (not Bioconductor)
and
> if you want help with it you should ask those folks directly.
>
> Robert
>
>
> nicolas servant wrote:
>
>> Hi,
>>
>> I have a question about the GOstat package.
>> I'm working with a list of probesets, and i'm interested by testing
the
>> over representation of the GO terms in my list.
>> So, I changed the probesets IDs (of my lists and my Universe) into
>> ENTREZ IDs and the hyperGTest performed well.
>> For instance i have some result as :
>>
>> GO:0005635
>> Pvalue = 0.04
>> OddsRatio = 0.04
>> ExpCount = 11
>> Count = 17
>> Size = 45
>>
>> But, when i did the opposite :
>>
>> test<-mget("GO:0005635",hgu133plus2GO2ALLPROBES)
>> entrez <-
unique(unlist(mget(as.vector(unlist(test)),hgu133plus2ENTREZID)))
>> length(entrez)
>> [1] 126
>>
>> I don't understand why I find 126 entrez IDs in the Universe, and
no 45
>> as expected (SIZE) ...
>>
>> mylistentrez<-unique(intersect(entrez,mylist))
>> length(mylistentrez)
>> [1] 50
>>
>> In the same way, I find 50 entrez IDs in my list, and no 17 as
expected
>> (COUNT)
>>
>> Thanks for your explanations,
>> Bests,
>>
>> Nicolas
>>
>>
>
>
--
Nicolas Servant
Equipe Bioinformatique
Institut Curie
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE
Email: Nicolas.Servant at curie.fr
Tel: 01 53 10 70 55
http://bioinfo.curie.fr/
Hi Nicolas,
nicolas servant wrote:
> Sorry. I use the BioC GOStats package (2.2.6).
Yes, but you *still* refuse to follow the posting guide, so we are
left
to guess at what you have done. Here's a hint; give us (always) the
results of calling sessionInfo(), and give us a reproducible example.
The example you give below is not reproducible, as you are the only
one
who has these data. If we can't do what you have done to see what
happened, how exactly are we to figure out why you get these results?
And by reproducible example I don't mean for you to give us a link to
your data. You can simply use set.seed(), randomly select some affy
IDs,
and then go on from there as if these were the IDs you got from an
analysis.
> In connection with my last mail, I have a other question :
>
> ##universe
> designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
> entrezUniverse <-
> na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
> ##myList
> selectedEntrezIds <-
> na.omit(unique(unlist(mget(mylist,hgu133plus2ENTREZID))))
> params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,unive
rseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pvalu
eCutoff=0.05,conditional=TRUE,testDirection="over")
> params.MF.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,unive
rseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="MF",pvalu
eCutoff=0.05,conditional=TRUE,testDirection="over")
> params.CC.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,unive
rseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="CC",pvalu
eCutoff=0.05,conditional=TRUE,testDirection="over")
>
> hgOver.BP<-hyperGTest(params.BP.over)
> hgOver.MF<-hyperGTest(params.MF.over)
> hgOver.CC<-hyperGTest(params.CC.over)
>
> length(selectedEntrezIds)
> [1] 3761
> length(hgOver.BP at geneIds)
> [1] 3122
>
> I guess the hyperGTest has removed these 639 missing genes. But i
don't
> understand why ?
If you were to peruse the vignettes for the GOstats package, you will
see that there are two filtering steps that you should take before
running the analysis. The first is to subset to unique Entrez Gene IDs
as you have done. The second is to remove those Entrez Gene IDs that
have no GO annotations associated with them. My best guess is the 639
missing genes have no GO annotations. But I am guessing since I can't
know.
Best,
Jim
>
> Nicolas
>
> Robert Gentleman a ?crit :
>> Hi Nicolas,
>>
>> Perhaps you could peruse the posting guide and provide the
>> information it asks you to. Only from that could one hope to give
you a
>> reasonable answer (ie, what commands did you run, what what the
output,
>> and your sessionInfo, at a bare minimum). And the Bioconductor
package
>> is GOstats, not GOstat - that is something else (not Bioconductor)
and
>> if you want help with it you should ask those folks directly.
>>
>> Robert
>>
>>
>> nicolas servant wrote:
>>
>>> Hi,
>>>
>>> I have a question about the GOstat package.
>>> I'm working with a list of probesets, and i'm interested by
testing the
>>> over representation of the GO terms in my list.
>>> So, I changed the probesets IDs (of my lists and my Universe) into
>>> ENTREZ IDs and the hyperGTest performed well.
>>> For instance i have some result as :
>>>
>>> GO:0005635
>>> Pvalue = 0.04
>>> OddsRatio = 0.04
>>> ExpCount = 11
>>> Count = 17
>>> Size = 45
>>>
>>> But, when i did the opposite :
>>>
>>> test<-mget("GO:0005635",hgu133plus2GO2ALLPROBES)
>>> entrez <-
unique(unlist(mget(as.vector(unlist(test)),hgu133plus2ENTREZID)))
>>> length(entrez)
>>> [1] 126
>>>
>>> I don't understand why I find 126 entrez IDs in the Universe, and
no 45
>>> as expected (SIZE) ...
>>>
>>> mylistentrez<-unique(intersect(entrez,mylist))
>>> length(mylistentrez)
>>> [1] 50
>>>
>>> In the same way, I find 50 entrez IDs in my list, and no 17 as
expected
>>> (COUNT)
>>>
>>> Thanks for your explanations,
>>> Bests,
>>>
>>> Nicolas
>>>
>>>
>>
>
>
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
> sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-solaris2.10
locale:
C
attached base packages:
[1] "splines" "tools" "stats" "graphics" "grDevices"
"utils"
[7] "datasets" "methods" "base"
other attached packages:
GOstats Category Matrix lattice genefilter
survival
"2.2.6" "2.2.3" "0.99875-2" "0.15-5" "1.14.1"
"2.31"
KEGG RBGL annotate Biobase GO
graph
"1.16.1" "1.12.0" "1.14.1" "1.14.0" "1.16.0"
"1.14.2"
hgu133plus2
"1.16.0"
Here, a *reproductible* example :
require(hgu133plus2)
##UNIVERSE
designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
##GO
haveGo <- sapply(mget(designALL, hgu133plus2GO),
function(x) {
if (length(x) == 1 && is.na(x)){
FALSE
}
else{
TRUE
}
})
numNoGO <- sum(!haveGo)
print(paste(numNoGO, "pbsets in Univers have no GO ids"))
designALL<-designALL[haveGo]
entrezUniverse <-
na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
if (any(duplicated(entrezUniverse))){
stop("error in gene universe: can't have duplicate Ent")
}
##ALIST
selectedEntrezIds <- entrezUniverse[1:150]
require(GOstats)
params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,univers
eGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pvalueC
utoff=0.05,conditional=TRUE,testDirection="over")
hgOver.BP<-hyperGTest(params.BP.over)
length(selectedEntrezIds)
length(hgOver.BP at geneIds)
As you can see, i performed a non-specific filtering by removing the
probesets with no GO annotations (as presented in the vignette !).
I'm not interested here by using an other filtering approach like
Student test, IQR filtering, etc ...
> length(selectedEntrezIds)
[1] 150
> length(hgOver.BP at geneIds)
[1] 124
To summarize, all the entrezID of my list have a GO annotation, but
the
hyperGTest function removes 26 additional genes.
I guess there are some explanations ?
Thanks for your help,
Nicolas
James W. MacDonald a ?crit :
> Hi Nicolas,
>
> nicolas servant wrote:
>> Sorry. I use the BioC GOStats package (2.2.6).
>
> Yes, but you *still* refuse to follow the posting guide, so we are
> left to guess at what you have done. Here's a hint; give us (always)
> the results of calling sessionInfo(), and give us a reproducible
> example. The example you give below is not reproducible, as you are
> the only one who has these data. If we can't do what you have done
to
> see what happened, how exactly are we to figure out why you get
these
> results?
>
> And by reproducible example I don't mean for you to give us a link
to
> your data. You can simply use set.seed(), randomly select some affy
> IDs, and then go on from there as if these were the IDs you got from
> an analysis.
>
>> In connection with my last mail, I have a other question :
>>
>> ##universe
>> designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
>> entrezUniverse <-
>> na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
>> ##myList
>> selectedEntrezIds <-
>> na.omit(unique(unlist(mget(mylist,hgu133plus2ENTREZID))))
>> params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,univ
erseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pval
ueCutoff=0.05,conditional=TRUE,testDirection="over")
>>
>> params.MF.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,univ
erseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="MF",pval
ueCutoff=0.05,conditional=TRUE,testDirection="over")
>>
>> params.CC.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,univ
erseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="CC",pval
ueCutoff=0.05,conditional=TRUE,testDirection="over")
>>
>>
>> hgOver.BP<-hyperGTest(params.BP.over)
>> hgOver.MF<-hyperGTest(params.MF.over)
>> hgOver.CC<-hyperGTest(params.CC.over)
>>
>> length(selectedEntrezIds)
>> [1] 3761
>> length(hgOver.BP at geneIds)
>> [1] 3122
>>
>> I guess the hyperGTest has removed these 639 missing genes. But i
>> don't understand why ?
>
> If you were to peruse the vignettes for the GOstats package, you
will
> see that there are two filtering steps that you should take before
> running the analysis. The first is to subset to unique Entrez Gene
IDs
> as you have done. The second is to remove those Entrez Gene IDs that
> have no GO annotations associated with them. My best guess is the
639
> missing genes have no GO annotations. But I am guessing since I
can't
> know.
>
> Best,
>
> Jim
>
>
>>
>> Nicolas
>>
>> Robert Gentleman a ?crit :
>>> Hi Nicolas,
>>>
>>> Perhaps you could peruse the posting guide and provide the
>>> information it asks you to. Only from that could one hope to give
>>> you a reasonable answer (ie, what commands did you run, what what
>>> the output, and your sessionInfo, at a bare minimum). And the
>>> Bioconductor package is GOstats, not GOstat - that is something
else
>>> (not Bioconductor) and if you want help with it you should ask
those
>>> folks directly.
>>>
>>> Robert
>>>
>>>
>>> nicolas servant wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a question about the GOstat package.
>>>> I'm working with a list of probesets, and i'm interested by
testing
>>>> the over representation of the GO terms in my list.
>>>> So, I changed the probesets IDs (of my lists and my Universe)
into
>>>> ENTREZ IDs and the hyperGTest performed well.
>>>> For instance i have some result as :
>>>>
>>>> GO:0005635
>>>> Pvalue = 0.04
>>>> OddsRatio = 0.04
>>>> ExpCount = 11
>>>> Count = 17
>>>> Size = 45
>>>>
>>>> But, when i did the opposite :
>>>>
>>>> test<-mget("GO:0005635",hgu133plus2GO2ALLPROBES)
>>>> entrez <-
>>>> unique(unlist(mget(as.vector(unlist(test)),hgu133plus2ENTREZID)))
>>>> length(entrez)
>>>> [1] 126
>>>>
>>>> I don't understand why I find 126 entrez IDs in the Universe, and
>>>> no 45 as expected (SIZE) ...
>>>>
>>>> mylistentrez<-unique(intersect(entrez,mylist))
>>>> length(mylistentrez)
>>>> [1] 50
>>>>
>>>> In the same way, I find 50 entrez IDs in my list, and no 17 as
>>>> expected (COUNT)
>>>>
>>>> Thanks for your explanations,
>>>> Bests,
>>>>
>>>> Nicolas
>>>>
>>>>
>>>
>>
>>
>
--
Nicolas Servant
Equipe Bioinformatique
Institut Curie
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE
Email: Nicolas.Servant at curie.fr
Tel: 01 53 10 70 55
http://bioinfo.curie.fr/
On Fri, Mar 21, 2008 at 9:51 AM, nicolas servant
<nicolas.servant at="" curie.fr=""> wrote:
> > sessionInfo()
> R version 2.5.0 (2007-04-23)
> i386-pc-solaris2.10
>
> locale:
> C
>
> attached base packages:
> [1] "splines" "tools" "stats" "graphics" "grDevices"
"utils"
> [7] "datasets" "methods" "base"
>
> other attached packages:
> GOstats Category Matrix lattice genefilter
survival
> "2.2.6" "2.2.3" "0.99875-2" "0.15-5" "1.14.1"
"2.31"
> KEGG RBGL annotate Biobase GO
graph
> "1.16.1" "1.12.0" "1.14.1" "1.14.0" "1.16.0"
"1.14.2"
> hgu133plus2
> "1.16.0"
>
>
> Here, a *reproductible* example :
>
> require(hgu133plus2)
>
> ##UNIVERSE
> designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
> ##GO
> haveGo <- sapply(mget(designALL, hgu133plus2GO),
> function(x) {
> if (length(x) == 1 && is.na(x)){
> FALSE
> }
> else{
> TRUE
> }
> })
> numNoGO <- sum(!haveGo)
> print(paste(numNoGO, "pbsets in Univers have no GO ids"))
> designALL<-designALL[haveGo]
>
> entrezUniverse <-
> na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
> if (any(duplicated(entrezUniverse))){
> stop("error in gene universe: can't have duplicate Ent")
> }
>
> ##ALIST
> selectedEntrezIds <- entrezUniverse[1:150]
>
> require(GOstats)
>
> params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,univ
erseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pval
ueCutoff=0.05,conditional=TRUE,testDirection="over")
> hgOver.BP<-hyperGTest(params.BP.over)
>
> length(selectedEntrezIds)
> length(hgOver.BP at geneIds)
>
> As you can see, i performed a non-specific filtering by removing
the
> probesets with no GO annotations (as presented in the vignette !).
> I'm not interested here by using an other filtering approach like
> Student test, IQR filtering, etc ...
>
> > length(selectedEntrezIds)
> [1] 150
> > length(hgOver.BP at geneIds)
> [1] 124
>
> To summarize, all the entrezID of my list have a GO annotation, but
the
> hyperGTest function removes 26 additional genes.
> I guess there are some explanations ?
All the entrezID of your list have a GO annotation; that does not
imply that they all have a GO BP annotation. So, your difference of
26 genes is probably due to the lack of annotation within the GO BP
ontology. I did not run your example to prove that is the case, but
it makes sense that the lists can, in general be different lengths. I
do not see a problem here.
Sean
> James W. MacDonald a ?crit :
> > Hi Nicolas,
> >
> > nicolas servant wrote:
> >> Sorry. I use the BioC GOStats package (2.2.6).
> >
> > Yes, but you *still* refuse to follow the posting guide, so we
are
> > left to guess at what you have done. Here's a hint; give us
(always)
> > the results of calling sessionInfo(), and give us a reproducible
> > example. The example you give below is not reproducible, as you
are
> > the only one who has these data. If we can't do what you have
done to
> > see what happened, how exactly are we to figure out why you get
these
> > results?
> >
> > And by reproducible example I don't mean for you to give us a
link to
> > your data. You can simply use set.seed(), randomly select some
affy
> > IDs, and then go on from there as if these were the IDs you got
from
> > an analysis.
> >
> >> In connection with my last mail, I have a other question :
> >>
> >> ##universe
> >> designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
> >> entrezUniverse <-
> >> na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
> >> ##myList
> >> selectedEntrezIds <-
> >> na.omit(unique(unlist(mget(mylist,hgu133plus2ENTREZID))))
> >> params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,u
niverseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",p
valueCutoff=0.05,conditional=TRUE,testDirection="over")
> >>
> >> params.MF.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,u
niverseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="MF",p
valueCutoff=0.05,conditional=TRUE,testDirection="over")
> >>
> >> params.CC.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,u
niverseGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="CC",p
valueCutoff=0.05,conditional=TRUE,testDirection="over")
> >>
> >>
> >> hgOver.BP<-hyperGTest(params.BP.over)
> >> hgOver.MF<-hyperGTest(params.MF.over)
> >> hgOver.CC<-hyperGTest(params.CC.over)
> >>
> >> length(selectedEntrezIds)
> >> [1] 3761
> >> length(hgOver.BP at geneIds)
> >> [1] 3122
> >>
> >> I guess the hyperGTest has removed these 639 missing genes. But
i
> >> don't understand why ?
> >
> > If you were to peruse the vignettes for the GOstats package, you
will
> > see that there are two filtering steps that you should take
before
> > running the analysis. The first is to subset to unique Entrez
Gene IDs
> > as you have done. The second is to remove those Entrez Gene IDs
that
> > have no GO annotations associated with them. My best guess is the
639
> > missing genes have no GO annotations. But I am guessing since I
can't
> > know.
> >
> > Best,
> >
> > Jim
> >
> >
> >>
> >> Nicolas
> >>
> >> Robert Gentleman a ?crit :
> >>> Hi Nicolas,
> >>>
> >>> Perhaps you could peruse the posting guide and provide the
> >>> information it asks you to. Only from that could one hope to
give
> >>> you a reasonable answer (ie, what commands did you run, what
what
> >>> the output, and your sessionInfo, at a bare minimum). And the
> >>> Bioconductor package is GOstats, not GOstat - that is something
else
> >>> (not Bioconductor) and if you want help with it you should ask
those
> >>> folks directly.
> >>>
> >>> Robert
> >>>
> >>>
> >>> nicolas servant wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I have a question about the GOstat package.
> >>>> I'm working with a list of probesets, and i'm interested by
testing
> >>>> the over representation of the GO terms in my list.
> >>>> So, I changed the probesets IDs (of my lists and my Universe)
into
> >>>> ENTREZ IDs and the hyperGTest performed well.
> >>>> For instance i have some result as :
> >>>>
> >>>> GO:0005635
> >>>> Pvalue = 0.04
> >>>> OddsRatio = 0.04
> >>>> ExpCount = 11
> >>>> Count = 17
> >>>> Size = 45
> >>>>
> >>>> But, when i did the opposite :
> >>>>
> >>>> test<-mget("GO:0005635",hgu133plus2GO2ALLPROBES)
> >>>> entrez <-
> >>>>
unique(unlist(mget(as.vector(unlist(test)),hgu133plus2ENTREZID)))
> >>>> length(entrez)
> >>>> [1] 126
> >>>>
> >>>> I don't understand why I find 126 entrez IDs in the Universe,
and
> >>>> no 45 as expected (SIZE) ...
> >>>>
> >>>> mylistentrez<-unique(intersect(entrez,mylist))
> >>>> length(mylistentrez)
> >>>> [1] 50
> >>>>
> >>>> In the same way, I find 50 entrez IDs in my list, and no 17 as
> >>>> expected (COUNT)
> >>>>
> >>>> Thanks for your explanations,
> >>>> Bests,
> >>>>
> >>>> Nicolas
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
> --
> Nicolas Servant
> Equipe Bioinformatique
> Institut Curie
> 26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE
>
> Email: Nicolas.Servant at curie.fr
> Tel: 01 53 10 70 55
> http://bioinfo.curie.fr/
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
On Fri, Mar 21, 2008 at 10:21 AM, nicolas servant
<nicolas.servant at="" curie.fr=""> wrote:
> Thanks for your comment ! I think you're right.
> Do you know how i can check that ? How can I differentiate the
> annotations (BP/CC/MF) ?
I think you can get all probes mapping to the BP ontology by doing:
bpprobes <- mget('GO:0008150',hgu133aGO2ALLPROBES)
The GO term 'GO:0008150' is the "root" for the BP ontology. For your
reference, 'GO:0005510' is the root for the CC ontology and
'GO:0003674' is the root for the MF ontology.
Hope that helps.
Sean