Hi list,
Could someone please help me understand the differences between the
(hgu133plus2)GO, GO2PROBE, GO2ALLPROBES? I've found discrepancies that I
can't quite explain:
> mget("GO:0042611", hgu133plus2GO2PROBE)
Error: value for 'GO:0042611' not found
> mget("GO:0042611", hgu133plus2GO2ALLPROBES)
$"GO:0042611"
          <NA>            IEA            IEA            IEA           <NA>
   "209309_at"  "217014_s_at"    "210325_at"  "218831_s_at" "1553402_a_at"
          <NA>           <NA>           <NA>           <NA>           <NA>
 "206086_x_at"  "206087_x_at"  "210864_x_at"  "211326_x_at"  "211327_x_at"
          <NA>           <NA>           <NA>           <NA>           <NA>
 "211328_x_at"  "211329_x_at"  "211330_s_at"  "211331_x_at"  "211332_x_at"
          <NA>           <NA>           <NA>            IEA           <NA>
 "211863_x_at"  "211866_x_at"  "214647_s_at"    "235754_at"  "213932_x_at"
           IEA           <NA>           <NA>            IEA           <NA>
 "215313_x_at"  "208729_x_at"  "209140_x_at"  "211911_x_at"  "208812_x_at"
          <NA>           <NA>            IEA           <NA>           <NA>
 "211799_x_at"  "214459_x_at"  "216526_x_at"    "200904_at"  "200905_x_at"
           IEA           <NA>           <NA>            IEA           <NA>
 "217456_x_at"  "204806_x_at"  "221875_x_at"    "221978_at"  "210514_x_at"
          <NA>           <NA>           <NA>            IEA            IEA
 "211528_x_at"  "211529_x_at"  "211530_x_at"  "217436_x_at"    "231748_at"
          <NA>            IEA            IEA            IEA
   "221291_at"    "238542_at"    "221323_at" "1552777_a_at"
and finally...
### "208729_x_at" is one of the probes returned with the above command
> grep("GO:0042611",unlist(mget("208729_x_at", hgu133plus2GO)))
numeric(0)
"208729_x_at" is on the hgu133plus2 chip, but GO and GO2ALLPROBES don't
map it to the same GO ID.
Is there something wrong here or am I just missing something? If
different, which is the most "reliable" mapping? I'm concerned because
I went through to validate GO IDs I had gotten from the GOHyperG
function (a total of 314), and 117 of those I could not map back to my
significant probe list using the hgu133plus2GO annotation. I noticed by
looking at the GOHyperG function that it uses information from
GO2ALLPROBES.
Any help/enlightenment is much appreciated.
PS - using R 2.2.1 with hgu133plus2 1.10.0
--Jake
Hi Jake,
Jake <jjmichael at comcast.net> writes:
> Could someone please help me understand the differences between the
> (hgu133plus2)GO, GO2PROBE, GO2ALLPROBES? I've found discrepancies that I
> can't quite explain:
>
> > mget("GO:0042611", hgu133plus2GO2PROBE)
> Error: value for 'GO:0042611' not found
GO annotates probe ids (really Entrez Gene ids) at the most specific
term in the GO ontology. In the above search of hgu133plus2GO2PROBE,
you are seeing that GO:0042611 does not have any annotations.
> > mget("GO:0042611", hgu133plus2GO2ALLPROBES)
> $"GO:0042611"
>           <NA>            IEA            IEA            IEA           <NA>
>    "209309_at"  "217014_s_at"    "210325_at"  "218831_s_at"
[snip]
For a given GO term, the hgu133plus2GO2ALLPROBES environment is giving
you all Affy ids that map to this GO term _or_ a more specific term
that is related to this term (by related, I mean child-like relation,
where there is a path in the DAG connecting the terms).
The names on the vector are evidence codes. See the man pages for
details.
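As a small illustration of how the evidence codes ride along as names, here is a minimal sketch in base R. It uses a hand-built vector that copies the first four entries of the output quoted above, so it runs without the annotation package; the variable names `probes`, `evidence`, and `non_iea` are made up for this example.

```r
# Toy stand-in for one element of hgu133plus2GO2ALLPROBES, copied from the
# first four entries shown in the thread: values are probe ids, names are
# evidence codes (NA simply mirrors the <NA> in the printed output).
probes <- c("209309_at", "217014_s_at", "210325_at", "218831_s_at")
names(probes) <- c(NA, "IEA", "IEA", "IEA")

# The evidence codes are just the names of the vector.
evidence <- names(probes)

# Drop probes whose evidence code is IEA.
# is.na() is tested first because NA != "IEA" evaluates to NA, not TRUE.
non_iea <- unname(probes[is.na(evidence) | evidence != "IEA"])
print(non_iea)
```

With the real environment, the same filtering should apply to the vector returned by get() or mget(), assuming it has the shape shown in the output above.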
So for the above two cases, this is as expected and I don't think
there is any inconsistency.
> and finally...
>
> ### "208729_x_at" is one of the probes returned with the above command
> > grep("GO:0042611",unlist(mget("208729_x_at", hgu133plus2GO)))
> numeric(0)
When you say "above command", which one are you referring to?
hgu133plus2GO should be the inverse map for hgu133plus2GO2PROBE.
> "208729_x_at" is on the hgu133plus2 chip, but GO and GO2ALLPROBES don't
> map it to the same GO ID.
Can you be more specific? Which env in the GO package are you talking
about? Note that GO2ALLPROBES does not map to GO ids, it maps _from_
GO ids.
You can ask which GO ids have the 208729_x_at annotation using
hgu133plus2GO.
If you then grep through hgu133plus2GO2ALLPROBES for GO ids that have
208729_x_at in their probe vector, then you should find more GO ids
because you are picking up parent terms that don't have the specific
annotation. However, all the ids you found in hgu133plus2GO should
appear.
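The check described above can be sketched on toy data. Everything below is hypothetical: the ids "GO:parent", "GO:child", "GO:other" and the probes "p1" to "p3" are invented, `probe2go` plays the role of hgu133plus2GO, and `go2allprobes` plays the role of hgu133plus2GO2ALLPROBES. With the real package one would presumably start from `as.list(hgu133plus2GO2ALLPROBES)` instead of the hand-built list.

```r
# Hypothetical toy maps, not real annotation data.
# probe2go: probe -> most specific GO ids only.
probe2go <- list(p1 = "GO:child", p2 = "GO:child", p3 = "GO:other")
# go2allprobes: GO id -> probes annotated at that term or at any descendant.
go2allprobes <- list(
  "GO:parent" = c("p1", "p2"),
  "GO:child"  = c("p1", "p2"),
  "GO:other"  = "p3"
)

probe <- "p1"

# GO ids annotated directly to the probe.
direct <- probe2go[[probe]]

# GO ids whose probe vector contains the probe (the "grep through" step).
has_probe <- vapply(go2allprobes, function(p) probe %in% p, logical(1))
via_all <- names(go2allprobes)[has_probe]

# Every direct id should reappear; the extras are parent terms.
stopifnot(all(direct %in% via_all))
extra <- setdiff(via_all, direct)
print(extra)
```

Here `extra` holds the parent term that the probe reaches only through its child annotation.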
Clear as mud? :-)
> Is there something wrong here or am I just missing something? If
> different, which is the most "reliable" mapping? I'm concerned because
> I went through to validate GO IDs I had gotten from the GOHyperG
> function (a total of 314), and 117 of those I could not map back to my
> significant probe list using the hgu133plus2GO annotation. I noticed by
> looking at the GOHyperG function that it uses information from
> GO2ALLPROBES.
>
> Any help/enlightenment is much appreciated.
>
> PS - using R 2.2.1 with hgu133plus2 1.10.0
PS: sessionInfo() would be a better way to report that. Then we would
also know your version of the GO package, for example.
+ seth
On Tue, 2006-04-18 at 09:37 -0700, Seth Falcon wrote:
> [snip]
Thanks for all the help, guys. It really helped my understanding of how
the GO mappings work in the context of BioC. I had previously assumed
that mappings in all the GO environments were multi-level, and now I
know that really only the GO2ALLPROBES environment is.
Jim, sorry for replying to you personally. I meant to send to the list, but
I frequently hit "reply" instead of "reply to all" by accident.
--Jake
Hi Jake,
Jake wrote:
> [snip]
Here is the difference:
hgu133plus2GO maps Probe IDs to GO terms
hgu133plus2GO2PROBE maps GO terms to Probe IDs
hgu133plus2GO2ALLPROBES maps GO terms and all children of the terms to
Probe IDs
So there isn't really an issue of reliability here, just an issue of
what you want. In your case, 208729_x_at doesn't map to GO:0042611, but
it does map to children of that GO term (for instance GO:0042612).
> sapply(get("208729_x_at", hgu133plus2GO), function(x) x[[1]])
  GO:0005624   GO:0005887   GO:0016020   GO:0016021   GO:0019882   GO:0019883
"GO:0005624" "GO:0005887" "GO:0016020" "GO:0016021" "GO:0019882" "GO:0019883"
  GO:0019885   GO:0030106   GO:0030106   GO:0042612
"GO:0019885" "GO:0030106" "GO:0030106" "GO:0042612"
> grep("208729_x_at",get("GO:0042612", hgu133plus2GO2PROBE))
[1] 20
> grep("208729_x_at",get("GO:0042611", hgu133plus2GO2PROBE))
Error in get(x, envir, mode, inherits) : variable "GO:0042611" was not found
> grep("208729_x_at",get("GO:0042611", hgu133plus2GO2ALLPROBES))
[1] 20
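Applied to the validation problem from the original question, the same idea suggests looking up enriched GO IDs in the GO2ALLPROBES-style map rather than in the probe-to-GO map. The following is only a sketch on invented data: `sig_probes`, `enriched_go`, the probe names, and the GO ids are all hypothetical, and `go2allprobes` is a plain list standing in for hgu133plus2GO2ALLPROBES.

```r
# Hypothetical inputs, invented for illustration.
sig_probes  <- c("p1", "p4")                 # "significant" probe list
enriched_go <- c("GO:parent", "GO:child")    # GO ids reported as enriched

# Toy stand-in for a GO2ALLPROBES-style map (GO id -> probes at the term
# or at any of its descendants).
go2allprobes <- list(
  "GO:parent" = c("p1", "p2", "p3"),
  "GO:child"  = c("p1", "p2"),
  "GO:other"  = "p4"
)

# For each enriched GO id, keep the probes that are also significant.
hits <- lapply(go2allprobes[enriched_go], intersect, y = sig_probes)
print(hits)

# GO ids that cannot be traced back to any significant probe.
unmapped <- names(hits)[sapply(hits, length) == 0]
print(unmapped)
```

In this toy case every enriched id maps back to "p1", including the parent term that has no direct annotation for it.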
HTH,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues.
Jake wrote:
>>Here is the difference:
>>
>>hgu133plus2GO maps Probe IDs to GO terms
>>hgu133plus2GO2PROBE maps GO terms to Probe IDs
>>hgu133plus2GO2ALLPROBES maps GO terms and all children of the terms to
>>Probe IDs
>
>
> Thanks for the quick response, Jim. I just want to make sure that I'm
> understanding this correctly:
>
> The "children" are more specific descriptions/functions of the "parent"
> node (right?). So are you saying that even if an Affy Probe ID only has
> evidence for a given parent node, GO2ALLPROBES will also include
> connected children nodes for which there is no evidence for that Affy
> ID?
Nope, you have that backwards. GO2ALLPROBES maps GO terms to AffyIDs. So
if you have a GO term, say phosphorylation, and there aren't any AffyIDs
that map to that particular GOID, GO2PROBE won't list anything. However,
if there is an AffyID that maps to protein phosphorylation, which is a
child term of phosphorylation, then GO2ALLPROBES will list that AffyID
when you do a get() on the phosphorylation GOID.
What you are talking about is the mapping in the hgu133plus2GO
environment. In that case, if a given AffyID maps to e.g.,
phosphorylation, that is the only GOID that will be returned, not that
term and all its children.
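The direction of the propagation can be made concrete with a toy ontology in base R. This is a sketch under stated assumptions, not how the annotation packages are actually built: term labels stand in for GO ids, "probe_A" is an invented probe, and `parents`, `go2probe`, `ancestors`, and `go2allprobes` are names chosen for this example.

```r
# Hypothetical toy data: child -> parent edges of a two-term ontology.
parents <- list(
  "protein phosphorylation" = "phosphorylation",
  "phosphorylation"         = character(0)
)

# Direct annotations only (the role played by GO2PROBE).
go2probe <- list("protein phosphorylation" = "probe_A")

# A term together with all of its ancestors, found by walking parent links.
ancestors <- function(term) {
  out <- term
  for (p in parents[[term]]) out <- union(out, ancestors(p))
  out
}

# Push each direct annotation up to every ancestor term
# (the role played by GO2ALLPROBES).
go2allprobes <- list()
for (term in names(go2probe)) {
  for (a in ancestors(term)) {
    go2allprobes[[a]] <- union(go2allprobes[[a]], go2probe[[term]])
  }
}

# The parent term has no direct probes but inherits the child's probe.
print(go2probe[["phosphorylation"]])
print(go2allprobes[["phosphorylation"]])
```

Annotations flow from child to parent only: a probe annotated at the parent would not appear under the child in either map.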
As an aside, please don't respond just to me. Keep things on the list so
the questions/answers can be found by others.
HTH,
Jim
>
> Thanks,
>
> Jake
>