Hi Tim,
Tim Smith <tim_smith_666 at yahoo.com> writes:
> Hi,
>
> I was trying to list all the leaf nodes for a particular
> ontology. For this, I was using the GOstats:
> g1 <- oneGOGraph("GO:0003674", GOMFCHILDREN)
> g2 <- GOleaves(g1)
That isn't actually what you want. oneGOGraph (which just calls
GOGraph) returns a graph with edges directed _from_ the keys in the
dataenv map (GOMFCHILDREN in your example) _to_ the values in the
dataenv map.
So in your example above, you will have edges from parent node to
child node. This is the reverse of how much of the GOstats code
usually thinks about GO DAGs -- the convention is to have edges go
from child to parent to indicate the is-a relationship.
So GOLeaves is making this assumption and, along with taking a long
time, is not computing what you want. Now you could use
reverseEdgeDirections to change the direction of the edges of your
graph, but this in itself will be somewhat slow and GOLeaves will
_still_ perform badly.
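To make the direction issue concrete, here is a minimal base R sketch on a made-up four-node graph. It does not use the graph package; the names p2c and c2p and the hand-rolled edge reversal are assumptions for illustration only, not how reverseEdgeDirections is implemented. It shows that "leaf" means "no outgoing edges" in a parent-to-child graph, but "never the target of an edge" in a child-to-parent graph.

```r
# Parent-to-child adjacency list (hypothetical toy data, not real GO terms).
p2c <- list(root = c("a", "b"), a = "c", b = "c", c = character(0))

# Reverse every edge by hand to obtain the child-to-parent orientation.
from <- rep(names(p2c), lengths(p2c))   # source (parent) of each edge
to   <- unlist(p2c, use.names = FALSE)  # target (child) of each edge
c2p  <- split(from, factor(to, levels = names(p2c)))

# Parent-to-child: leaves are the nodes with no outgoing edges.
leaves_p2c <- names(p2c)[lengths(p2c) == 0]

# Child-to-parent: leaves are the nodes that no edge points to.
leaves_c2p <- setdiff(names(c2p), unlist(c2p, use.names = FALSE))

print(leaves_p2c)
print(leaves_c2p)
```

Both orientations identify the same leaf, but only if the test matches the edge direction.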
Instead, consider that with the graph you created, you are interested
in nodes that have no outgoing edges. So the following will give you
all leaves (and fairly quickly too):
> g1
A graphNEL graph with directed edges
Number of Nodes = 7527
Number of Edges = 8781
## count the number of (outgoing) edges for each node
> system.time(nKids <- listLen(edges(g1)))
user system elapsed
0.036 0.001 0.063
## get the names of the nodes that have no (outgoing) edges. These
## are the leaves
> system.time(leaves <- names(edges(g1)[nKids == 0]))
user system elapsed
0.035 0.000 0.037
> length(leaves)
[1] 6006
## verify
> all(is.na(mget(leaves, GOMFCHILDREN)))
[1] TRUE
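The same counting idea can be tried without the GO data. This is a minimal sketch in base R, assuming a made-up adjacency list (toy_edges is hypothetical) shaped like the result of edges(g1), with base R's lengths() standing in for listLen():

```r
# Toy adjacency list: names are nodes, values are the children of each node.
# This mimics the shape of edges(g1) for a parent-to-child graph.
toy_edges <- list(
  root = c("a", "b"),
  a    = c("c", "d"),
  b    = "d",
  c    = character(0),
  d    = character(0)
)

# Count the outgoing edges of each node.
n_kids <- lengths(toy_edges)

# Nodes with no outgoing edges are the leaves.
toy_leaves <- names(toy_edges)[n_kids == 0]
print(toy_leaves)
```

Node "d" has two parents, as is common in a DAG, but it is still reported only once because the test looks at each node's own edge list.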
> Hopefully, this would give me a list of all the leaf nodes for the
> molecular function ontology. But this is taking too long to execute.
>
> Is there a similar function in some other package that would be
> quicker?
I will see about improving GOLeaves, but the above should get you
going for now...
Best,
+ seth
--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research
Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
>
>I was trying to list all the leaf nodes for a particular ontology.
>For this, I was using the GOstats:
>
>g1 <- oneGOGraph("GO:0003674", GOMFCHILDREN)
>g2 <- GOleaves(g1)
>
>Hopefully, this would give me a list of all the leaf nodes for the
>molecular function ontology. But this is taking too long to execute.
>
>Is there a similar function in some other package that would be
>quicker?
Just for the GO ids:
>library(GO)
>leafGOs <- get("GO:0003674", GOMFCHILDREN)
>
>thanks!
>
>
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
Hi Tim,
Have you tried this?
> library(GO)
> isleaf <- unlist(eapply(GOBPCHILDREN, function(goid)
      isTRUE(is.na(goid))))
Now isleaf is a logical vector whose names are all the BP goids: for
each BP goid it tells whether it is a leaf or not.
To put the BP leaves in a character vector:
> BPleaves <- names(isleaf)[isleaf]
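For readers without the GO package installed, the pattern can be tried on a small stand-in. This is only a sketch: toy_children is a hypothetical environment, and it assumes (as the code above implies) that a term without children maps to a single NA.

```r
# Hypothetical stand-in for a CHILDREN map: an environment keyed by term ID,
# where a term without children maps to NA (an assumption of this sketch).
toy_children <- new.env()
assign("T:1", c("T:2", "T:3"), envir = toy_children)
assign("T:2", "T:4", envir = toy_children)
assign("T:3", NA, envir = toy_children)
assign("T:4", NA, envir = toy_children)

# is.na() returns one value per child; isTRUE() is TRUE only for a single TRUE,
# so terms with one or more real children come out FALSE.
is_leaf <- unlist(eapply(toy_children, function(kids) isTRUE(is.na(kids))))

# eapply() visits the environment in arbitrary order, so sort for stable output.
toy_bp_leaves <- sort(names(is_leaf)[is_leaf])
print(toy_bp_leaves)
```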
Cheers,
H.
>
>Thanks for the reply. I tried the following:
>
>> g1 <- get("GO:0008150", GOBPCHILDREN)
>> length(g1)
>[1] 12
>> g1
> [1] "GO:0050789" "GO:0050896" "GO:0009987" "GO:0016032"
> [5] "GO:0043473" "GO:0019952" "GO:0000003" "GO:0000004"
> [9] "GO:0040007" "GO:0007275" "GO:0007582" "GO:0051704"
>
>
>So, starting with the root node for biological process, I would want
>to get only the outermost leaf nodes (and not any intermediate nodes
>in the graph).
>
>The above code would appear to give the direct children of the root
>node (is that correct?).
That is right. GOBPOFFSPRING will give you all the descendant nodes, but
the node structure is not preserved, so from that list alone there is no
way to figure out which ones are the outermost.
You may need to travel through all the children nodes you get from the
CHILDREN environment to get down to the outermost leaf nodes.
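One way such a traversal could look is sketched below in base R. It is an illustration under stated assumptions, not code from the GO or GOstats packages: toy_children is a made-up list standing in for a CHILDREN map (with NA marking "no children"), and collect_leaves is a hypothetical helper.

```r
# Hypothetical stand-in for a CHILDREN map; NA marks a term without children.
toy_children <- list(
  root = c("a", "b"),
  a    = c("c", "d"),
  b    = "d",
  c    = NA_character_,
  d    = NA_character_
)

# Walk down from `start`, following children until terms without children
# are reached. `seen` prevents visiting a shared descendant twice.
collect_leaves <- function(start, children) {
  seen   <- character(0)
  leaves <- character(0)
  queue  <- start
  while (length(queue) > 0) {
    node  <- queue[1]
    queue <- queue[-1]
    if (node %in% seen) next
    seen <- c(seen, node)
    kids <- children[[node]]
    if (length(kids) == 1 && is.na(kids)) {
      leaves <- c(leaves, node)   # no children: this is a leaf
    } else {
      queue <- c(queue, kids)     # otherwise keep descending
    }
  }
  leaves
}

found <- collect_leaves("root", toy_children)
print(found)
```

Starting from a different term restricts the result to the leaves beneath that term.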
>
>many thanks
Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
Did anyone try GOLeaves from the GOstats package? It seems to be
documented to do what is wanted...
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
Hi Tim,
[I thought I sent a reply, but I didn't see it come through. So sorry
if this ends up being a dup]
Tim Smith <tim_smith_666 at yahoo.com> writes:
> I was trying to list all the leaf nodes for a particular
> ontology. For this, I was using the GOstats:
>
> g1 <- oneGOGraph("GO:0003674", GOMFCHILDREN)
> g2 <- GOleaves(g1)
>
> Hopefully, this would give me a list of all the leaf nodes for the
> molecular function ontology. But this is taking too long to execute.
oneGOGraph is a wrapper for GOGraph. GOGraph has the following
arguments: x, dataenv. The function builds a directed graph where
edges go from nodes that are keys in dataenv to nodes that are values
in dataenv.
So in your example, g1 will have edges going from parent to child GO
terms. It turns out that this is exactly the opposite of the
convention used in GOstats; edges in graphs representing GO point from
child to parent. One reason is that this is the way is-a
relationships are signified in UML. Upshot: GOLeaves, in addition to
taking forever, is not computing what you want.
You could use graph::reverseEdgeDirections on g1 and then call
GOLeaves. I think this will give you the right answer, but it will
still take forever (looks like GOLeaves needs to be sent to the
optimizer).
If you really are only interested in the leaves of the MF ontology,
then you just need to find the GO terms in GOMFCHILDREN that have no
children.
system.time(
    isLeaf <- unlist(eapply(GOMFCHILDREN,
                            function(x) length(x) == 1 && is.na(x)))
)
user system elapsed
0.174 0.070 1.185
leaves <- names(isLeaf[isLeaf])
If you are interested in the leaves of a graph with edges going from
parent to child, like g1, then you can do:
numKids <- listLen(edges(g1))
leaves <- names(edges(g1)[numKids == 0])
This is fast for a graph the size of g1.
Best,
+ seth
--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research
Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
Tim Smith <tim_smith_666 at yahoo.com> writes:
> Hi Seth (and everyone),
>
> Thanks for the replies. I ended up doing (for the BP ontology):
>
> leaves <- list()
> root <- "GO:0008150"
> allgos <- get(root,GOBPOFFSPRING)
> for(i in 1:length(allgos)){
>     if(is.na(get(allgos[i],GOBPOFFSPRING))){
>         leaves <- c(leaves,allgos[i])
>     }
> }
>
>
> I hope this is equivalent to getting the leaves!
I think this will get you the answer, but did you compare to the two
or three other solutions suggested?
The for loop with get will be slower than using mget.
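As a rough illustration of the mget approach, here is a base R sketch on a made-up stand-in for an OFFSPRING map. toy_offspring is hypothetical, and the sketch assumes that a term without offspring maps to a single NA. One mget() call fetches every entry at once, and the length check keeps the test to a single TRUE or FALSE even for terms with several offspring.

```r
# Hypothetical stand-in for an OFFSPRING map; NA marks a term with no offspring.
toy_offspring <- new.env()
assign("root", c("a", "b", "c"), envir = toy_offspring)
assign("a", "c", envir = toy_offspring)
assign("b", NA, envir = toy_offspring)
assign("c", NA, envir = toy_offspring)

# All terms below the root.
all_terms <- get("root", envir = toy_offspring)

# One mget() call replaces the loop of individual get() calls.
offspring <- mget(all_terms, envir = toy_offspring)

# A term is a leaf when its entry is a single NA.
no_offspring <- vapply(offspring,
                       function(x) length(x) == 1 && is.na(x),
                       logical(1))
loop_free_leaves <- names(offspring)[no_offspring]
print(loop_free_leaves)
```

This also yields a character vector directly, rather than a list grown one element at a time.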
+ seth
--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research
Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/