Finding GO leaf nodes for an ontology - which package?
Tim Smith ★ 1.1k
@tim-smith-1532
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070727/b19ed708/attachment.pl
Seth Falcon ★ 7.4k
@seth-falcon-992
Hi Tim,

Tim Smith <tim_smith_666 at yahoo.com> writes:
> Hi,
>
> I was trying to list all the leaf nodes for a particular
> ontology. For this, I was using the GOstats:
> g1 <- oneGOGraph("GO:0003674", GOMFCHILDREN)
> g2 <- GOLeaves(g1)

That isn't actually what you want. oneGOGraph (which just calls GOGraph) returns a graph with edges directed _from_ the keys in the dataenv map (GOMFCHILDREN in your example) _to_ the values in the dataenv map. So in your example above, you will have edges from parent node to child node. This is the reverse of how much of the GOstats code thinks about GO DAGs: the convention there is to have edges go from child to parent to indicate the is-a relationship. GOLeaves makes this assumption, so in addition to taking a long time, it is not computing what you want.

Now, you could use reverseEdgeDirections to change the direction of the edges in your graph, but this in itself will be somewhat slow, and GOLeaves will _still_ perform badly.

Instead, note that in the graph you created, the leaves are exactly the nodes with no outgoing edges. So the following will give you all the leaves (and fairly quickly, too):

> g1
A graphNEL graph with directed edges
Number of Nodes = 7527
Number of Edges = 8781

## count the number of (outgoing) edges for each node
> system.time(nKids <- listLen(edges(g1)))
   user  system elapsed
  0.036   0.001   0.063

## get the names of the nodes that have no (outgoing) edges. These
## are the leaves
> system.time(leaves <- names(edges(g1)[nKids == 0]))
   user  system elapsed
  0.035   0.000   0.037
> length(leaves)
[1] 6006

## verify
> all(is.na(mget(leaves, GOMFCHILDREN)))
[1] TRUE

> Hopefully, this would give me a list of all the leaf nodes for the
> molecular function ontology. But this is taking too long to execute.
>
> Is there a similar function in some other package that would be
> quicker?

I will see about improving GOLeaves, but the above should get you going for now...

Best,
+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
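Seth's point is that in a parent-to-child graph such as g1, a leaf is simply a node with no outgoing edges. The sketch below illustrates that idea in base R so it runs without the GO, graph or Biobase packages. It assumes a hypothetical toy graph stored as a named list shaped like the value of edges(g1), and it uses base R's lengths() as a stand-in for listLen().

```r
# Toy stand-in for edges(g1): a named list mapping each node to the nodes
# it points to. Edges run parent -> child, as in oneGOGraph()'s output.
# The IDs are hypothetical placeholders, not real GO terms.
edgeList <- list(
  root = c("A", "B"),
  A    = c("C", "D"),
  B    = "D",
  C    = character(0),
  D    = character(0)
)

# Count outgoing edges per node; lengths() plays the role of listLen().
nKids <- lengths(edgeList)

# Leaves are the nodes with no outgoing edges.
leaves <- names(edgeList)[nKids == 0]
print(leaves)
```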
John Zhang ★ 2.9k
@john-zhang-6
> I was trying to list all the leaf nodes for a particular ontology. For this, I was using the GOstats:
>
> g1 <- oneGOGraph("GO:0003674", GOMFCHILDREN)
> g2 <- GOLeaves(g1)
>
> Hopefully, this would give me a list of all the leaf nodes for the molecular function ontology. But this is taking too long to execute.
>
> Is there a similar function in some other package that would be quicker?

Just for the GO ids:

> library(GO)
> leafGOs <- get("GO:0003674", GOMFCHILDREN)

> thanks!
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070727/afef7e38/attachment.pl
@herve-pages-1542
Seattle, WA, United States
Hi Tim,

Have you tried this?

> library(GO)
> isleaf <- unlist(eapply(GOBPCHILDREN, function(goid) isTRUE(is.na(goid))))

Now isleaf is a logical vector whose names are all the BP GO IDs: for each BP GO ID, it tells whether that term is a leaf or not. To put the BP leaves in a character vector:

> BPleaves <- names(isleaf)[isleaf]

Cheers,
H.

Tim Smith wrote:
> Hi,
>
> I was trying to list all the leaf nodes for a particular ontology. For this, I was using the GOstats:
>
> g1 <- oneGOGraph("GO:0003674", GOMFCHILDREN)
> g2 <- GOLeaves(g1)
>
> Hopefully, this would give me a list of all the leaf nodes for the molecular function ontology. But this is taking too long to execute.
>
> Is there a similar function in some other package that would be quicker?
>
> thanks!
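To see what Hervé's one-liner does, here is a sketch against a small hand-built environment standing in for GOBPCHILDREN. The IDs are hypothetical, and storing NA for childless terms is an assumption that mirrors what Seth's all(is.na(mget(...))) check suggests about the GO environments. Because eapply() gives no ordering guarantee, the result is sorted for display.

```r
# Toy stand-in for GOBPCHILDREN: an environment mapping each term to its
# direct children, with NA marking a term that has no children.
# The IDs are hypothetical placeholders.
childrenEnv <- new.env()
assign("root", c("A", "B"), envir = childrenEnv)
assign("A", c("C", "D"), envir = childrenEnv)
assign("B", "D", envir = childrenEnv)
assign("C", NA, envir = childrenEnv)
assign("D", NA, envir = childrenEnv)

# isTRUE() returns FALSE for a multi-element is.na() result, so only terms
# whose value is a single NA (the leaves) come back TRUE.
isleaf <- unlist(eapply(childrenEnv, function(goid) isTRUE(is.na(goid))))

# eapply() does not guarantee an order, so sort for stable output.
leaves <- sort(names(isleaf)[isleaf])
print(leaves)
```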
John Zhang ★ 2.9k
@john-zhang-6
> Thanks for the reply. I tried the following:
>
> > g1 <- get("GO:0008150", GOBPCHILDREN)
> > length(g1)
> [1] 12
> > g1
>  [1] "GO:0050789" "GO:0050896" "GO:0009987" "GO:0016032"
>  [5] "GO:0043473" "GO:0019952" "GO:0000003" "GO:0000004"
>  [9] "GO:0040007" "GO:0007275" "GO:0007582" "GO:0051704"
>
> So, starting with the root node for biological process, I would want to get only the outermost leaf nodes (and not any intermediate nodes in the graph).
>
> The above code would appear to give the direct children of the root node (is that correct?).

That is right. GOBPOFFSPRING will give you all the descendant nodes, but the node structure is not preserved, so from that vector alone there is no way to figure out which ones are the outermost.

You may need to travel through all the children nodes you get from the CHILDREN environment to get down to the outermost leaf nodes.

> many thanks

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
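The traversal John describes can be sketched as an iterative depth-first walk over a CHILDREN-style environment. This is an illustration only: the environment and IDs are hypothetical, and findLeaves() is a helper written for this example, not a function from GO or GOstats. The visited set matters because GO is a DAG, so a term can be reached along more than one path.

```r
# Toy CHILDREN-style environment (hypothetical IDs); NA marks a term with
# no children. "D" is reachable along two paths, as can happen in a DAG.
childrenEnv <- new.env()
assign("root", c("A", "B"), envir = childrenEnv)
assign("A", c("C", "D"), envir = childrenEnv)
assign("B", c("D", "E"), envir = childrenEnv)
assign("C", NA, envir = childrenEnv)
assign("D", NA, envir = childrenEnv)
assign("E", NA, envir = childrenEnv)

# Hypothetical helper: walk down from `root` following child links and
# collect the terms that have no children. `visited` keeps a shared
# descendant from being expanded more than once.
findLeaves <- function(root, env) {
  stack <- root
  visited <- character(0)
  leaves <- character(0)
  while (length(stack) > 0) {
    node <- stack[length(stack)]          # pop the last element
    stack <- stack[-length(stack)]
    if (node %in% visited) next
    visited <- c(visited, node)
    kids <- get(node, envir = env)
    if (all(is.na(kids))) {
      leaves <- c(leaves, node)           # no children: a leaf
    } else {
      stack <- c(stack, kids)             # descend into the children
    }
  }
  sort(leaves)
}

leaves <- findLeaves("root", childrenEnv)
print(leaves)
```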
Did anyone try GOLeaves from the GOstats package? It seems to be documented to do what is wanted...

John Zhang wrote:
>> So, starting with the root node for biological process, I would want to get
>> only the outermost leaf nodes (and not any intermediate nodes in the graph).
>> The above code would appear to give the direct children of the root node (is
>> that correct?).
>
> That is right. GOBPOFFSPRING will give you all the children nodes but the node
> structure is not preserved and there is no way to figure out which ones are the
> outermost.
>
> You may need to travel through all the children nodes you get from the CHILDREN
> environment to get down to the outermost leaf nodes.

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
Seth Falcon ★ 7.4k
@seth-falcon-992
Hi Tim,

[I thought I sent a reply, but I didn't see it come through, so sorry if this ends up being a dup.]

Tim Smith <tim_smith_666 at yahoo.com> writes:
> I was trying to list all the leaf nodes for a particular
> ontology. For this, I was using the GOstats:
>
> g1 <- oneGOGraph("GO:0003674", GOMFCHILDREN)
> g2 <- GOLeaves(g1)
>
> Hopefully, this would give me a list of all the leaf nodes for the
> molecular function ontology. But this is taking too long to execute.

oneGOGraph is a wrapper for GOGraph. GOGraph takes the arguments x and dataenv and builds a directed graph whose edges go from nodes that are keys in dataenv to nodes that are values in dataenv. So in your example, g1 will have edges going from parent to child GO terms.

It turns out that this is exactly the opposite of the convention used in GOstats, where edges in graphs representing GO point from child to parent. One reason is that this is how is-a relationships are signified in UML.

Upshot: GOLeaves, in addition to taking forever, is not computing what you want. You could use graph::reverseEdgeDirections on g1 and then call GOLeaves. I think this will give you the right answer, but it will still take forever (looks like GOLeaves needs to be sent to the optimizer).

If you really are only interested in the leaves of the MF ontology, then you just need to find the GO terms in GOMFCHILDREN that have no children:

system.time(
  isLeaf <- unlist(eapply(GOMFCHILDREN,
                          function(x) length(x) == 1 && is.na(x)))
)
   user  system elapsed
  0.174   0.070   1.185

leaves <- names(isLeaf[isLeaf])

If you are interested in the leaves of a graph with edges going from parent to child, like g1, then you can do:

numKids <- listLen(edges(g1))
leaves <- names(edges(g1)[numKids == 0])

This is fast for a graph the size of g1.

Best,
+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
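Seth's remark about edge direction can be checked on a toy graph. The sketch below reverses a hypothetical parent-to-child edge list into the child-to-parent orientation and shows that, after reversal, the leaves are the nodes with no incoming edges, which matches the no-outgoing-edges answer in the original orientation. reverseEdges() is a hypothetical helper for plain named lists; it only stands in for graph::reverseEdgeDirections, which operates on graph objects.

```r
# Toy parent -> child edge list shaped like edges(g1) (hypothetical IDs).
parentToChild <- list(
  root = c("A", "B"),
  A    = c("C", "D"),
  B    = "D",
  C    = character(0),
  D    = character(0)
)

# Hypothetical helper: flip every edge so each node lists its parents,
# mirroring the child -> parent ("is-a") convention used in GOstats.
reverseEdges <- function(edgeList) {
  out <- lapply(edgeList, function(ignored) character(0))
  for (from in names(edgeList)) {
    for (to in edgeList[[from]]) {
      out[[to]] <- c(out[[to]], from)
    }
  }
  out
}

childToParent <- reverseEdges(parentToChild)

# With child -> parent edges, a leaf is a node that no other node points
# to, i.e. one that never appears as anyone's parent.
hasIncoming <- names(childToParent) %in% unlist(childToParent)
leavesReversed <- names(childToParent)[!hasIncoming]

# Same answer as counting outgoing edges in the original orientation.
leavesOriginal <- names(parentToChild)[lengths(parentToChild) == 0]
print(leavesReversed)
```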
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070730/2e6bb9a2/attachment.pl
Seth Falcon ★ 7.4k
@seth-falcon-992
Tim Smith <tim_smith_666 at yahoo.com> writes:
> Hi Seth (and everyone),
>
> Thanks for the replies. I ended up doing (for the BP ontology):
>
> leaves <- list()
> root <- "GO:0008150"
> allgos <- get(root, GOBPOFFSPRING)
> for(i in 1:length(allgos)){
>   if(is.na(get(allgos[i], GOBPOFFSPRING))){
>     leaves <- c(leaves, allgos[i])
>   }
> }
>
> I hope this is equivalent to getting the leaves!

I think this will get you the answer, but did you compare it with the two or three other solutions suggested? The for loop with get will be slower than using mget.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
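Seth's suggestion to replace the per-term get() calls with a single mget() can be sketched as follows. The offspring environment and IDs are hypothetical stand-ins for GOBPOFFSPRING. Testing all(is.na(x)) rather than is.na(x) inside if() also avoids handing if() a multi-element condition for terms with several offspring, which R 4.2.0 and later treat as an error.

```r
# Toy OFFSPRING-style environment (hypothetical IDs): each term maps to
# all of its descendants, and NA marks a term with none.
offspringEnv <- new.env()
assign("root", c("A", "B", "C", "D"), envir = offspringEnv)
assign("A", c("C", "D"), envir = offspringEnv)
assign("B", "D", envir = offspringEnv)
assign("C", NA, envir = offspringEnv)
assign("D", NA, envir = offspringEnv)

allgos <- get("root", envir = offspringEnv)

# One mget() call fetches the offspring of every descendant at once,
# instead of one get() per term inside a for loop.
offs <- mget(allgos, envir = offspringEnv)

# all(is.na(x)) yields a single TRUE/FALSE for vectors of any length.
isLeaf <- vapply(offs, function(x) all(is.na(x)), logical(1))
leaves <- names(isLeaf)[isLeaf]
print(leaves)
```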