slow graph methods
2
0
Entering edit mode
@martin-morgan-1513
Last seen 5 months ago
United States
Hi Paul -- I think Seth's speed-up came when he realized that the entire graph gets copied when a single node or edge is added. So his solution was to accumulate a bunch of changes (actually, the whole XML file) and then apply them in a single call to, e.g., addEdge using the fact that the arguments from, to can be vectors. Seth accumulated nodes / edges to be added in an environment, which does not get copied (the way a list would) each time it is changed. Maybe that perspective helps you to revise your code to add nodes in batches, rather than individually? Martin Paul Shannon <pshannon at="" systemsbiology.org=""> writes: > A couple of weeks ago, just before he left, Seth made a striking (800x) > improvement to the speed of graph::fromGXL. This has been a big help. > > However, I'd like to construct graphs incrementally, and > programmatically, > without having to create new gxl files. Is it possible that > Seth's efficiencies have not yet been applied to other methods in the > graph class? > > In particular, it seems that the following operations take a longish > time > on graphs of a few thousand nodes: > > edgeData > nodeData > addEdge > addNode > > I could provide actual times and test cases if that would help. > > Thanks! > > - Paul > > > > On Sep 12, 2007, at 8:09 AM, Seth Falcon wrote: > >> Hi again, >> >> Seth Falcon <sfalcon at="" fhcrc.org=""> writes: >>> Paul Shannon <pshannon at="" systemsbiology.org=""> writes: >>>> It took nearly 24 hours (!) to create a 16k node graph using two >>>> different techniques: >>>> >>>> g = fromGXL (file ('someFile.gxl')) >> >> Using a patched version of fromGXL (on my laptop) I get: >> >>> library(graph) >>> con = file("kegg.yeast.gxl", open="r") >>> system.time(z <- fromGXL(con)) >> user system elapsed >> 104.366 0.570 105.070 >>> z >> A graphNEL graph with undirected edges >> Number of Nodes = 15158 >> Number of Edges = 32668 >>> validObject(z) >> [1] TRUE >> >> That's over 800x faster :-) >> >> I've checked in the changes in devel as graph 1.15.17. You can get it >> from svn or wait till this time tomorrow (in either case, you will >> need R 2.6.0 alpha). >> >> The code passes the existing unit tests, but some extra scrutiny that >> the returned graphs are as desired is quite welcome. >> >> + seth >> >> -- >> Seth Falcon | Computational Biology | Fred Hutchinson Cancer >> Research Center >> BioC: http://bioconductor.org/ >> Blog: http://userprimary.net/user/ > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Bioconductor / Computational Biology http://bioconductor.org
SystemsBiology graph SystemsBiology graph • 1.2k views
ADD COMMENT
0
Entering edit mode
Paul Shannon ★ 1.1k
@paul-shannon-578
Last seen 10.3 years ago
Hi Martin, Great idea. I'll try it. Thanks! - Paul On Sep 26, 2007, at 2:16 PM, Martin Morgan wrote: > Seth accumulated nodes / edges to > be added in an environment, which does not get copied (the way a list > would) each time it is changed. Maybe that perspective helps you to > revise your code to add nodes in batches, rather than individually?
ADD COMMENT
0
Entering edit mode
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 10.3 years ago
Paul Shannon <pshannon at="" systemsbiology.org=""> writes: > Hi Martin, > > Great idea. I'll try it. > > Thanks! > > - Paul > > On Sep 26, 2007, at 2:16 PM, Martin Morgan wrote: > >> Seth accumulated nodes / edges to >> be added in an environment, which does not get copied (the way a list >> would) each time it is changed. Maybe that perspective helps you to >> revise your code to add nodes in batches, rather than individually? The code that I added for reading GXL files is in graph/R/gxlReader.R in the graph_handler function. You will have to "read through" the XML event parsing, but perhaps it will give you a more concrete example of using environments to accumulate data needed for one or two big calls to the graph functions and avoiding the incremental calls which are slow due to copying (as others have explained in this thread). Best, + seth -- Seth Falcon | seth at userprimary.net | blog: http://userprimary.net/user/
ADD COMMENT

Login before adding your answer.

Traffic: 725 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6