Entering edit mode
Hi Paul --
I think Seth's speed-up came when he realized that the entire graph
gets copied when a single node or edge is added. So his solution was
to
accumulate a bunch of changes (actually, the whole XML file) and then
apply them in a single call to, e.g., addEdge using the fact that the
arguments from, to can be vectors. Seth accumulated nodes / edges to
be added in an environment, which does not get copied (the way a list
would) each time it is changed. Maybe that perspective helps you to
revise your code to add nodes in batches, rather than individually?
Martin
Paul Shannon <pshannon at="" systemsbiology.org=""> writes:
> A couple of weeks ago, just before he left, Seth made a striking
(800x)
> improvement to the speed of graph::fromGXL. This has been a big
help.
>
> However, I'd like to construct graphs incrementally, and
> programmatically,
> without having to create new gxl files. Is it possible that
> Seth's efficiencies have not yet been applied to other methods in
the
> graph class?
>
> In particular, it seems that the following operations take a longish
> time
> on graphs of a few thousand nodes:
>
> edgeData
> nodeData
> addEdge
> addNode
>
> I could provide actual times and test cases if that would help.
>
> Thanks!
>
> - Paul
>
>
>
> On Sep 12, 2007, at 8:09 AM, Seth Falcon wrote:
>
>> Hi again,
>>
>> Seth Falcon <sfalcon at="" fhcrc.org=""> writes:
>>> Paul Shannon <pshannon at="" systemsbiology.org=""> writes:
>>>> It took nearly 24 hours (!) to create a 16k node graph using two
>>>> different techniques:
>>>>
>>>> g = fromGXL (file ('someFile.gxl'))
>>
>> Using a patched version of fromGXL (on my laptop) I get:
>>
>>> library(graph)
>>> con = file("kegg.yeast.gxl", open="r")
>>> system.time(z <- fromGXL(con))
>> user system elapsed
>> 104.366 0.570 105.070
>>> z
>> A graphNEL graph with undirected edges
>> Number of Nodes = 15158
>> Number of Edges = 32668
>>> validObject(z)
>> [1] TRUE
>>
>> That's over 800x faster :-)
>>
>> I've checked in the changes in devel as graph 1.15.17. You can get
it
>> from svn or wait till this time tomorrow (in either case, you will
>> need R 2.6.0 alpha).
>>
>> The code passes the existing unit tests, but some extra scrutiny
that
>> the returned graphs are as desired is quite welcome.
>>
>> + seth
>>
>> --
>> Seth Falcon | Computational Biology | Fred Hutchinson Cancer
>> Research Center
>> BioC: http://bioconductor.org/
>> Blog: http://userprimary.net/user/
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org