Opened 14 years ago
Closed 6 years ago
#377 closed enhancement (done)
Smaller version of StaticDigraph
Reported by: | Peter Kovacs | Owned by: | Alpar Juttner |
---|---|---|---|
Priority: | major | Milestone: | LEMON 1.4 release |
Component: | core | Version: | hg main |
Keywords: | Cc: | ||
Revision id: |
Description
The current implementation of StaticDigraph
stores 2n+4m
int values (for n
nodes and m
arcs). However, in many cases, one needs outgoing arc iteration only. For such cases, a static graph representation could be implemented with n+2m
integers by fully omitting the outgoing and incomming arc lists (but arcs would be sorted by their source nodes).
The NodeIt
, ArcIt
and OutArcIt
iterators could be implemented efficiently, but the basic iteration of the ougoing arcs would be somewhat slower (see #68 for antecedents). For ensuring the compatibility with the Digraph
interface, an InArcIt
iterator should also be implemented, but it would be really slow (as slow as full arc iteration).
What do you think? Would this data structure make sense?
Attachments (3)
Change History (17)
comment:1 Changed 14 years ago by
Changed 8 years ago by
Attachment: | smaller_static_graph.patch added |
---|
Patch for a slightly smaller StaticDigraph?: 2n + 3m instead of 2n + 4m
comment:2 follow-up: 3 Changed 8 years ago by
Here is a first improvement that removes the node_first_out array from StaticDigraph?; I'm working on an n+m version as suggested by Alpar as well
comment:3 follow-up: 5 Changed 8 years ago by
Replying to ggouvin:
Here is a first improvement that removes the node_first_out array from StaticDigraph; ...
This change has been proposed and discussed several years ago. Here you can find that patch: 68-4-a3de05c56e7e.patch
We finally decided not to apply it, because the additional array can make arc iteration faster e.g. when graph adaptors are applied on the top of StaticDigraph
. See related comments here:
ticket:68#comment:5
ticket:68#comment:8
So I wouldn't apply this patch.
However, the n+m version is still interesting, but I would suggest to implement it as a separate Digraph
class. E.g. we could call it CompactDigraph
and its API doc should describe the differences compared to StaticDigraph
(much more space-efficient, but only OutArcIt
is fast, InArcIt
is much slower).
comment:4 Changed 8 years ago by
Here is a first working version of this Digraph implementation. I think it's almost OK, but I could improve a few points:
- Arc is 3 ints; it could be 2
- The documentation could be improved
comment:5 follow-up: 6 Changed 8 years ago by
Replying to kpeter:
Replying to ggouvin:
Here is a first improvement that removes the node_first_out array from StaticDigraph; ...
This change has been proposed and discussed several years ago. Here you can find that patch: 68-4-a3de05c56e7e.patch
We finally decided not to apply it, because the additional array can make arc iteration faster e.g. when graph adaptors are applied on the top of
StaticDigraph
. See related comments here: ticket:68#comment:5 ticket:68#comment:8So I wouldn't apply this patch.
However, the n+m version is still interesting, but I would suggest to implement it as a separate
Digraph
class. E.g. we could call itCompactDigraph
and its API doc should describe the differences compared toStaticDigraph
(much more space-efficient, but onlyOutArcIt
is fast,InArcIt
is much slower).
Hi Péter. Thanks for the feedback; would you have comments on the ForwardDigraph? patch below (rather CompactDigraph??)? It still needs polishing.
ExtendedStaticDigraphBase::OutArcIt? is shadowed by StaticDigraph::OutArcIt?, and no code instantiates it directly. So firstOut/nextOut - and the additional array - are never used. But I understand the need to be careful here.
comment:6 follow-up: 7 Changed 8 years ago by
Replying to ggouvin:
Hi Péter. Thanks for the feedback; would you have comments on the ForwardDigraph patch below (rather CompactDigraph?)? It still needs polishing.
I had a short look at it. Basically, it seems to be OK, but at first it was strange that during out arc iteration of node n
, node_first_out[n]
is actually the last one, not the first. And the last
field of Arc
type is actually the "after the last one" according to out arc iteration order (so actually "before the first one" according to IDs). Maybe these aspects could be clarified, but I'm not sure whether ++
or --
direction would be better for out arc iteration.
ExtendedStaticDigraphBase::OutArcIt
is shadowed byStaticDigraph::OutArcIt
, and no code instantiates it directly. So firstOut/nextOut - and the additional array - are never used. But I understand the need to be careful here.
I don't think so. Could you have a look at the implementation of graph adaptors? E.g. SubDigraph
class uses Parent::firstOut()
and Parent::nextOut()
in its firstOut()
and nextOut()
methods instead of the Parent graph's iterators:
https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/adaptors.h?rev=dceba191c00d#L452
Other adaptors (e.g. ReverseDigraph
) and other core utilities also use these methods. E.g., the arc lookup tools rely on this helper class:
https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/core.h?rev=c199e9976d93#L1600
comment:7 Changed 8 years ago by
Replying to kpeter:
Replying to ggouvin:
Hi Péter. Thanks for the feedback; would you have comments on the ForwardDigraph patch below (rather CompactDigraph?)? It still needs polishing.
I had a short look at it. Basically, it seems to be OK, but at first it was strange that during out arc iteration of node
n
,node_first_out[n]
is actually the last one, not the first. And thelast
field ofArc
type is actually the "after the last one" according to out arc iteration order (so actually "before the first one" according to IDs). Maybe these aspects could be clarified, but I'm not sure whether++
or--
direction would be better for out arc iteration.
Right; I based it on ArcIt? rather than on StaticDigraph?'s OutArcIt?. I will change that to a more standard ++.
ExtendedStaticDigraphBase::OutArcIt
is shadowed byStaticDigraph::OutArcIt
, and no code instantiates it directly. So firstOut/nextOut - and the additional array - are never used. But I understand the need to be careful here.I don't think so. Could you have a look at the implementation of graph adaptors? E.g.
SubDigraph
class usesParent::firstOut()
andParent::nextOut()
in itsfirstOut()
andnextOut()
methods instead of the Parent graph's iterators: https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/adaptors.h?rev=dceba191c00d#L452 Other adaptors (e.g.ReverseDigraph
) and other core utilities also use these methods. E.g., the arc lookup tools rely on this helper class: https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/core.h?rev=c199e9976d93#L1600
Thanks for the insight, I didn't see this. One possible way out would be to have first/nextOut and the like take something other than the Arc class (OutArcItBase??), which would be typedef to Arc in most cases but would contain "last" in Static and CompactDigraph?. But this looks less and less like a minor change.
comment:8 Changed 8 years ago by
Yes, indeed. So I wouldn't modify StaticDigraph
: it focuses on running time performance (which is the primary preference in most cases).
The other implementation in turn focuses on memory requirements and provides fast iteration support only for the most typical use cases.
That's a clear trade-off, one can select according to her/his preference.
comment:9 Changed 8 years ago by
I updated the patch so CompactDigraph::OutArcIt? uses increasing id; ArcIt? is kept consistent with StaticDigraph? as well (decreasing id)
comment:10 Changed 8 years ago by
I briefly checked the attached patch. Here are some comments.
Handling of the last
field of Arc
still seems to be problematic. According to the usage of its value (lines 87, 110, 114), e.last
should always be equal to node_first_out[e.source+1]
, unless e.id==-1
. However, some assignments do not correspond to this invariant. The change in iteration order of OutArcIt
seems to be inconsistently updated (see lines 89, 98, 138, 216).
Anyway, this field is not necessary at all, recalculating its value would always be feasible.
My suggestion is:
- Entirely remove this field from
Arc
and calculate its value infirstOut()
andnextOut()
. - Compare the performance of this implementation and the previous one for various graphs (small/large, sparse/dense).
OutArcIt
might become slower, butArcIt
should be faster, because assignments oflast
and additional if statements would be saved there. - If
OutArcIt
becomes significantly slower, then we can re-introduce thelast
field, even as an optional field (or "cache") which is used only infirstOut()
andnextOut()
.
Apart from that, please use LEMON_COMPACT_GRAPH_H
instead of LEMON_FORWARD_GRAPH_H
in lines 19 and 20.
comment:11 Changed 8 years ago by
Thank you! It's cleaner. I updated the patch and benchmarked both. I ran Dijkstra on random graphs of various densities - it seemed closer to real use-cases - with 100 to 10000 nodes.
I used both ArcMap?<double> and ArcMap?<int> - the latter to test if aliasing issues would prevent some optimizations. I built the graph, then ran many Dijkstra iterations under oprofile.
- Performance is very close for StaticDigraph? and both versions of CompactDigraph? (+- 6% for all scenarios)
- Both CompactDigraph? versions are virtually identical
- Surprisingly, CompactDigraph? was less efficient on ArcMap?<int>, and more efficient on ArcMap?<double> (~3%)
comment:12 Changed 8 years ago by
Thanks for the update. I agree: if the 'last' field does not cause significant speed-up, then it could be removed.
I would accept this version, the patch could be applied to the main branch.
comment:13 Changed 6 years ago by
Milestone: | → LEMON 1.4 release |
---|
comment:14 Changed 6 years ago by
Resolution: | → done |
---|---|
Status: | new → closed |
The patch compact_graph.patch has been rebased as [73bd8d5200df] and pushed to the main branch.
Thanks for the contribution.
Replying to kpeter:
It may even be possible to implement n+m, at the cost that Arc is two times bigger.
Namely, the arc's are ordered according to their source node id, then we only store their target nodes (m integers) plus the id of the first outgoing arc for each node (n integer). The Arc structure must store the arc id and the source node id.
This is the easier (and probably more pragmatic) way. The other one is to introduce a new concept for out-iterable graphs.