Opened 17 years ago
Last modified 11 years ago
#3 assigned enhancement
ListGraph should store/update the number of edges and nodes
Reported by: | Alpar Juttner | Owned by: | Peter Kovacs |
---|---|---|---|
Priority: | major | Milestone: | LEMON 1.4 release |
Component: | core | Version: | hg main |
Keywords: | ListGraph | Cc: | |
Revision id: |
Description (last modified by )
Counting the number of edges and nodes is a frequent graph operation. Now, it takes linear time to get these values. Instead we might store these values and update them at every edge/node addition and deletion.
Of course this change would slow down these operations a bit.
This ticket is a copy of report #8 of the old bug tracking system. It refers to svn trunk -r2460.
Attachments (2)
Change History (26)
comment:1 Changed 17 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 17 years ago by
Milestone: | → Post 1.0 |
---|
comment:3 follow-up: 4 Changed 16 years ago by
Owner: | changed from Alpar Juttner to Peter Kovacs |
---|---|
Revision id: | → 3fb8ed1322de |
Status: | new → assigned |
comment:4 follow-up: 5 Changed 16 years ago by
Replying to kpeter:
It is still relevant in [3fb8ed1322de]. Moreover
SmartGraph
also needs such improvement, since it doesn't have the needed tags and functions, although I think it would be easy to implement.
For SmartGraph
we definitely want O(1) node/edge counting.
For ListGraph
it is a question. It increases a bit the memory footprint (well, the increment is negligible), and makes the edge addition/deletion a bit slower. I cannot tell how big this effect would be. I think we can wait with this until we have a reliable benchmarking system.
comment:5 follow-ups: 6 10 Changed 16 years ago by
Replying to alpar:
For
SmartGraph
we definitely want O(1) node/edge counting.
I know, and I thought that it is true for both SmartGraph
and SmartDigraph
, but for the former one it isn't implemented yet.
For
ListGraph
it is a question.
I think it would be much better. Item counting operations are used really frequently (even by algorithms). And since alterations are (relatively) slow operations (modify the containers, indices, notifying all observers etc.) compared to a single assignment of type int, so I think this change would worth to make.
For example, when I implemented algorithms I queried the number of nodes and edges/arcs only once and stored it just because I would like to be efficient even for List(Di)Graph
. It is not convenient.
comment:6 follow-up: 7 Changed 16 years ago by
Replying to kpeter:
For example, when I implemented algorithms I queried the number of nodes and edges/arcs only once and stored it just because I would like to be efficient even for
List(Di)Graph
. It is not convenient.
Yes, I know this. The question is if we want to require that these queries are O(1) for _all_ graphs?
- If yes, then these query functions could/should be the members of the graph classes.
- If not, then a generic algorithm cannot depend on this property.
comment:7 follow-up: 8 Changed 16 years ago by
Replying to alpar:
Yes, I know this. The question is if we want to require that these queries are O(1) for _all_ graphs?
Definitely not. Think of the various adaptors implemented in the 0.x series (and we want to have them in the 1.x series as well). E.g. in the case of sub graph adaptors and ResGraphAdaptor
I think it could not be implemented (or would be very difficult and/or inefficient). Although these adaptors are very important.
With this observation I aswered my own question: we should not call count functions many times in a generic algorithm, but not because of List(Di)Graph
, even bacause of adaptors. :)
However I think it would be important to have O(1) count functions in all graph implementations that are not adaptors. For example, in STL all containers are expected to have an O(1) size()
function, even lists, sets etc., since it is an important and frequent operation.
- If yes, then these query functions could/should be the members of the graph classes.
Think of adaptors again.
- If not, then a generic algorithm cannot depend on this property.
Yes, it is true.
comment:8 follow-up: 9 Changed 16 years ago by
Replying to kpeter:
For example, in STL all containers are expected to have an O(1)
size()
function, even lists, sets etc., since it is an important and frequent operation.
I don't know what do you mean by "expected", but std::list::size()
are recommended but not required to be O(1) by the standard.
comment:9 Changed 16 years ago by
Replying to alpar:
I don't know what do you mean by "expected", but
std::list::size()
are recommended but not required to be O(1) by the standard.
I didn't know that, I just expected what is recommended.
However it is exactly the same concept that I would like to follow in LEMON: O(1) item counting is not required for all graph types (e.g. adaptors won't support it), but it is strongly recommended for all basic graph implementations. They are much more heavy-weight structures than e.g. a single std::list
, thus the relative overhead would be much smaller.
Changed 16 years ago by
Attachment: | smart_itemcount_a7e8ad460d66.patch added |
---|
comment:10 Changed 16 years ago by
Replying to kpeter:
I know, and I thought that it is true for both
SmartGraph
andSmartDigraph
, but for the former one it isn't implemented yet.
[a7e8ad460d66] adds missing tags and functions for item counting in SmartGraph
.
comment:11 follow-up: 13 Changed 16 years ago by
[3fe3c838a89e] adds support for constant time item counting in ListDigraph
and ListGraph
.
Changed 16 years ago by
Attachment: | 3fe3c838a89e.patch added |
---|
comment:12 Changed 16 years ago by
Version: | svn trunk → hg main |
---|
comment:13 follow-up: 15 Changed 16 years ago by
Milestone: | LEMON 1.1 release |
---|
Replying to kpeter:
[3fe3c838a89e] adds support for constant time item counting in
ListDigraph
andListGraph
.
I would like to see a detailed running time evaluation, before putting this patch into the main branch.
The question is important, but in my opinion, it is not of the highest priority. Therefore I remove it from the 1.1 milestone. (Which does not mean that this change cannot get into the 1.1 release. If a satisfactory running time analyses will have been done before the release, we will include this path into the release.)
comment:15 follow-up: 18 Changed 15 years ago by
Replying to alpar:
I would like to see a detailed running time evaluation, before putting this patch into the main branch.
What kind of "detailed running time evaluation" would you like to see?
Isn't it clear enough (with virtually no tests) that a simple ++
or --
operation for an int
variable cannot make a noticeable difference in the running time of such functions that contain 8-10 assignments and 2-3 if
s, each of which performs at least one operator[]
for a vector? And keep in mind, that if there were node/arc maps (that's the typical use case), then the relative difference would be even smaller! I think it is clear. However I made some tests that validated this assumption. Of course, many other tests could be performed, but that could be an endless story. Why don't we make tests in which countNodes()
and/or countArcs()
is also used?
If constant time item counting is expected for simple data structures like std::list
, then it seems to be more than natural to expect this for List(Di)graph
as well, since the relative overhead must be much smaller.
I suggest to focus on the huge advantage of this modification rather than this captious discussion about its negligible overhead.
comment:16 follow-up: 17 Changed 15 years ago by
Milestone: | → LEMON 1.2 release |
---|---|
Revision id: | 3fb8ed1322de |
comment:17 Changed 15 years ago by
Milestone: | LEMON 1.2 release → LEMON 1.3 release |
---|
Replying to kpeter:
comment:18 follow-up: 19 Changed 14 years ago by
I made some tests again and I cannot detect measureable overhead of this proposal (which is quite obvious indeed). Therefore, I see no reason to refuse or postpone it again. I can only repeat my previous reasoning.
comment:19 follow-up: 20 Changed 14 years ago by
Replying to kpeter:
I made some tests again and I cannot detect measureable overhead of this proposal (which is quite obvious indeed). Therefore, I see no reason to refuse or postpone it again. I can only repeat my previous reasoning.
This, and all other tickets possibly affecting the performance of the core tools of LEMON are awaiting to a satisfactory solution to #33.
comment:20 follow-up: 21 Changed 14 years ago by
Replying to alpar:
This, and all other tickets possibly affecting the performance of the core tools of LEMON are awaiting to a satisfactory solution to #33.
That's what I would like to confirm: this modification does not cause a measurable overhead. It is so obvious, but I validated it in various tests. So it is not performance issue! It is much more about natural requirements/expectations.
I see your reasons about pushing #33, but to tell the truth, I don't think that a comprehensive and satisfactory benchmark system will be realized in a reasonable time (e.g. a few years). I do find it a cumbersome policy to strictly bound this simple but important decision to such a vague task.
comment:21 follow-up: 22 Changed 14 years ago by
Replying to kpeter:
I see your reasons about pushing #33, but to tell the truth, I don't think that a comprehensive and satisfactory benchmark system will be realized in a reasonable time (e.g. a few years). I do find it a cumbersome policy to strictly bound this simple but important decision to such a vague task.
Short answer:
Then, we must wait a few years to have this change accepted.
Long answer:
Firstly, it requires no more than a week of work of a single person to develop the core of an appropriate benchmarking tool. Forecasting that it won't be done within a few years is not only stupid, but also counterproductive, demoralizing and gives an overall bad reputation to the project.
Secondly, this kind of decision should not be based on personal trust, but on well documented, reproducible and reliable tests. None of the tests I can see here in the ticket are either well documented or reproducible, therefore none of them are reliable.
I admit in case of this ticket the situation might be pretty clear, therefore the question is matter of principle in some sense. BUT the question is whether or not we want opt for high standards in our developing process. If yes, it means we must insist on them in all cases and can't make random exceptions.
comment:22 Changed 14 years ago by
Replying to alpar:
Short answer:
Then, we must wait a few years to have this change accepted.
Long answer:
Firstly, it requires no more than a week of work of a single person to develop the core of an appropriate benchmarking tool. Forecasting that it won't be done within a few years is not only stupid, but also counterproductive, demoralizing and gives an overall bad reputation to the project.
I must refuse your rude words (stupid, etc.).
- In the last 2-3 years, it was me who made tha majority of benchmark test around LEMON (see e.g. http://lime.cs.elte.hu/~kpeter/hg/hgwebdir.cgi/lemon-benchmark/) So I have some experiances about it. And I must say that it is not an easy and agreeable work. The only sure thing is that however hard you try to make correct benchmarks, they can still be criticized.
- Therefore, I find it extremely difficult to build a benchmark system you are talking about in #33, which could be accepted by the LEMON developers as comprehensive and correct in _all_ aspect to rely on about design decisions. Not to mention the actual testing on various computers using different compilers. It is much more than a week.
- We have plenty of tasks that are much more important (e.g. porting, multi-thread support, etc.).
- LEMON currently has only a few stable developers. That is why all releases were delayed. Unfortunatelly, I'm sure that I won't be able to realize #33, and I will have much less time for core development in the future. Unless you find a student directly for this task, I'm uncertain about who will do this.
Secondly, this kind of decision should not be based on personal trust, but on well documented, reproducible and reliable tests. None of the tests I can see here in the ticket are either well documented or reproducible, therefore none of them are reliable.
The tests are reproductible, they can be found in this repository:
http://lime.cs.elte.hu/~kpeter/hg/hgwebdir.cgi/lemon-benchmark/
However, I regard this ticket as a proposal of fixing a bad design decision. If it is widely expected for a simple linked list structure that it knows its length, then it is more than a natural requirement for Lis(Di)graph
. As I mentioned several times, it is NOT a perforamnce issue!
I admit in case of this ticket the situation might be pretty clear, therefore the question is matter of principle in some sense.
That's what I say...
BUT the question is whether or not we want opt for high standards in our developing process. If yes, it means we must insist on them in all cases and can't make random exceptions.
BUT we mustn't make such an inexplicably huge argument about the imaginary performance overhead of a single increment or decrement opertaion vs several vector access operations and a few if statements (not to mention the typical case of also maintaining maps) and totally forgeting the conceptual reasons and the dominant advantage and of this proposal. It's nonsense. I give it up.
comment:23 Changed 14 years ago by
I'm sorry for my vehement comments. I hope I was wrong, and a "satisfactory solution to #33" will be realized and the necessary experiments will be conducted in the near future. Especially, if they are not required to be comprehensive. I just thought that they are.
comment:24 Changed 11 years ago by
Milestone: | LEMON 1.3 release → LEMON 1.4 release |
---|
Replying to alpar:
It is still relevant in [3fb8ed1322de]. Moreover
SmartGraph
also needs such improvement, since it doesn't have the needed tags and functions, although I think it would be easy to implement.