Opened 16 years ago
Closed 16 years ago
#167 closed task (fixed)
Port DIMACS readers
Reported by: | Alpar Juttner | Owned by: | Alpar Juttner |
---|---|---|---|
Priority: | major | Milestone: | LEMON 1.1 release |
Component: | core | Version: | hg main |
Keywords: | Cc: | ||
Revision id: |
Description
This ticket is a follow up of #55.
Attachments (4)
Change History (24)
comment:1 Changed 16 years ago by
Milestone: | LEMON 1.1 release |
---|
Changed 16 years ago by
Attachment: | a6bc90e888ff.patch added |
---|
comment:2 follow-ups: 3 5 Changed 16 years ago by
Status: | new → assigned |
---|
Uploaded a basic port of the DIMACS reader plus the DIMACS->LGF converter, see [a6bc90e888ff], I found the overloaded readDimacs()
function very confusing, thus I included the problem type in the function names (readDimacsMin()
,readDimacsSp()
etc.). I'm still not very satisfied with the API. The main problem is that we cannot detect the problem type in advance, thus e.g. you must give it as a command line option to the dimacs-to-lgf
converter, though it could perfectly be autodetected. The possible solution is either
- to develop an API for detecting the type. It is not clear how to do it while keep it single pass, or
- re-implement the DIMACS parser in the converter script. Actually, it is not a big deal, the final code would probably be even shorter compared to the present one.
I'm also wondering if we really need the readDimacs*()
stuff. Isn't it enough to provide the converter script only?
comment:3 follow-up: 4 Changed 16 years ago by
Replying to alpar:
I'm also wondering if we really need the
readDimacs*()
stuff. Isn't it enough to provide the converter script only?
I like these functions, and I use them a lot. A converter script cannot be used inside another program (without copy-pasting), however there are some natural use cases when it is important. E.g.
- You would like to write a program that can directly handle DIMACS input.
- You would like to work with really large DIMACS instances (e.g. created with a graph generator). Then using the converter script as a separate program requires twice as much time and/or twice as much space than using the functions.
comment:4 follow-up: 7 Changed 16 years ago by
Replying to kpeter:
Replying to alpar:
I'm also wondering if we really need the
readDimacs*()
stuff. Isn't it enough to provide the converter script only?I like these functions, and I use them a lot. A converter script cannot be used inside another program (without copy-pasting), however there are some natural use cases when it is important. E.g.
- You would like to write a program that can directly handle DIMACS input.
- You would like to work with really large DIMACS instances (e.g. created with a graph generator). Then using the converter script as a separate program requires twice as much time and/or twice as much space than using the functions.
I'm aware that these are the (only) use-cases of the readDimacs*()
tools. I'm just a bit concerned if they are real use-cases.
- Firstly, why do you want to write a tool which reads DIMACS file instead of
.lgf
when the later one is simply more general and more robust? - Secondly, is the space issue is really a problem? Why not just compress those files, then? In fact, in most of the cases you do not even need to store the
.lgf
file, you can simply compose the tools with the pipe operator like this$ dimacs-to-lgf net.dim | ./myalg
comment:5 follow-up: 6 Changed 16 years ago by
Replying to alpar:
Uploaded a basic port of the DIMACS reader plus the DIMACS->LGF converter, see [a6bc90e888ff],
You exchanged the changesets: [c052e6718a95] is the port and [a6bc90e888ff] is the renaming. :)
I found the overloaded
readDimacs()
function very confusing, thus I included the problem type in the function names (readDimacsMin()
,readDimacsSp()
etc.).
I find it a good idea.
I'm still not very satisfied with the API. The main problem is that we cannot detect the problem type in advance,
Yes, it is true. But when you read a graph from an LGF file, than you encounter exactly the same problem: you usually have to know what maps and attributes can be found in the file. So I do not find this question really problematic.
thus e.g. you must give it as a command line option to the
dimacs-to-lgf
converter, though it could perfectly be autodetected.
However the script would be better with auto-detecting.
The possible solution is either
- to develop an API for detecting the type. It is not clear how to do it while keep it single pass, or
What about introducing a simple class interface, e.g. something like that.
class DimacsReader { public: enum DimacsType { MAX, SP, MIN, MAT, UNKNOWN } DimacsReader(istream &is = std::cin) : _is(is), _type(UNKNOWN) {} DimacsType readType() {...} void readMax(...) {...} void readSp(...) {...} void readMin(...) {...} void readMat(...) {...} private: istream &_is; DimacsType _type; int _n, _m; };
readType()
would read the file until the line describing the problem (starting with p
), set _type
, _n
, and _m
, and return _type
. After that according to this return value the appropriate readXyz()
function can be called with giving the needed maps as parameters to them. (They would check if the _type
variable meets their requirements.)
This solution would make the converter script much simpler and could be used separatley, too. Maybe we could also keep the functions but they would also use this class.
- re-implement the DIMACS parser in the converter script. Actually, it is not a big deal, the final code would probably be even shorter compared to the present one.
It wouldn't be difficult to do, but the script wouldn't be so nice.
comment:6 Changed 16 years ago by
Replying to kpeter:
You exchanged the changesets: [c052e6718a95] is the port and [a6bc90e888ff] is the renaming. :)
Well, this is just an old trick to check if someone indeed have a look at these patches.
What about introducing a simple class interface, e.g. something like that.
I'm thinking on a simpler solution (in term of the required refactoring), a patch is coming soon.
comment:7 follow-up: 8 Changed 16 years ago by
Replying to alpar:
I'm aware that these are the (only) use-cases of the
readDimacs*()
tools. I'm just a bit concerned if they are real use-cases.
- Firstly, why do you want to write a tool which reads DIMACS file instead of
.lgf
when the later one is simply more general and more robust?
Because (unfortunatelly) DIMACS format is more widely known and used than LGF. It is supported by various programs, but LGF is used only in connections with LEMON.
- Secondly, is the space issue is really a problem? Why not just compress those files, then?
E.g. I used DIMACS files up to about 1 GB, which is about 300 MB in .gr.gz format, so space can be an issue, too.
In fact, in most of the cases you do not even need to store the
.lgf
file, you can simply compose the tools with the pipe operator like this
That's why I wrote: "twice as much time", because the graph have to be read and built twice in this case. (It could be a few minutes in case of really large instances.)
comment:8 Changed 16 years ago by
Replying to kpeter:
E.g. I used DIMACS files up to about 1 GB, [...] (It could be a few minutes in case of really large instances.)
How long would it exactly take to convert this file to .lgf, not considering the time of writing the output back to the disk? I mean something like this.
time dimacs-to-lgf -dunnothetype <bigfile.dim >/dev/null
or
time (dimacs-to-lgf -dunnothetype <bigfile.dim|cat >/dev/null)
(do these tests twice to make sure, that bigfile.dim is in the cache)
comment:9 follow-up: 10 Changed 16 years ago by
Here are the results for a DIMACS min. cost flow input file of 280 MB (n = 100,000; m = 11,000,000):
kpeter@lime:~> time dimacs-to-lgf -mcf <bigfile.dim >/dev/null real 1m43.353s user 1m38.174s sys 0m5.200s kpeter@lime:~> time dimacs-to-lgf -mcf <bigfile.dim >/dev/null real 1m42.408s user 1m38.134s sys 0m4.284s kpeter@lime:~> time (dimacs-to-lgf -mcf <bigfile.dim|cat >/dev/null) real 2m24.584s user 1m45.807s sys 1m19.565s kpeter@lime:~> time (dimacs-to-lgf -mcf <bigfile.dim|cat >/dev/null) real 2m24.855s user 1m48.103s sys 1m15.125s
Note that there much larger DIMACS files, too. E.g. the USA road graph used in the 9th DIMACS implemenation challange is about 1 GB.
comment:10 Changed 16 years ago by
Replying to kpeter:
Note that there much larger DIMACS files, too. E.g. the USA road graph used in the 9th DIMACS implemenation challange is about 1 GB.
You can test it, if you would like to. You can find it in /mnt/sdb/testdata/www.dis.uniroma1.it/challenge9/data/USA-road-d/USA-road-d.USA.gr.gz
on lime.cs.elte.hu
.
Changed 16 years ago by
Attachment: | dimacs-50d96f2166d7-9d1faab5e0f1-24a2c6ee6cb0.bundle added |
---|
Port + refactoring of the DIMACS stuff
comment:11 follow-ups: 12 17 Changed 16 years ago by
Milestone: | → LEMON 1.0 release |
---|
Please have a look and [attachmend:dimacs-50d96f2166d7-9d1faab5e0f1-24a2c6ee6cb0.bundle the attached bundle file]. Changesets [50d96f2166d7] and [9d1faab5e0f1] ports the tools and does the basic renaming, while [24a2c6ee6cb0] is a major refactoring of both the API and the command line tool.
comment:12 follow-ups: 13 14 Changed 16 years ago by
Replying to alpar:
Please have a look and the attached bundle file.
It is great! This soulton is better than the one I suggested.
A question: why do you use the lineShift
variable? It is set by dimacsType()
, but it is not used anywhere. Is it important?
I also made some doc improvements. See [0a3ec097a76c].
comment:13 follow-up: 15 Changed 16 years ago by
Replying to kpeter:
Replying to alpar:
Please have a look and the attached bundle file.
It is great! This soulton is better than the one I suggested.
A question: why do you use the
lineShift
variable? It is set bydimacsType()
, but it is not used anywhere. Is it important?
Not important. I could be used for providing line number info for the FormatError
exceptions, if someone took the hassle to implement is.
Btw, the error handling is far from being rigorous currently. I wasn't very motivated to make it fool-proof, as DIMACS is not our primary graph descriptor format.
comment:14 follow-up: 16 Changed 16 years ago by
Replying to kpeter:
I also made some doc improvements. See [0a3ec097a76c].
When a doc module contains single file only, I prefer to keep the module description in the same file, as it makes the doc maintenance easier. Thus if you do not protest very strongly, I will take this change out of your path.
comment:15 Changed 16 years ago by
Replying to alpar:
Not important. I could be used for providing line number info for the
FormatError
exceptions, if someone took the hassle to implement is.
Okay. It can be kept or it can be removed, as you like it.
Btw, the error handling is far from being rigorous currently. I wasn't very motivated to make it fool-proof, as DIMACS is not our primary graph descriptor format.
I agree. It is not so important.
comment:16 follow-up: 19 Changed 16 years ago by
Replying to alpar:
When a doc module contains single file only, I prefer to keep the module description in the same file, as it makes the doc maintenance easier. Thus if you do not protest very strongly, I will take this change out of your path.
I prefer the groups.dox, since I like if they are kept together, e.g. it is much easier to look over or reorganize them. Moreover, which is much more important: placing the docs into separate files you do not have the ability to specify the order of the groups or subgroups. E.g. in [24a2c6ee6cb0] the DIMACS group follows the Nauty group, but in [0a3ec097a76c] DIMACS is the first.
comment:17 follow-up: 18 Changed 16 years ago by
Replying to alpar:
Btw. why do you move this ticket to the closed 1.0 milestone?
comment:18 Changed 16 years ago by
Milestone: | LEMON 1.0 release → LEMON 1.1 release |
---|
comment:19 Changed 16 years ago by
Replying to kpeter:
Replying to alpar:
When a doc module contains single file only, I prefer to keep the module description in the same file, as it makes the doc maintenance easier. Thus if you do not protest very strongly, I will take this change out of your path.
I prefer the groups.dox, since I like if they are kept together, e.g. it is much easier to look over or reorganize them.
Why? We are talking about the leaf of the module tree, a leaf containing a single file.
Moreover, which is much more important: placing the docs into separate files you do not have the ability to specify the order of the groups or subgroups. E.g. in [24a2c6ee6cb0] the DIMACS group follows the Nauty group, but in [0a3ec097a76c] DIMACS is the first.
It is very important, indeed. :)
comment:20 Changed 16 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
The port went to the main branch.
Port of DIMACS stuff