https://gitlab.inria.fr/dcoudert/hyperbolicity/
Tip revision: 4f9c8ce6eed890ed5bad1f56ff550577b4819bad authored by COUDERT David on 21 November 2021, 13:54:28 UTC
Add new file
Add new file
Tip revision: 4f9c8ce
README.md
# Hyperbolicity
Implementation in C++ of algorithms for computing the hyperbolicity of graphs.
## Authors
- [David Coudert](http://www-sop.inria.fr/members/David.Coudert/)
- [André Nusser](https://people.mpi-inf.mpg.de/~anusser/)
- [Laurent Viennot](https://who.rocq.inria.fr/Laurent.Viennot/)
## License
This code is released under the GNU General Public License, version 3, or any later version as published by the Free Software Foundation.
## How to cite
Refer to the code using:
- David Coudert, André Nusser, Laurent Viennot. **Hyperbolicity** (version 2.0). [https://gitlab.inria.fr/dcoudert/hyperbolicity/](https://gitlab.inria.fr/dcoudert/hyperbolicity/), 2021.
Articles describing the algorithms:
- David Coudert, André Nusser, Laurent Viennot. **[Enumeration of far-apart pairs by decreasing distance for faster hyperbolicity computation](https://hal.inria.fr/hal-03201405)** [Research Report] Inria; I3S, Université Côte d'Azur. 2021. [pdf](https://hal.inria.fr/hal-03201405/file/main-hal.pdf) [bibtex](https://hal.inria.fr/hal-03201405v1/bibtex)
- David Coudert, André Nusser, Laurent Viennot. **[Computing Graph Hyperbolicity Using Dominating Sets](https://hal.inria.fr/hal-03431155)**. Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), SIAM, 2022.
## Installation and usage
This code can be used from the command line or inside your own C++ program. Since version 2.0, this code uses 2 submodules ([hub-labeling](https://github.com/lviennot/hub-labeling) and [contraction-hierarchies](https://github.com/lviennot/contraction-hierarchies.git)).
After cloning the code, type the following commands in a console
```shell
.$ cd hyperbolicity
.$ git submodule init
.$ git submodule update
.$ build.sh
```
The main executable will then be in subdirectory ``` build/```.
### Getting help
```shell
.$ build/main -h
The <algorithm_version> should be the number:
1) for Borassi et al.
2) for the algorithm by Coudert, Nusser and Viennot, based on the enumeration of far-apart pairs. This version is without pruned BFS, initial heuristic and initial lower bound.
3) for the algorithm by Coudert, Nusset and Viennot, based on the enumeration of far-apart pairs and using with pruned BFS and initial heuristic (default).
Parameters: [-c <cache_size>] [-noprune] [-noheur] [-l <lowerbound>] [-k <number_of_trials>]
-c <cache_size>: allow to specify the size of the cache of BFSs (at least 2)
-noprune : disable the use of pruned BFS in algorithm 3
-noheur : disable the heuristic in algorithm 3
The combination '-a 3 -noprune -noheur' is equivalant to '-a 2'
-l <lowerbound> : allow to specify an initial lower bound
-k <number_of_trials> : specifies the number of trials of the heuristic (default: 10).
Using '-k 0' is equivalant to '-noheur'
-b <ball> : a parameter for the heuristic
5) for a heuristic -- requires parameters k (10 by default) and b (1 by default)
7) count number of far-apart pairs and output distribution according to distance
20) exact hyperbolicity computation inspired from algorithm 1 : compute dominating sets for distances d, d/r, d/r^2, ..., 1 and check necessary quadruples in a depth first search manner. Distance d can be set with -domd d (computed depending on a lower-bound of hyperbolicity by default). Ratio r can be set with -domr r (3.0 by default). Dominating set heuristic can be set using -domv v (52 by default). Distances are computed using the pruned landmark labeling scheme of Akiba et al. The maximum side of cached distance matrices can be set using -doms s. The maximum number of cached matrices can be set using -domc c. Use -l h_lb if h_lb is a known lower-bound of hyperbolicity.
21) Same as 20 but using the implementation of hub labeling by L. Viennot.
Parameters for algorithms 20 and 21: [-domd <distance>] [-domr <ratio>] [-domv <version>] [-doms <max_matrix_side>] [-domc <max_number_of_cached_matrices>] [-l <lowerbound>]
-domd <distance>: domination distance (>= 0).
-domr <ratio>: reduction ratio for the domination distances of the hierarchical dominating set (must be > 1).
-domv <version>: version of the greedy dominating set algorithm to use (52 by default). This is a number xy such that:
The lowest digit (y) indicates the order in which to consider vertices:
0: no specific order, so vertex labels
1: by increasing degree
2: by non-increasing degree
3: by increasing eccentricity
4: by non-increasing eccentricity
5: random order
The highest digit (x) indicates:
0: basic greedy dominating set; use recursion to build sub-dominating sets.
1: basic greedy dominating set; use iterative method to build sub-dominating sets.
2: connected dominating set; use iterative method to build sub-dominating sets.
3: same as 1
4: connected dominating set; each vertex attached to closest dominator; use iterative method to build sub-dominating sets.
5: basic greedy dominating set; each vertex attached to closest dominator; use iterative method to build sub-dominating sets.
-doms <max_matrix_side>: maximum side of stored distance matrices.
-domc <max_number_of_cached_matrices>]: maximum number of cached distance matrices (at least 7).
30) Build hierarchical dominating set and exit.
Parameters: [-domd <distance>] [-domr <ratio>] [-domv <version>]
```
### Examples
Compute the hyperbolity of the `CAIDA_as_20040105` graph using different algorithms.
The default algorithm is the algorithm proposed in [2].
```shell
.$ ./build/main graphs/CAIDA_as_20040105.bcc.edgelist -a 3
Graph: #nodes: 10424, #edges: 27061
min/max/avg degree: 2 / 1952 / 5.19206
BFS cache capacity: 1000
BFSInfo type: 0
Algorithm version: 3
radius = 4
mean eccentricity = 5.97842
diameter = 8
h_lb = 1 (3110, 4580, 1, 23)
h_lb = 2 (3110, 4580, 8, 3289)
h_lb = 3 (3110, 4580, 337, 2841)
h_lb = 3 h_ub = 8
best_h_lb_found = 1 h_lb = 3 h_ub = 8 for 3110 3318 257 7393
best_h_lb_found = 2 h_lb = 3 h_ub = 8 for 3602 7427 9357 257
best_h_lb_found = 3 h_lb = 3 h_ub = 8 for 4580 7352 8474 7379
h_lb = 3 h_ub = 7
best_h_lb_found = 4 h_lb = 4 h_ub = 7 for 1595 7393 5598 7086
h_lb = 4 h_ub = 6
best_h_lb_found = 5 h_lb = 5 h_ub = 6 for 1736 9310 9138 396
FarApartIterator (algo = DHV): #BFS initial = 1125 final = 9856
Time for computation of far apart pairs in computeBorassi: NO DATA
Time for computation of hyperbolicity in computeBorassi: NO DATA
Time for computation of far apart pairs in computeBorassiHubLabeling: NO DATA
Time for computation of hyperbolicity in computeBorassiHubLabeling: NO DATA
Initialization of v2 algorithm: sum = 1249 ms, mean = 1249 ms
HasNext call of v2 loop to get next far pair: sum = 3533.4 ms, mean = 0.013952 ms
Initial part of v2 main loop: sum = 10174 ms, mean = 0.040174 ms
Time for acceptable and valuable computation in v2 main loop: sum = 3534.7 ms, mean = 0.013957 ms
Lower bound update of v2 main loop: sum = 7735 ms, mean = 0.030542 ms
COMPLETE TIME FOR BORASSI_HUBLAB: : NO DATA
COMPLETE TIME FOR BORASSI: : NO DATA
COMPLETE TIME FOR OUR APPROACH: : sum = 26292 ms, mean = 26292 ms
Counts BFS calls in compute_v2: 71609
The hyperbolicity of the graph is 2.5
```
We can compare with the algorithm proposed in [1]
```shell
.$ ./build/main graphs/CAIDA_as_20040105.bcc.edgelist -a 1
Graph: #nodes: 10424, #edges: 27061
min/max/avg degree: 2 / 1952 / 5.19206
BFS cache capacity: 1000
BFSInfo type: 0
Algorithm version: 1
radius = 4
mean eccentricity = 5.97842
diameter = 8
h_lb = 0 h_ub = 8
h_lb = 1 h_ub = 8
h_lb = 2 h_ub = 8
h_lb = 3 h_ub = 8
h_lb = 3 h_ub = 7
h_lb = 4 h_ub = 7
h_lb = 4 h_ub = 6
h_lb = 5 h_ub = 6
h_lb = 5 h_ub = 5
Time for computation of far apart pairs in computeBorassi: sum = 9586.5 ms, mean = 9586.5 ms
Time for computation of hyperbolicity in computeBorassi: sum = 9202.6 ms, mean = 9202.6 ms
Time for computation of far apart pairs in computeBorassiHubLabeling: NO DATA
Time for computation of hyperbolicity in computeBorassiHubLabeling: NO DATA
Initialization of v2 algorithm: NO DATA
HasNext call of v2 loop to get next far pair: NO DATA
Initial part of v2 main loop: NO DATA
Time for acceptable and valuable computation in v2 main loop: NO DATA
Lower bound update of v2 main loop: NO DATA
COMPLETE TIME FOR BORASSI_HUBLAB: : NO DATA
COMPLETE TIME FOR BORASSI: : sum = 18789 ms, mean = 18789 ms
COMPLETE TIME FOR OUR APPROACH: : NO DATA
Counts BFS calls in compute_v2: 0
The hyperbolicity of the graph is 2.5
```
## Input graphs
In directory `graphs/` you can find the graphs used to conduct experiments in [2,3].
Each graph is the largest biconnected component of the original graph, with vertices relabeled in the range `0..n-1`.
These graphs are from the BioGRID interaction database (BG-\*) [4];
a protein interactions network (dip20170205) [5];
graphs of the autonomous systems from the Internet (CAIDA_as_\* and DIMES_\*) [6,7];
We also use social networks (Epinions, Hollywood, Slashdot, Twitter), co-authors graphs (ca-\*, dblp), computer networks (Gnutella, Skitter), web graphs (NotreDame), road networks (oregon2, FLA-t), a 3D triangular mesh (buddha), and grid like graphs from VLSI applications (alue7065) and from computer games (FrozenSea). The original data is available from [snap.stanford.edu](https://snap.stanford.edu), [webgraph.di.unimi.it](https://webgraph.di.unimi.it), [www.dis.uniroma1.it/challenge9](https://www.dis.uniroma1.it/challenge9), [graphics.stanford.edu](graphics.stanford.edu), [steinlib.zib.de](https://steinlib.zib.de), and [movingai.com](https://movingai.com). Furthermore, we test synthetic inputs: grid300-10, grid500-10 and grid1500-10 are square grids with respective sizes 301x301, and 501x501 where 10% of the edges have been randomly deleted.
### Graph file format
We use a simple edge list file format to store undirected unweighted graphs. Roughly, the
* `# This is a comment` -- Comment line. Can appear anywhere in the file. The symbols `c` and `p` can also be used instead of `#`.
* `u v` -- edge between vertex *u* and vertex *v*. Vertex labels *u* and *v* must be positive integers, and we assume that the number of vertices is the largest label plus one. That is, if *l* is the largest label, we assume that graph has the `l-1` vertices `0, 1, ..., l-1`.
```
# Graph name: Largest biconnected component of p2p-Gnutella09.edgelist
# undirected
# N = 5606
# M = 23510
#
# Vertices have been relabeled in the range 0..N-1.
#
0 5308
0 4751
0 1
0 505
0 2
0 3
0 4
0 5
0 6
0 845
0 7
0 8
0 2799
0 189
0 3440
0 3908
1 353
2 366
3 1103
3 1662
...
```
## References
1. M. Borassi, D. Coudert, P. Crescenzi and A. Marino. **[On Computing the Hyperbolicity of Real-World Graphs](https://hal.inria.fr/hal-01199860)**. 23rd Annual European Symposium on Algorithms (ESA), Sep 2015, Patras, Greece. pp.215-226, [10.1007/978-3-662-48350-3_19](http://dx.doi.org/10.1007/978-3-662-48350-3_19⟩.
2. D. Coudert, A. Nusser and L. Viennot. **[Enumeration of far-apart pairs by decreasing distance for faster hyperbolicity computation](https://hal.inria.fr/hal-03201405)**. Research Report Inria; I3S, Université Côte d'Azur, 2021. [hal-03201405]⟨https://hal.inria.fr/hal-03201405⟩
3. D. Coudert, A. Nusser, L. Viennot. **[Computing Graph Hyperbolicity Using Dominating Sets]()**. Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), SIAM, 2022.
4. R. Oughtred, C. Stark, B-J. Breitkreutz, J. Rust, L. Boucher, C. Chang, N. Kolas, L. O’Donnell, G. Leung, R. McAdam, Rochelle and others. **The BioGRID interaction database: 2019 update**. Nucleic acids research, 47(D1):D529-D541, 2019.
5. L. Salwinski, C. Miller, A. Smith, F. Pettit, J. Bowie and D. Eisenberg. **The database of interacting proteins: 2004 update**. Nucleic acids research 32(suppl_1):D449-D451, 2004.
6. The Cooperative Association for Internet Data Analysis (CAIDA). **The CAIDA AS Relationships Dataset**. 2013.
7. Y. Shavitt and E. Shir. **DIMES: Let the Internet Measure Itself**. ACM SIGCOMM Computer Communication Review, 35(5):71-74, 2005.