Content - dcbd8d8cc28305cd8b54ee2ba4ee430ddf21d4d7 - 719d653/README.md

visit type:
Tip revision: 4f9c8ce6eed890ed5bad1f56ff550577b4819bad authored by COUDERT David on 21 November 2021, 13:54:28 UTC
Add new file
Tip revision: 4f9c8ce
README.md
# Hyperbolicity

Implementation in C++ of algorithms for computing the hyperbolicity of graphs.


## Authors
- [David Coudert](http://www-sop.inria.fr/members/David.Coudert/)
- [André Nusser](https://people.mpi-inf.mpg.de/~anusser/)
- [Laurent Viennot](https://who.rocq.inria.fr/Laurent.Viennot/)

## License
This code is released under the GNU General Public License, version 3, or any later version as published by the Free Software Foundation.

## How to cite
Refer to the code using:
- David Coudert, André Nusser, Laurent Viennot. **Hyperbolicity** (version 2.0). [https://gitlab.inria.fr/dcoudert/hyperbolicity/](https://gitlab.inria.fr/dcoudert/hyperbolicity/), 2021.

Articles describing the algorithms:
- David Coudert, André Nusser, Laurent Viennot. **[Enumeration of far-apart pairs by decreasing distance for faster hyperbolicity computation](https://hal.inria.fr/hal-03201405)** [Research Report] Inria; I3S, Université Côte d'Azur. 2021. [pdf](https://hal.inria.fr/hal-03201405/file/main-hal.pdf) [bibtex](https://hal.inria.fr/hal-03201405v1/bibtex)
- David Coudert, André Nusser, Laurent Viennot. **[Computing Graph Hyperbolicity Using Dominating Sets](https://hal.inria.fr/hal-03431155)**. Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), SIAM, 2022.


## Installation and usage
This code can be used from the command line or inside your own C++ program. Since version 2.0, this code uses 2 submodules ([hub-labeling](https://github.com/lviennot/hub-labeling) and [contraction-hierarchies](https://github.com/lviennot/contraction-hierarchies.git)).

After cloning the code, type the following commands in a console
```shell
.$ cd hyperbolicity
.$ git submodule init
.$ git submodule update
.$ build.sh
```
The main executable will then be in subdirectory ``` build/```.

### Getting help
```shell
.$ build/main -h
The <algorithm_version> should be the number:
1) for Borassi et al.

2) for the algorithm by Coudert, Nusser and Viennot, based on the enumeration of far-apart pairs. This version is without pruned BFS, initial heuristic and initial lower bound.
3) for the algorithm by Coudert, Nusset and Viennot, based on the enumeration of far-apart pairs and using with pruned BFS and initial heuristic (default).

   Parameters: [-c <cache_size>] [-noprune] [-noheur] [-l <lowerbound>] [-k <number_of_trials>]
   -c <cache_size>: allow to specify the size of the cache of BFSs (at least 2)
   -noprune : disable the use of pruned BFS in algorithm 3
   -noheur : disable the heuristic in algorithm 3
      The combination '-a 3 -noprune -noheur' is equivalant to '-a 2'
   -l <lowerbound> : allow to specify an initial lower bound
   -k <number_of_trials> : specifies the number of trials of the heuristic (default: 10).
                          Using '-k 0' is equivalant to '-noheur'
   -b <ball> : a parameter for the heuristic

5) for a heuristic -- requires parameters k (10 by default) and b (1 by default)

7) count number of far-apart pairs and output distribution according to distance

20) exact hyperbolicity computation inspired from algorithm 1 : compute dominating sets for distances d, d/r, d/r^2, ..., 1 and check necessary quadruples in a depth first search manner. Distance d can be set with -domd d (computed depending on a lower-bound of hyperbolicity by default). Ratio r can be set with -domr r (3.0 by default). Dominating set heuristic can be set using -domv v (52 by default). Distances are computed using the pruned landmark labeling scheme of Akiba et al. The maximum side of cached distance matrices can be set using -doms s. The maximum number of cached matrices can be set using -domc c. Use -l h_lb if h_lb is a known lower-bound of hyperbolicity.
21) Same as 20 but using the implementation of hub labeling by L. Viennot.

   Parameters for algorithms 20 and 21: [-domd <distance>] [-domr <ratio>] [-domv <version>] [-doms <max_matrix_side>] [-domc <max_number_of_cached_matrices>] [-l <lowerbound>]
   -domd <distance>: domination distance (>= 0).
   -domr <ratio>: reduction ratio for the domination distances of the hierarchical dominating set (must be > 1).
   -domv <version>: version of the greedy dominating set algorithm to use (52 by default). This is a number xy such that:
       The lowest digit (y) indicates the order in which to consider vertices:
           0: no specific order, so vertex labels
           1: by increasing degree
           2: by non-increasing degree
           3: by increasing eccentricity
           4: by non-increasing eccentricity
           5: random order

       The highest digit (x) indicates:
           0: basic greedy dominating set; use recursion to build sub-dominating sets.
           1: basic greedy dominating set; use iterative method to build sub-dominating sets.
           2: connected dominating set; use iterative method to build sub-dominating sets.
           3: same as 1
           4: connected dominating set; each vertex attached to closest dominator; use iterative method to build sub-dominating sets.
           5: basic greedy dominating set; each vertex attached to closest dominator; use iterative method to build sub-dominating sets.

   -doms <max_matrix_side>: maximum side of stored distance matrices.
   -domc <max_number_of_cached_matrices>]: maximum number of cached distance matrices (at least 7).

30) Build hierarchical dominating set and exit.

   Parameters: [-domd <distance>] [-domr <ratio>] [-domv <version>]

```

### Examples
Compute the hyperbolity of the `CAIDA_as_20040105` graph using different algorithms.

The default algorithm is the algorithm proposed in [2].
```shell
.$ ./build/main graphs/CAIDA_as_20040105.bcc.edgelist -a 3
Graph: #nodes: 10424, #edges: 27061
min/max/avg degree: 2 / 1952 / 5.19206
BFS cache capacity: 1000
BFSInfo type: 0
Algorithm version: 3
radius = 4
mean eccentricity = 5.97842
diameter = 8
h_lb = 1 (3110, 4580, 1, 23)
h_lb = 2 (3110, 4580, 8, 3289)
h_lb = 3 (3110, 4580, 337, 2841)
h_lb = 3        h_ub = 8
best_h_lb_found = 1 h_lb = 3    h_ub = 8 for 3110 3318 257 7393
best_h_lb_found = 2 h_lb = 3    h_ub = 8 for 3602 7427 9357 257
best_h_lb_found = 3 h_lb = 3    h_ub = 8 for 4580 7352 8474 7379
h_lb = 3        h_ub = 7
best_h_lb_found = 4 h_lb = 4    h_ub = 7 for 1595 7393 5598 7086
h_lb = 4        h_ub = 6
best_h_lb_found = 5 h_lb = 5    h_ub = 6 for 1736 9310 9138 396
FarApartIterator (algo = DHV): #BFS initial = 1125      final = 9856

Time for computation of far apart pairs in computeBorassi: NO DATA
Time for computation of hyperbolicity in computeBorassi: NO DATA
Time for computation of far apart pairs in computeBorassiHubLabeling: NO DATA
Time for computation of hyperbolicity in computeBorassiHubLabeling: NO DATA
Initialization of v2 algorithm: sum = 1249 ms, mean = 1249 ms
HasNext call of v2 loop to get next far pair: sum = 3533.4 ms, mean = 0.013952 ms
Initial part of v2 main loop: sum = 10174 ms, mean = 0.040174 ms
Time for acceptable and valuable computation in v2 main loop: sum = 3534.7 ms, mean = 0.013957 ms
Lower bound update of v2 main loop: sum = 7735 ms, mean = 0.030542 ms
COMPLETE TIME FOR BORASSI_HUBLAB: : NO DATA
COMPLETE TIME FOR BORASSI: : NO DATA
COMPLETE TIME FOR OUR APPROACH: : sum = 26292 ms, mean = 26292 ms
Counts BFS calls in compute_v2: 71609

The hyperbolicity of the graph is 2.5
```

We can compare with the algorithm proposed in [1]
```shell
.$ ./build/main graphs/CAIDA_as_20040105.bcc.edgelist -a 1
Graph: #nodes: 10424, #edges: 27061
min/max/avg degree: 2 / 1952 / 5.19206
BFS cache capacity: 1000
BFSInfo type: 0
Algorithm version: 1
radius = 4
mean eccentricity = 5.97842
diameter = 8
h_lb = 0        h_ub = 8
h_lb = 1        h_ub = 8
h_lb = 2        h_ub = 8
h_lb = 3        h_ub = 8
h_lb = 3        h_ub = 7
h_lb = 4        h_ub = 7
h_lb = 4        h_ub = 6
h_lb = 5        h_ub = 6
h_lb = 5        h_ub = 5
Time for computation of far apart pairs in computeBorassi: sum = 9586.5 ms, mean = 9586.5 ms
Time for computation of hyperbolicity in computeBorassi: sum = 9202.6 ms, mean = 9202.6 ms
Time for computation of far apart pairs in computeBorassiHubLabeling: NO DATA
Time for computation of hyperbolicity in computeBorassiHubLabeling: NO DATA
Initialization of v2 algorithm: NO DATA
HasNext call of v2 loop to get next far pair: NO DATA
Initial part of v2 main loop: NO DATA
Time for acceptable and valuable computation in v2 main loop: NO DATA
Lower bound update of v2 main loop: NO DATA
COMPLETE TIME FOR BORASSI_HUBLAB: : NO DATA
COMPLETE TIME FOR BORASSI: : sum = 18789 ms, mean = 18789 ms
COMPLETE TIME FOR OUR APPROACH: : NO DATA
Counts BFS calls in compute_v2: 0

The hyperbolicity of the graph is 2.5
```

## Input graphs
In directory `graphs/` you can find the graphs used to conduct experiments in [2,3]. 
Each graph is the largest biconnected component of the original graph, with vertices relabeled in the range `0..n-1`. 
These graphs are from the BioGRID interaction database (BG-\*) [4];
a protein interactions network (dip20170205) [5]; 
graphs of the autonomous systems from the Internet (CAIDA_as_\* and DIMES_\*) [6,7];

We also use social networks (Epinions, Hollywood, Slashdot, Twitter), co-authors graphs (ca-\*, dblp), computer networks (Gnutella, Skitter),  web graphs (NotreDame), road networks (oregon2, FLA-t),  a 3D triangular mesh (buddha), and grid like graphs from VLSI applications (alue7065) and from computer games (FrozenSea). The original data is available from [snap.stanford.edu](https://snap.stanford.edu), [webgraph.di.unimi.it](https://webgraph.di.unimi.it), [www.dis.uniroma1.it/challenge9](https://www.dis.uniroma1.it/challenge9), [graphics.stanford.edu](graphics.stanford.edu), [steinlib.zib.de](https://steinlib.zib.de), and [movingai.com](https://movingai.com). Furthermore, we test synthetic inputs: grid300-10, grid500-10 and grid1500-10 are square grids with respective sizes 301x301, and 501x501 where 10% of the edges have been randomly deleted.


### Graph file format
We use a simple edge list file format to store undirected unweighted graphs. Roughly, the 
* `# This is a comment` -- Comment line. Can appear anywhere in the file. The symbols `c` and `p` can also be used instead of `#`.
* `u v` -- edge between vertex *u* and vertex *v*. Vertex labels *u* and *v* must be positive integers, and we assume that the number of vertices is the largest label plus one. That is, if *l* is the largest label, we assume that graph has the `l-1` vertices `0, 1, ..., l-1`.

```
# Graph name: Largest biconnected component of p2p-Gnutella09.edgelist
# undirected
# N = 5606
# M = 23510
#
# Vertices have been relabeled in the range 0..N-1.
#
0 5308
0 4751
0 1
0 505
0 2
0 3
0 4
0 5
0 6
0 845
0 7
0 8
0 2799
0 189
0 3440
0 3908
1 353
2 366
3 1103
3 1662
...
```


## References
1. M. Borassi, D. Coudert, P. Crescenzi and A. Marino. **[On Computing the Hyperbolicity of Real-World Graphs](https://hal.inria.fr/hal-01199860)**. 23rd Annual European Symposium on Algorithms (ESA), Sep 2015, Patras, Greece. pp.215-226, [10.1007/978-3-662-48350-3_19](http://dx.doi.org/10.1007/978-3-662-48350-3_19⟩.
2. D. Coudert, A. Nusser and L. Viennot. **[Enumeration of far-apart pairs by decreasing distance for faster hyperbolicity computation](https://hal.inria.fr/hal-03201405)**.  Research Report Inria; I3S, Université Côte d'Azur, 2021. [hal-03201405]⟨https://hal.inria.fr/hal-03201405⟩
3. D. Coudert, A. Nusser, L. Viennot. **[Computing Graph Hyperbolicity Using Dominating Sets]()**. Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), SIAM, 2022.
4. R. Oughtred, C. Stark, B-J. Breitkreutz, J. Rust, L. Boucher, C. Chang, N. Kolas, L. O’Donnell, G. Leung, R. McAdam, Rochelle and others. **The BioGRID interaction database: 2019 update**. Nucleic acids research, 47(D1):D529-D541, 2019.
5. L. Salwinski, C. Miller, A. Smith, F. Pettit, J. Bowie and D. Eisenberg. **The database of interacting proteins: 2004 update**. Nucleic acids research 32(suppl_1):D449-D451, 2004.
6. The Cooperative Association for Internet Data Analysis (CAIDA). **The CAIDA AS Relationships Dataset**. 2013.
7. Y. Shavitt and E. Shir. **DIMES: Let the Internet Measure Itself**. ACM SIGCOMM Computer Communication Review, 35(5):71-74, 2005.