# ProNE

### Paper

ProNE: Fast and Scalable Network Representation Learning

Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang and Ming Ding

Accepted to IJCAI 2019 Research Track!

## Prerequisites

- Linux or macOS
- Python 2 or 3
- scipy
- sklearn

## Installation

Clone this repo.

```bash
git clone
cd ProNE
```

Please install dependencies by

```bash
pip install -r requirements.txt
```

## Dataset

All of the following datasets are publicly available:

- PPI contains 3,890 nodes, 76,584 edges and 60 labels.
- Wikipedia contains 4,777 nodes, 184,812 edges and 40 labels.
- Blogcatalog contains 10,312 nodes, 333,983 edges and 39 labels.
- DBLP contains 51,264 nodes, 127,968 edges and 60 labels. 
- Youtube contains 1,138,499 nodes, 2,990,443 edges and 47 labels.

## Training

### Training on the existing datasets

Create an emb directory to save the output embedding files:

```bash
mkdir emb
```

You can use `python -graph example_graph` to train the ProNE model on the example data.
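For intuition, ProNE's first stage factorizes a normalized sparse proximity matrix with a randomized truncated SVD to obtain the initial embedding. The sketch below illustrates this stage with NumPy and scikit-learn; `factorize_sparse` and the plain row normalization are illustrative simplifications, not the repository's implementation (which works on scipy sparse matrices and a log-transformed proximity matrix):

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

def factorize_sparse(adj, dim=128, seed=0):
    """First stage of ProNE (sketch): randomized truncated SVD of a
    normalized proximity matrix gives the initial node embedding."""
    deg = np.maximum(adj.sum(axis=1), 1e-12)
    prox = adj / deg[:, None]          # row-normalized transition matrix
    u, s, _ = randomized_svd(prox, n_components=dim, random_state=seed)
    return u * np.sqrt(s)              # fold singular values into the embedding
```

Randomized SVD is what makes this stage fast: it only needs a handful of sparse matrix-vector products instead of a full eigendecomposition.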

If you want to train on the PPI dataset, you can run

```bash
python -graph data/PPI.ungraph -emb1 emb/PPI_sparse.emb -emb2 emb/PPI_spectral.emb \
    -dimension 128 -step 10 -theta 0.5 -mu 0.2
```

Here PPI_sparse.emb and PPI_spectral.emb are the output embedding files, and dimension=128, step=10, theta=0.5 and mu=0.2 are the default settings, which give good results. Better results can be achieved by searching mu over values around 0; for example, the results with mu = -4.0 (low pass) on Wikipedia in the enhancement experiments are better than those reported in the paper.
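The theta and mu parameters control the spectral filter used in ProNE's enhancement stage: conceptually, the initial embedding is smoothed with the band-pass filter g(lambda) = exp(-0.5 * ((lambda - mu)^2 - 1) * theta), expanded in Chebyshev polynomials of the normalized graph Laplacian. The dense NumPy sketch below illustrates this idea; the names `spectral_propagate` and `bessel_i` are hypothetical, and the details are simplified relative to the repository code:

```python
import math
import numpy as np

def bessel_i(k, x, terms=25):
    """Modified Bessel function of the first kind I_k(x), by its power series."""
    return sum((x / 2.0) ** (2 * m + k) / (math.factorial(m) * math.factorial(m + k))
               for m in range(terms))

def spectral_propagate(adj, emb, step=10, mu=0.2, theta=0.5):
    """Second stage of ProNE (dense sketch): smooth the embedding with a
    Gaussian band-pass spectral filter, expanded to `step` Chebyshev terms."""
    n = adj.shape[0]
    deg = np.maximum(adj.sum(axis=1), 1e-12)
    lap = np.eye(n) - adj / deg[:, None]      # random-walk normalized Laplacian
    shifted = lap - mu * np.eye(n)            # center the spectrum at mu
    lx0 = emb
    lx1 = 0.5 * (shifted @ (shifted @ emb)) - emb
    conv = bessel_i(0, theta) * lx0 - 2 * bessel_i(1, theta) * lx1
    for k in range(2, step):
        # Chebyshev three-term recurrence on the shifted Laplacian
        lx2 = (shifted @ (shifted @ lx1) - 2 * lx1) - lx0
        sign = 1.0 if k % 2 == 0 else -1.0
        conv = conv + sign * 2 * bessel_i(k, theta) * lx2
        lx0, lx1 = lx1, lx2
    return adj @ (emb - conv)                 # propagate the filtered signal one hop
```

With this picture, step is the truncation order of the Chebyshev expansion, theta sets the filter bandwidth, and mu shifts the pass band (hence mu = -4.0 acting as a low-pass filter).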
If you want to evaluate the embedding on the node classification task, you can run

```bash
python -label data/PPI.cmty -emb emb/PPI_spectral.emb -shuffle 4
```

Here PPI.cmty is the node label file and shuffle is the number of random train/test shuffles used for classification.
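A common protocol for this evaluation is one-vs-rest logistic regression on the embeddings, with micro-F1 averaged over several random train/test shuffles. The sketch below (a hypothetical helper, not the repository's classifier script) illustrates it with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

def evaluate_embedding(emb, labels, shuffles=4, train_ratio=0.5, seed=0):
    """Micro-F1 of one-vs-rest logistic regression on a node embedding,
    averaged over several random train/test splits. `labels` is a binary
    indicator matrix (nodes x label classes) for multi-label data."""
    scores = []
    for s in range(shuffles):
        x_tr, x_te, y_tr, y_te = train_test_split(
            emb, labels, train_size=train_ratio, random_state=seed + s)
        clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        clf.fit(x_tr, y_tr)
        scores.append(f1_score(y_te, clf.predict(x_te), average="micro"))
    return float(np.mean(scores))
```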

### Training on your own datasets

If you want to train ProNE on your own dataset, you should prepare the following files:
- edgelist.txt: Each line represents an edge and contains two tokens `<node1> <node2>`, where each token is a node id numbered from 0.
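For illustration, such an edge list can be read into a symmetric scipy sparse adjacency matrix as follows (`load_edgelist` is a hypothetical helper, not the repository's own parser):

```python
import numpy as np
import scipy.sparse as sp

def load_edgelist(path, num_nodes=None):
    """Read a whitespace-separated edge list (one '<node1> <node2>' pair
    per line, 0-based ids) into a symmetric CSR adjacency matrix."""
    edges = np.loadtxt(path, dtype=np.int64).reshape(-1, 2)
    n = num_nodes or int(edges.max()) + 1
    data = np.ones(len(edges))
    adj = sp.csr_matrix((data, (edges[:, 0], edges[:, 1])), shape=(n, n))
    adj = adj + adj.T        # mirror edges to make the graph undirected
    adj.data[:] = 1.0        # collapse any duplicate entries to weight 1
    return adj
```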

### Training with the C++ version of ProNE

The Python implementation of ProNE is mainly single-threaded (except for the SVD on small matrices). We also provide a multi-threaded C++ program, ProNE.cpp, for large-scale networks, built on Eigen, MKL, FrPCA and Boost. OpenMP and ICC are used for speedup. In addition, gflags is required to parse command-line parameters.

Compared with the original Python version of ProNE described in the paper, the fully optimized C++ ProNE is about 6 times faster (roughly two minutes) on Youtube, with no loss in accuracy.

Compile it via

```bash
icc ProNE.cpp -O3 -mkl -qopenmp -l gflags frpca/frpca.c frpca/matrix_vector_functions_intel_mkl_ext.c frpca/matrix_vector_functions_intel_mkl.c -o ProNE.out
```

If you want to train on the PPI dataset, you can run

```bash
./ProNE.out -filename data/PPI.ungraph -emb1 emb/PPI.emb1 -emb2 emb/PPI.emb2 \
    -num_node 3890 -num_step 10 -num_thread 20 -num_rank 128 -theta 0.5 -mu 0.2
```

If you have ANY difficulties to get things working in the above steps, feel free to open an issue. You can expect a reply within 24 hours.

## Citing

If you find *ProNE* useful for your research, please consider citing our paper:

```
@inproceedings{ijcai2019-594,
  title     = {ProNE: Fast and Scalable Network Representation Learning},
  author    = {Zhang, Jie and Dong, Yuxiao and Wang, Yan and Tang, Jie and Ding, Ming},
  booktitle = {Proceedings of the Twenty-Eighth International Joint Conference on
               Artificial Intelligence, {IJCAI-19}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  pages     = {4278--4284},
  year      = {2019},
  month     = {7},
  doi       = {10.24963/ijcai.2019/594},
  url       = {},
}
```