https://github.com/moonfolk/MiFM
Tip revision: e75c495cc6a57e69666d24a7afc2b4c0a7734b7d authored by Mikhail on 28 November 2017, 20:54:13 UTC
# Multi-way Interacting Regression via Factorization Machines

This is a Python 2 implementation of the MiFM algorithm for interaction discovery and prediction (M. Yurochkin, X. Nguyen, N. Vasiloglou, to appear in NIPS 2017). Code written by Mikhail Yurochkin.

## Overview

This is a demonstration of MiFM on the Abalone data.

First, compile the Cython code in the cython folder. On Ubuntu, run:
```
cython g_c_sampler.pyx
python setup.py build_ext --inplace
```

The Cython code implements the Gibbs sampling updates and the prediction function.

prediction/predict_f_all.py: Python wrapper for the Cython code that aggregates MCMC samples for prediction

py_scripts/train_f.py: Python wrapper for the Cython code that runs Gibbs sampling

py_scripts/py_functions.py: Gibbs sampling for hyperpriors and initialization

mifm_class.py: implements the MiFM class, data preprocessing, and posterior analysis of interactions

abalone_example.py: downloads the Abalone dataset and shows how to use MiFM and extract interactions

The implementation is designed to be used interactively (e.g. in a Python IDE such as Spyder).

## Usage guide

```
MiFM(K=5, J=50, it=700, lin_model=True, alpha=1., verbose=False, restart=5, restart_iter=50, thr=300, rate=25, ncores=1, use_mape=False)
```

Parameters:

K: rank of matrix of coefficients V

J: number of interactions (columns) in Z

it: number of Gibbs sampling iterations

lin_model: whether to include linear effects (w_1,...,w_D)

alpha: FFM_alpha parameter. Smaller values encourage deeper interactions

verbose: whether to print intermediate RMSE train scores

restart and restart_iter: number of initializations to try, each run for restart_iter iterations. The best initialization, as measured by training RMSE, is then used for fitting

ncores: how many cores to use for initialization with restarts

use_mape: whether to use AMAPE instead of RMSE to select best initialization

thr: number of MCMC iterations after which samples are collected (i.e. burn-in)

rate: thinning interval; every rate-th iteration is saved
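
To make the interplay of it, thr and rate concrete, the following is a minimal sketch of which Gibbs iterations would contribute a saved sample. It is not taken from the repository: the helper name `kept_iterations` is hypothetical, and it assumes iterations are numbered from 0 and that the first sample is saved at iteration thr. The actual code may differ by an offset of one.

```python
def kept_iterations(it=700, thr=300, rate=25):
    """Return the iteration indices whose samples would be saved.

    Assumption (not confirmed by the repository): iterations are numbered
    from 0, the first saved sample is at iteration `thr`, and one sample is
    saved every `rate` iterations after that.
    """
    return list(range(thr, it, rate))


kept = kept_iterations()
print(len(kept))   # 16 saved samples with the default settings
print(kept[:3])    # [300, 325, 350]
```

Under these assumptions, the defaults discard the first 300 iterations and keep 16 of the remaining 400.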

Methods:
```
fit(X, y, cat_to_v, v_to_cat)
```

X: training data after one-hot encoding

y: response

cat_to_v: list of category to value after one-hot encoding (see example with Abalone data)

v_to_cat: dictionary of category to values before one-hot encoding (see example with Abalone data)

Returns a list of MCMC samples. Each sample is a list [bias, linear coefficients, V, Z]
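
As an illustration of the sample layout, here is a small sketch that unpacks one sample and reads the interactions off Z. The toy values and the helper `interactions_from_Z` are hypothetical. It assumes that Z is a binary matrix with one row per feature and one column per interaction, which this README does not state explicitly. The posterior analysis in mifm_class.py remains the reference way to extract interactions.

```python
def interactions_from_Z(Z):
    """For each column of Z, list the row indices with a nonzero entry.

    Assumption: Z is given as a list of rows, rows index features and
    columns index interactions.
    """
    n_rows = len(Z)
    n_cols = len(Z[0]) if n_rows else 0
    return [[d for d in range(n_rows) if Z[d][j]] for j in range(n_cols)]


# Hypothetical toy sample: 3 features, K=2, J=2
sample = [
    0.5,                                      # bias
    [0.1, -0.2, 0.0],                         # linear coefficients
    [[0.3, 0.1], [0.0, 0.2], [0.4, -0.1]],    # V
    [[1, 0], [1, 1], [0, 1]],                 # Z
]
bias, w, V, Z = sample
print(interactions_from_Z(Z))  # [[0, 1], [1, 2]]
```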

```
predict(self, X)
```
Note: can only be used on a fitted object.
Returns predicted values for test data X using a Monte Carlo estimator of the mean response.
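
The Monte Carlo estimator of the mean response averages the predictions made under each saved MCMC sample. The sketch below illustrates only that averaging step in plain Python. The function name `monte_carlo_mean` and its input format are assumptions for illustration; in the repository the aggregation is done by prediction/predict_f_all.py and the Cython code.

```python
def monte_carlo_mean(per_sample_predictions):
    """Average predictions over MCMC samples.

    per_sample_predictions: list with one entry per saved sample; each entry
    is a list holding one predicted value per row of X.
    """
    n_samples = len(per_sample_predictions)
    if n_samples == 0:
        raise ValueError("need at least one MCMC sample")
    n_rows = len(per_sample_predictions[0])
    return [
        sum(sample[i] for sample in per_sample_predictions) / float(n_samples)
        for i in range(n_rows)
    ]


# Two samples, two test rows
print(monte_carlo_mean([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```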

```
score(self, X, y)
```
Makes predictions on X and computes RMSE or AMAPE against y
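
For reference, RMSE is the square root of the mean squared difference between true and predicted responses. The helper below is a standalone sketch with a hypothetical name, not the repository's implementation. AMAPE is left out because its exact definition is not given in this README.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one observation")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / float(len(squared_errors)))


print(rmse([0.0, 0.0], [2.0, 2.0]))  # 2.0
```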