# Testing framework "pipeline"

_Alex Malz (NYU)_

In [None]:
import sklearn as skl
from sklearn import metrics

import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
import proclam
from proclam import *

## 1. True classes

Import the `proclam.Simulator` superclass and the simulator subclass you want to test.  In this notebook, I'm going to use an unbalanced distribution of true classes such that the probability of an object being in class $m$ (with $0 \leq m < M$) is proportional to $10^{y_m}$, where each $y_m$ is a draw from a uniform distribution $U(0,M)$.

In [None]:
from proclam.simulators import simulator
from proclam.simulators import logunbalanced

Create an instance of the simulator for the true dataset class distribution scheme.  If you use the base superclass instead of a subclass, the default scheme assigns all objects the same class.

In [None]:
A = proclam.simulators.logunbalanced.LogUnbalanced()

Then, simulate a truth catalog.  In this case, there are 3 true classes and 1000 objects in the catalog.  The output will be a `numpy.ndarray` with 1000 entries, each of which is the index of the class for that catalog member.

In [None]:
# Reuse the simulator instance `A` created in the previous cell
M_classes = 3     # number of true classes
N_objects = 1000  # number of objects in the truth catalog
truth = A.simulate(M_classes, N_objects)

We can check that the class distribution is as expected with a histogram of the true classes.

In [None]:
plt.hist(truth, log=True)
plt.ylabel('counts')
plt.xlabel('class number')
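
For intuition about the log-unbalanced scheme described at the start of this section, here is a minimal NumPy sketch.  It is not `proclam`'s implementation: the function name `simulate_log_unbalanced`, the `seed` argument, and the choice of one exponent per class are all my assumptions for illustration.

```python
import numpy as np

def simulate_log_unbalanced(M, N, seed=None):
    """Hypothetical helper: draw N class indices in [0, M) with log-unbalanced probabilities."""
    rng = np.random.default_rng(seed)
    # One exponent per class, each drawn from U(0, M) (assumed reading of the scheme)
    y = rng.uniform(0.0, M, size=M)
    # Class probabilities proportional to 10**y, normalized to sum to 1
    weights = 10.0 ** y
    probs = weights / weights.sum()
    # Assign each object a class index according to those probabilities
    truth = rng.choice(M, size=N, p=probs)
    return truth, probs

truth_sketch, probs_sketch = simulate_log_unbalanced(3, 1000, seed=0)
print(probs_sketch)                             # class probabilities
print(np.bincount(truth_sketch, minlength=3))   # objects per class
```

Because the exponents are themselves random, the class counts can differ by orders of magnitude from one run to the next, which is why the histogram above uses a log scale.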

## 2. Mock classifications

Import the `proclam.Classifier` superclass and the classifier subclass you want to test.  In this notebook, I'm going to use a very stupid classifier that takes a random guess.

In [None]:
from proclam.classifiers import classifier
from proclam.classifiers import guess

Create an instance of the classifier for a particular scheme.  If you use the base superclass instead of a subclass, the default classification scheme will return the true classes.

Then, "classify" the "data."  By default, classification results will also include an extra column for "other" classes beyond the number in the training set, but in this example let's assume the classifier knows of the 3 classes in the training set and leaves no room for additional classes.  The output will be a `numpy.ndarray` with 1000 rows (one per catalog member) and one column per class, where each entry is that catalog member's posterior probability of belonging to that class.

In [None]:
B = proclam.classifiers.guess.Guess()
prediction = B.classify(M_classes, truth, other=False)
print(prediction)
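
To make the expected output shape concrete, here is a rough NumPy sketch of what a random-guess classifier could look like.  This is an assumption about the idea, not the code inside `proclam.classifiers.guess`; the name `random_guess` and its `seed` argument are hypothetical.

```python
import numpy as np

def random_guess(M, truth, seed=None):
    """Hypothetical helper: return an (N, M) array of random class probabilities."""
    rng = np.random.default_rng(seed)
    N = len(truth)  # only the catalog size is used; the true classes are ignored
    # Draw unnormalized scores, then normalize each row so it sums to 1
    raw = rng.uniform(size=(N, M))
    return raw / raw.sum(axis=1, keepdims=True)

example_truth = np.array([0, 2, 1, 2, 2])
example_guess = random_guess(3, example_truth, seed=0)
print(example_guess.shape)
print(example_guess.sum(axis=1))
```

Each row is a valid probability vector, so the result can be passed to any metric that expects per-class posteriors.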

## 3. Metric values

Import the `proclam.Metric` superclass and the metric subclass you want to test.

In [None]:
from proclam.metrics import metric
from proclam.metrics import logloss
from proclam.metrics import brier

Let's try out the [logloss metric](https://en.wikipedia.org/wiki/Loss_functions_for_classification#Cross_entropy_loss).  The logloss metric is a sum over $LL_{ij} = -y_{ij}\ln[p_{ij}]$ for predicted probabilities $p$ and true class indicators $y$.

In [None]:
C = proclam.metrics.logloss.LogLoss()
performance = C.evaluate(prediction, truth, averaging='per_item')
print(performance)

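# Pass the full label set so the comparison still works if a rare class is absent from `truth`
alternative = skl.metrics.log_loss(truth, prediction, normalize=True, labels=list(range(M_classes)))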
print(alternative)
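
The logloss formula can also be evaluated by hand on a tiny example.  The sketch below assumes that "per item" averaging means dividing the summed $LL$ terms by the number of objects; the function `per_item_logloss` and the clipping value `eps` are my own choices, not part of `proclam`.

```python
import numpy as np

def per_item_logloss(prediction, truth, eps=1e-15):
    """Hypothetical helper: mean over objects of -sum_j y_ij * ln(p_ij)."""
    prediction = np.asarray(prediction, dtype=float)
    truth = np.asarray(truth, dtype=int)
    N, M = prediction.shape
    # One-hot indicators: y[i, j] = 1 if object i is truly in class j
    y = np.zeros((N, M))
    y[np.arange(N), truth] = 1.0
    # Clip probabilities away from 0 to avoid log(0)
    p = np.clip(prediction, eps, 1.0)
    return np.sum(-y * np.log(p)) / N

example_truth = np.array([0, 1, 2])
example_prediction = np.array([[0.5, 0.25, 0.25],
                               [0.1, 0.8, 0.1],
                               [0.2, 0.2, 0.6]])
logloss_value = per_item_logloss(example_prediction, example_truth)
print(logloss_value)
```

Only the probability assigned to each object's true class contributes, so here the result is the mean of $-\ln 0.5$, $-\ln 0.8$, and $-\ln 0.6$.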

Let's try the [Brier score](https://en.wikipedia.org/wiki/Brier_score#Original_definition_by_Brier) this time.  The multi-class Brier score is $BS = \frac{1}{N}\sum\limits _{t=1}^{N}\sum\limits _{i=1}^{R}(f_{ti}-o_{ti})^2$ for $N$ objects, $R$ classes, predicted probabilities $f_{ti}$, and indicators $o_{ti}$ equal to 1 if object $t$ truly belongs to class $i$ and 0 otherwise.

First we create an instance of the metric.  Then, we calculate the metric value.  For binary classes, we can compare to the implementation in `scikit-learn`.

In [None]:
D = proclam.metrics.brier.Brier()
performance = D.evaluate(prediction, truth)
print(performance)

if M_classes == 2:
    skl_truth = proclam.metrics.util.truth_reformatter(truth).T[0]
    alternative = skl.metrics.brier_score_loss(skl_truth, prediction.T[0])
    print(alternative)
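
As a sanity check on the formula, here is a small NumPy sketch of the multi-class Brier score next to the single-column binary form, which is my understanding of what `brier_score_loss` computes.  The helper names `multiclass_brier` and `binary_brier` are hypothetical and not part of `proclam`.

```python
import numpy as np

def multiclass_brier(prediction, truth):
    """Hypothetical helper: (1/N) * sum_t sum_i (f_ti - o_ti)**2."""
    prediction = np.asarray(prediction, dtype=float)
    truth = np.asarray(truth, dtype=int)
    N, R = prediction.shape
    # One-hot indicators: o[t, i] = 1 if object t is truly in class i
    o = np.zeros((N, R))
    o[np.arange(N), truth] = 1.0
    return np.sum((prediction - o) ** 2) / N

def binary_brier(prob_positive, is_positive):
    """Hypothetical helper: mean squared difference for a single class column."""
    return np.mean((np.asarray(prob_positive) - np.asarray(is_positive)) ** 2)

example_truth = np.array([0, 1, 1, 0])
example_prediction = np.array([[0.9, 0.1],
                               [0.3, 0.7],
                               [0.4, 0.6],
                               [0.6, 0.4]])
multi = multiclass_brier(example_prediction, example_truth)
single = binary_brier(example_prediction[:, 0], (example_truth == 0).astype(float))
print(multi, single)
```

With two classes whose probabilities sum to 1, both columns contribute the same squared error, so the multi-class sum is twice the single-column value.  Keep that factor of 2 in mind when comparing the two numbers.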

## Next steps

Once there are more simulators, classifiers, and metrics, we'll loop over tests and plot comparisons.  Stay tuned for more!