{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_Alex Malz (NYU)_\n",
    "_(Add your name here when contributing.)_\n",
    "\n",
    "# PRObabilistic CLAssification Metrics\n",
    "\n",
    "This notebook explores the behavior of a number of classification metrics, drawn from [discussions](https://docs.google.com/document/d/1pg0KUY0KihjlWKwoCE7Fc29u9pjv-fhwUnL8o34s58k/edit#) in the context of PLAsTiCC."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Many classification metrics are already implemented in [`scikit-learn`](http://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sklearn as skl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Receiver Operating Curve (ROC)\n",
    "\n",
    "[on Wikipedia](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)\n",
    "\n",
    "Pros\n",
    "* Works with multi-label data\n",
    "\n",
    "Cons\n",
    "* Doesn't naturally work with multi-class data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import roc_curve"
   ]
  },
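  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check of the API, `roc_curve` can be evaluated on a toy binary example (the labels and scores below are invented for illustration):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import roc_curve\n",
    "\n",
    "truth = np.array([0, 0, 1, 1])\n",
    "scores = np.array([0.1, 0.4, 0.35, 0.8])\n",
    "fpr, tpr, thresholds = roc_curve(truth, scores)\n",
    "print(fpr)\n",
    "print(tpr)"
   ]
  },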
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ROC Area Under Curve (AUC)\n",
    "\n",
    "Pros\n",
    "* Commonly used\n",
    "\n",
    "Cons\n",
    "* Not good for sparse classes\n",
    "* \"Noisy\" metric"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import roc_auc_score"
   ]
  },
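  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sanity check on a toy binary example (invented data): the ROC AUC equals the probability that a randomly chosen positive outscores a randomly chosen negative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import roc_auc_score\n",
    "\n",
    "# 3 of the 4 (positive, negative) pairs are ranked correctly, so AUC = 0.75\n",
    "print(roc_auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))"
   ]
  },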
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Standard Score (zROC)\n",
    "\n",
    "[on Wikipedia](https://en.wikipedia.org/wiki/Standard_score)\n",
    "\n",
    "Pros\n",
    "\n",
    "Cons\n",
    "* Not implemented in `scikit-learn`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write it!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Detection Error Tradeoff (DET) Graph\n",
    "\n",
    "[on Wikipedia](https://en.wikipedia.org/wiki/Detection_error_tradeoff)\n",
    "\n",
    "Pros\n",
    "* More sensitive to areas of interest than ROC\n",
    "\n",
    "Cons\n",
    "* Not implemented in `scikit-learn`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write it!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Log-Loss\n",
    "\n",
    "[on Wikipedia](https://en.wikipedia.org/wiki/Loss_functions_for_classification)\n",
    "\n",
    "Pros\n",
    "* `scikit-learn` implementation works with multi-class data\n",
    "\n",
    "Cons\n",
    "* Doesn't naturally work with multi-class data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import log_loss"
   ]
  },
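  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A toy multi-class example (invented data) showing the `scikit-learn` call signature: each row of `probs` is a predicted probability vector over the three classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import log_loss\n",
    "\n",
    "truth = [0, 1, 2, 2]\n",
    "probs = [[0.8, 0.1, 0.1],\n",
    "         [0.2, 0.7, 0.1],\n",
    "         [0.1, 0.2, 0.7],\n",
    "         [0.4, 0.4, 0.2]]\n",
    "# mean negative log of the probability assigned to the true class\n",
    "print(log_loss(truth, probs))"
   ]
  },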
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Brier score\n",
    "\n",
    "[on Wikipedia](https://en.wikipedia.org/wiki/Brier_score)\n",
    "\n",
    "Pros\n",
    "* Naturally works with multi-class data\n",
    "* Intuitively interpretable\n",
    "\n",
    "Cons\n",
    "* `scikit-learn` implementation only works with binary classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import brier_score_loss"
   ]
  },
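  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A toy binary example (invented data): the Brier score is the mean squared difference between the predicted probability and the true label."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import brier_score_loss\n",
    "\n",
    "# ((0.1)^2 + (0.1)^2 + (0.2)^2 + (0.3)^2) / 4 = 0.0375\n",
    "print(brier_score_loss([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3]))"
   ]
  },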
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Precision-Recall Curve (PRC)\n",
    "\n",
    "[on Wikipedia](https://en.wikipedia.org/wiki/Precision_and_recall)\n",
    "\n",
    "Pros\n",
    "\n",
    "Cons"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import precision_recall_curve"
   ]
  },
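  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The call signature mirrors that of `roc_curve`; a toy binary example (invented data):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import precision_recall_curve\n",
    "\n",
    "precision, recall, thresholds = precision_recall_curve([0, 0, 1, 1],\n",
    "                                                       [0.1, 0.4, 0.35, 0.8])\n",
    "print(precision)\n",
    "print(recall)"
   ]
  },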
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PRC Area Under Curve (AUC)\n",
    "\n",
    "[not on Wikipedia](https://andybeger.com/2015/03/16/precision-recall-curves/)\n",
    "\n",
    "Pros\n",
    "* Better for sparse data than ROC AUC\n",
    "\n",
    "Cons\n",
    "* Doesn't naturally work with multi-class data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import auc"
   ]
  },
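  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`auc` is a general trapezoidal integrator, so the PRC AUC can be sketched by feeding it the recall and precision arrays (toy data, invented for illustration):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import auc, precision_recall_curve\n",
    "\n",
    "precision, recall, thresholds = precision_recall_curve([0, 0, 1, 1],\n",
    "                                                       [0.1, 0.4, 0.35, 0.8])\n",
    "# auc() accepts any monotonic x, so the decreasing recall array is fine\n",
    "print(auc(recall, precision))"
   ]
  },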
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PRC Average Precision Score\n",
    "\n",
    "Pros\n",
    "* Less optimistic than PRC AUC\n",
    "\n",
    "Cons"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import average_precision_score"
   ]
  },
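  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On a toy binary example (invented data), the average precision score summarizes the PRC without the trapezoidal interpolation that can inflate PRC AUC:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import average_precision_score\n",
    "\n",
    "print(average_precision_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))"
   ]
  },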
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Other modifications of deterministic metrics?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Impact of converting classifications from deterministic to probabilistic?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (not)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}