Skip to main content
  • Home
  • Development
  • Documentation
  • Donate
  • Operational login
  • Browse the archive

swh logo
SoftwareHeritage
Software
Heritage
Archive
Features
  • Search

  • Downloads

  • Save code now

  • Add forge now

  • Help

Revision 1db48f3a735eb0fba06a7d503f080a7ead512604 authored by Artem Artemev on 11 July 2018, 12:50:44 UTC, committed by GitHub on 11 July 2018, 12:50:44 UTC
Update version.py file to 1.2.0 (#812)
1 parent 707b195
  • Files
  • Changes
  • 2109064
  • /
  • doc
  • /
  • source
  • /
  • notebooks
  • /
  • SGPR_notes.ipynb
Raw File Download

To reference or cite the objects present in the Software Heritage archive, permalinks based on SoftWare Hash IDentifiers (SWHIDs) must be used.
Select below a type of object currently browsed in order to display its associated SWHID and permalink.

  • revision
  • directory
  • content
revision badge
swh:1:rev:1db48f3a735eb0fba06a7d503f080a7ead512604
directory badge
swh:1:dir:5e7ecf0b802d3e60cfac66660aeb4416a50ef469
content badge
swh:1:cnt:ddcd9f306806784b00af1d647125fa2d2efdd2c2

This interface enables to generate software citations, provided that the root directory of browsed objects contains a citation.cff or codemeta.json file.
Select below a type of object currently browsed in order to generate citations for them.

  • revision
  • directory
  • content
(requires biblatex-software package)
Generating citation ...
(requires biblatex-software package)
Generating citation ...
(requires biblatex-software package)
Generating citation ...
SGPR_notes.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SGPR notes\n",
    "--\n",
    "*James Hensman, March 2016*\n",
    "*Small corrections Alex Matthews, December 2016*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The GPflow method SGPR performs Sparse Gaussian Process Regression. It is based mainly on Titsias (2009), though other works (Hensman et al. (2013), Matthews et. al. 2016) are useful for clarifying the prediction density.\n",
    "\n",
    "This notebook contains a derivation of the form of the equations used in GPflow for the marginal likelihood bound and for the prediction equations. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Marginal Likelihood bound\n",
    "The bound on the marginal likelihod is (Titsias 2009):\n",
    "$$ \\log p(\\mathbf y) \\geq \\log \\mathcal N(\\mathbf y\\,|\\,\\mathbf 0,\\, \\mathbf Q_{ff} + \\sigma^2 \\mathbf I) - \\tfrac{1}{2} \\sigma^{-2}\\textrm{tr}(\\mathbf K_{ff} - \\mathbf Q_{ff}) \\triangleq \\mathcal L$$\n",
    "with \n",
    "$$\\mathbf Q_{ff} = \\mathbf K_{fu}\\mathbf K_{uu}^{-1}\\mathbf K_{uf}$$.\n",
    "\n",
    "The kernel matrices $\\mathbf K_{ff}$, $\\mathbf K_{uu}$, $\\mathbf K_{fu}$ represent the kernel evaulated at the data points $\\mathbf X$, the inducing input points $\\mathbf Z$ and between the data and inducing points, respectively. We refer to the value of the GP at the data points $\\mathbf X$ as $\\mathbf f$, at the inducing points $\\mathbf Z$ as $\\mathbf u$ and at the remainder of the function as $f^\\star$. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To obtain an efficient and stable evaluation os the bound $\\mathcal L$, we first apply the Woodbury identity to the effective covariance matrix:\n",
    "\n",
    "$$[\\mathbf Q_{ff} + \\sigma^2 \\mathbf I ]^{-1} = \\sigma^{-2} \\mathbf I - \\sigma^{-4} \\mathbf K_{fu}[\\mathbf K_{uu} + \\mathbf K_{uf}\\mathbf K_{fu}\\sigma^{-2}]^{-1}\\mathbf K_{uf}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, to obtain a better conditioned matrix for inversion, we rotate by $\\mathbf L$, where $\\mathbf L\\mathbf L^\\top = \\mathbf K_{uu}$:\n",
    "$$[\\mathbf Q_{ff} + \\sigma^2 \\mathbf I ]^{-1} = \\sigma^{-2} \\mathbf I - \\sigma^{-4} \\mathbf K_{fu}\\color{red}{\\mathbf L^{-\\top} \\mathbf L^\\top}[\\mathbf K_{uu} + \\mathbf K_{uf}K_{fu}\\sigma^{-2}]^{-1}\\color{red}{\\mathbf L \\mathbf L^{-1}}\\mathbf K_{uf}$$\n",
    "\n",
    "This matrix is better conditioned because for many kernels it has eigenvalues which are bounded above and below. For more detail see GPML section 3.4.3 ."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\phantom{[\\mathbf Q_{ff} + \\sigma^2 \\mathbf I ]^{-1}} = \\sigma^{-2} \\mathbf I - \\sigma^{-4} \\mathbf K_{fu}\\color{red}{\\mathbf L^{-\\top}} [\\color{red}{\\mathbf L^{-1}}\\mathbf (K_{uu} + \\mathbf K_{uf}K_{fu}\\sigma^{-2})\\color{red}{\\mathbf L^{-\\top}}]^{-1}\\color{red}{ \\mathbf L^{-1}}\\mathbf K_{uf}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\phantom{[\\mathbf Q_{ff} + \\sigma^2 \\mathbf I ]^{-1} }= \\sigma^{-2} \\mathbf I - \\sigma^{-4} \\mathbf K_{fu}\\color{red}{\\mathbf L^{-\\top}} [\\mathbf I + \\color{red}{\\mathbf L^{-1}}\\mathbf (\\mathbf K_{uf}K_{fu})\\color{red}{\\mathbf L^{-\\top}}\\sigma^{-2}]^{-1}\\color{red}{ \\mathbf L^{-1}}\\mathbf K_{uf}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For notational convenience, we'll define $\\mathbf L^{-1}\\mathbf K_{uf}\\sigma^{-1} \\triangleq \\mathbf A$, and  $[\\mathbf I + \\mathbf A\\mathbf A^\\top]\\triangleq \\mathbf B$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\phantom{[\\mathbf Q_{ff} + \\sigma^2 \\mathbf I ]^{-1} }= \\sigma^{-2} \\mathbf I - \\sigma^{-2} \\mathbf A^{\\top} [\\mathbf I + \\mathbf A\\mathbf A^\\top]^{-1}\\mathbf A\n",
    "$$\n",
    "\\begin{equation}\n",
    "\\phantom{[\\mathbf Q_{ff} + \\sigma^2 \\mathbf I ]^{-1}}= \\sigma^{-2} \\mathbf I - \\sigma^{-2} \\mathbf A^{\\top} \\mathbf B^{-1}\\mathbf A\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also apply the [matrix determinant lemma](https://en.wikipedia.org/wiki/Matrix_determinant_lemma) to the same:\n",
    "$$\n",
    "|{\\mathbf Q_{ff}} + \\sigma^2 {\\mathbf I}| = |{\\mathbf K_{uu}} + \n",
    " \\mathbf K_{uf}\\mathbf K_{fu}\\sigma^{-2}| \\, |\\mathbf K_{uu}^{-1}| \\, |\\sigma^{2}\\mathbf I|\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Substituting $\\mathbf K_{uu} = {\\mathbf {L L}^\\top}$:\n",
    "$$\n",
    "|{\\mathbf Q_{ff}} + \\sigma^2 {\\mathbf I}| = |{\\mathbf {L L}^\\top} + \n",
    " \\mathbf K_{uf}\\mathbf K_{fu}\\sigma^{-2}| \\, |\\mathbf L^{-\\top}|\\,| \\mathbf L^{-1}| \\, |\\sigma^{2}\\mathbf I|\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "|{\\mathbf Q_{ff}} + \\sigma^2 {\\mathbf I}| = |\\mathbf I + \n",
    " \\mathbf L^{-1}\\mathbf K_{uf}\\mathbf K_{fu} \\mathbf L^{-\\top}\\sigma^{-2}| \\, |\\sigma^{2}\\mathbf I|\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "|{\\mathbf Q_{ff}} + \\sigma^2 {\\mathbf I}| = |\\mathbf B| \\, |\\sigma^{2}\\mathbf I|\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With these two definitions, we're ready to expand the bound:\n",
    "$$ \\mathcal L = \\log \\mathcal N(\\mathbf y\\,|\\,\\mathbf 0,\\, \\mathbf Q_{ff} + \\sigma^2 \\mathbf I) - \\tfrac{1}{2} \\sigma^{-2}\\textrm{tr}(\\mathbf K_{ff} - \\mathbf Q_{ff}) $$\n",
    "\n",
    "$$ = -\\tfrac{N}{2}\\log{2\\pi} -\\tfrac{1}{2}\\log|\\mathbf Q_{ff}+\\sigma^2\\mathbf I| - \\tfrac{1}{2}\\mathbf y^\\top [ \\mathbf Q_{ff} + \\sigma^2 \\mathbf I]^{-1}\\mathbf y - \\tfrac{1}{2} \\sigma^{-2}\\textrm{tr}(\\mathbf K_{ff} - \\mathbf Q_{ff}) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "$$ = -\\tfrac{N}{2}\\log{2\\pi} -\\tfrac{1}{2}\\log(|\\mathbf B||\\sigma^{2}\\mathbf I|)  - \\tfrac{1}{2}\\sigma^{-2}\\mathbf y^\\top (\\mathbf I - \\sigma^{-2} \\mathbf A^{\\top} \\mathbf B^{-1}\\mathbf A)\\mathbf y - \\tfrac{1}{2} \\sigma^{-2}\\textrm{tr}(\\mathbf K_{ff} - \\mathbf Q_{ff}) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ = -\\tfrac{N}{2}\\log{2\\pi} \n",
    "-\\tfrac{1}{2}\\log|\\mathbf B|\n",
    "-\\tfrac{N}{2}\\log\\sigma^{2}\n",
    "-\\tfrac{1}{2}\\sigma^{-2}\\mathbf y^\\top\\mathbf y\n",
    "+\\tfrac{1}{2}\\sigma^{-2}\\mathbf y^\\top\\mathbf A^{\\top} \\mathbf B^{-1}\\mathbf A\\mathbf y\n",
    "-\\tfrac{1}{2}\\sigma^{-2}\\textrm{tr}(\\mathbf K_{ff})\n",
    "+ \\tfrac{1}{2}\\textrm{tr}(\\mathbf {AA}^\\top) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where we have used $\\sigma^{-2}\\textrm{tr}(\\mathbf Q) = \\textrm{tr}(\\mathbf {AA}^\\top)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we define $\\mathbf c \\triangleq \\mathbf L_{\\mathbf B}^{-1}\\mathbf A\\mathbf y \\sigma^{-1}$, with $\\mathbf {L_BL_B}^\\top = \\mathbf B$, so that:\n",
    "$$\n",
    "\\sigma^{-2}\\mathbf y^\\top\\mathbf A^{\\top} \\mathbf B^{-1}\\mathbf A\\mathbf y = \n",
    "\\mathbf c^\\top \\mathbf c\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The SGPR code implements this equation with small changes for multiple concurrent outputs (columns of the data matrix Y) and also a prior mean function. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At precition time, we need to compute the mean and variance of the variational approximation at some new points $\\mathbf X^\\star$. Following Hensman et al. (2013), we know that all the information in the posterior approximation is contained in the Gaussian distribution $q(\\mathbf u)$, which represents the distribution on function values at the inducing points $\\mathbf Z$. Recall:\n",
    "\n",
    "$$\n",
    "q(\\mathbf u) = \\mathcal N(\\mathbf u\\,|\\,  \\mathbf m, \\mathbf \\Lambda)\n",
    "$$\n",
    "\n",
    "with \n",
    "\n",
    "$$\\mathbf \\Lambda = \\mathbf K_{uu}^{-1} + \\mathbf K_{uu}^{-1}\\mathbf K_{uf}\\mathbf K_{fu}\\mathbf K_{uu}^{-1} \\sigma^{-2}\n",
    "$$\n",
    "$$\n",
    "\\mathbf m = \\mathbf \\Lambda^{-1} \\mathbf K_{uu}^{-1}\\mathbf K_{uf}\\mathbf y\\sigma^{-2}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make a prediction, we need to integrate:\n",
    "    $$ p(\\mathbf f^\\star) = \\int p(\\mathbf f^\\star \\,|\\, \\mathbf u) q (\\mathbf u) \\textrm d \\mathbf u\n",
    "    $$\n",
    "    with \n",
    "    $$p(\\mathbf f^\\star \\,|\\, \\mathbf u) = \\mathcal N(\\mathbf f^\\star\\,|\\, \\mathbf K_{\\star u}\\mathbf K_{uu}^{-1}\\mathbf u, \\,\\mathbf K_{\\star\\star} - \\mathbf K_{\\star u}\\mathbf K_{uu}^{-1}\\mathbf K_{u\\star})\n",
    "    $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The integral results in \n",
    "$$\n",
    "p(\\mathbf f^\\star) = \\mathcal N(\\mathbf f^\\star\\,|\\, \\mathbf K_{\\star u}\\mathbf K_{uu}^{-1}\\mathbf m, \\,\\mathbf K_{\\star\\star} - \\mathbf K_{\\star u}\\mathbf K_{uu}^{-1}\\mathbf K_{u\\star} + \\mathbf K_{\\star u}\\mathbf K_{uu}^{-1}\\mathbf \\Lambda^{-1}\\mathbf K_{uu}^{-1}\\mathbf K_{u\\star})\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note from our above definitions that we have\n",
    "$$\\mathbf K_{uu}^{-1}\\mathbf \\Lambda^{-1}\\mathbf K_{uu}^{-1} = \n",
    "\\mathbf L^{-\\top}\\mathbf B^{-1}\\mathbf L^{-1}$$\n",
    "\n",
    "and further\n",
    "$$\\mathbf K_{uu}^{-1}\\mathbf m = \\mathbf L^{-\\top}\\mathbf L_{\\mathbf B}^{-\\top}\\mathbf c$$.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "substituting:\n",
    "$$\n",
    "p(\\mathbf f^\\star) = \\mathcal N(\\mathbf f^\\star\\,|\\, \\mathbf K_{\\star u}\\mathbf L^{-\\top}\\mathbf L_{\\mathbf B}^{-\\top}\\mathbf c, \\,\\mathbf K_{\\star\\star} - \\mathbf K_{\\star u}\\mathbf L^{-1}(\\mathbf I - \\mathbf B^{-1})\\mathbf L^{-1}\\mathbf K_{u\\star})\n",
    "$$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code in `sgpr.py` implements this equation, with an additional switch depending on whether the full covariance matrix is required."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
The diff you're trying to view is too large. Only the first 1000 changed files have been loaded.
Showing with 0 additions and 0 deletions (0 / 0 diffs computed)
swh spinner

Computing file changes ...

back to top

Software Heritage — Copyright (C) 2015–2026, The Software Heritage developers. License: GNU AGPLv3+.
The source code of Software Heritage itself is available on our development forge.
The source code files archived by Software Heritage are available under their own copyright and licenses.
Terms of use: Archive access, API— Content policy— Contact— JavaScript license information— Web API