Revision 1db48f3a735eb0fba06a7d503f080a7ead512604 authored by Artem Artemev on 11 July 2018, 12:50:44 UTC, committed by GitHub on 11 July 2018, 12:50:44 UTC
Update version.py file to 1.2.0 (#812)
1 parent 707b195
vgp_notes.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "VGP Notes\n",
    "---\n",
    "*James Hensman 2016*\n",
    "\n",
    "Here are some implementation notes on the variational Gaussian approximation, `gpflow.models.VGP`. The reference for this work is [Opper and Archambeau 2009, *The variational Gaussian approximation revisited*](http://www.mitpressjournals.org/doi/abs/10.1162/neco.2008.08-07-592); these notes serve to map the conclusions of that paper to the implementation in GPflow. I'll give derivations for the expressions that are implemented in the VGP object. \n",
    "\n",
    "Two things are not covered by this notebook: prior mean functions and the extension to multiple independent outputs. These extensions are straightforward in theory but we have taken some care in the code to ensure that they are handled efficiently. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Optimal distribution\n",
    "The key result in the work of Opper and Archambeau is that for a Gaussian process with a non-Gaussian likelihood, the optimal Gaussian approximation (in the KL sense) is given by\n",
    "\n",
    "$$\n",
    "\\hat q(\\mathbf f) = \\mathcal N\\left(\\mathbf m, [\\mathbf K^{-1} + \\textrm{diag}(\\boldsymbol \\lambda)]^{-1}\\right)\\,.\n",
    "$$\n",
    "\n",
    "We follow their advice in reparameterising the mean as\n",
    "$$\n",
    "\\mathbf m = \\mathbf K \\boldsymbol \\alpha\n",
    "$$\n",
    "\n",
    "and additionally, to avoid having to constrain the parameter $\\boldsymbol \\lambda$ to be positive, we replace each $\\lambda_n$ by its square, $\\lambda_n^2$. The approximation then becomes\n",
    "\n",
    "$$\n",
    "\\hat q(\\mathbf f) = \\mathcal N\\left(\\mathbf K \\boldsymbol \\alpha, [\\mathbf K^{-1} + \\textrm{diag}(\\boldsymbol \\lambda)^2]^{-1}\\right)\\,.\n",
    "$$\n",
    "\n",
    "The ELBO is\n",
    "\n",
    "$$\n",
    "\\textrm{ELBO} = \\sum_n\\mathbb E_{q(f_n)}\\left[ \\log p(y_n\\,|\\,f_n)\\right] - \\textrm{KL}\\left[q(\\mathbf f)||p(\\mathbf f)\\right]\n",
    "$$\n",
    "\n",
    "We'll split the rest of this document into two parts: the marginals of $q(\\mathbf f)$ and the KL term. Given these, it is straightforward to compute the ELBO: GPflow uses quadrature to compute 1D expectations where no closed form is available."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Marginals of $q(\\mathbf f)$\n",
    "Given the above form for $q(\\mathbf f)$, what is a quick and stable way to compute the marginals of this Gaussian? The means are trivial, but the variance could do with some work in order to avoid having to perform two matrix inverses. \n",
    "\n",
    "Let $\\boldsymbol \\Lambda = \\textrm{diag}(\\boldsymbol \\lambda)$ and $\\boldsymbol \\Sigma$ be the covariance in question, $\\boldsymbol \\Sigma = [\\mathbf K^{-1} + \\boldsymbol \\Lambda^2]^{-1}$. By the matrix inversion lemma we have \n",
    "\n",
    "$$\n",
    "\\boldsymbol \\Sigma = [\\mathbf K^{-1} + \\boldsymbol \\Lambda^2]^{-1}\n",
    "$$\n",
    "\n",
    "$$\n",
    "= \\boldsymbol \\Lambda^{-2} - \\boldsymbol \\Lambda^{-2}[\\mathbf K + \\boldsymbol \\Lambda^{-2}]^{-1}\\boldsymbol \\Lambda^{-2}\n",
    "$$\n",
    "\n",
    "$$\n",
    "= \\boldsymbol \\Lambda^{-2} - \\boldsymbol \\Lambda^{-1}\\mathbf A^{-1}\\boldsymbol \\Lambda^{-1}\n",
    "$$\n",
    "\n",
    "with $\\mathbf A = \\boldsymbol \\Lambda\\mathbf K \\boldsymbol \\Lambda + \\mathbf I\\,.$\n",
    "\n",
    "Working with this form means that only one matrix decomposition is needed, and taking the Cholesky factor of $\\mathbf A$ should be numerically stable since its eigenvalues are bounded below by 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### KL divergence\n",
    "The KL divergence term will benefit from a similar re-organisation to the above. The KL is\n",
    "$$\n",
    "\\textrm{KL} = -0.5 \\log |\\boldsymbol \\Sigma| + 0.5 \\log |\\mathbf K| +0.5\\mathbf m^\\top\\mathbf K^{-1}\\mathbf m + 0.5\\textrm{tr}(\\mathbf K^{-1} \\boldsymbol \\Sigma) - 0.5 N\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where $N$ is the number of data points. Recalling our parameterization $\\mathbf m = \\mathbf K \\boldsymbol \\alpha$ and combining like terms, \n",
    "$$\n",
    "\\textrm{KL} = 0.5 (-\\log |\\mathbf K^{-1}\\boldsymbol \\Sigma | +\\boldsymbol \\alpha^\\top\\mathbf K\\boldsymbol \\alpha + \\textrm{tr}(\\mathbf K^{-1} \\boldsymbol \\Sigma) - N)\\,.\n",
    "$$\n",
    "\n",
    "With a little manipulation it's possible to show that $\\textrm{tr}(\\mathbf K^{-1} \\boldsymbol \\Sigma) = \\textrm{tr}(\\mathbf A^{-1})$ and $|\\mathbf K^{-1} \\boldsymbol \\Sigma| = |\\mathbf A^{-1}|$, giving the final expression\n",
    "$$\n",
    "\\textrm{KL} = 0.5 (\\log |\\mathbf A| +\\boldsymbol \\alpha^\\top\\mathbf K\\boldsymbol \\alpha + \\textrm{tr}(\\mathbf A^{-1}) - N)\\,.\n",
    "$$\n",
    "\n",
    "This expression is not completely ideal because we have to compute the diagonal elements of $\\mathbf A^{-1}$. We do this with an extra back-substitution (into the identity matrix), although it may be possible to do this faster in theory (not in TensorFlow, to my knowledge). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction\n",
    "To make predictions with the Gaussian approximation, we need to integrate:\n",
    "$$\n",
    "q(f^\\star \\,|\\,\\mathbf y) = \\int p(f^\\star \\,|\\, \\mathbf f)q(\\mathbf f)\\,\\textrm d \\mathbf f\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The integral is a Gaussian one, and follows straightforwardly. Substituting the equations for these quantities:\n",
    "$$\n",
    "q(f^\\star \\,|\\,\\mathbf y) = \\int \\mathcal N(f^\\star \\,|\\, \\mathbf K_{\\star \\mathbf f}\\mathbf K^{-1}\\mathbf f,\\, \\mathbf K_{\\star \\star} - \\mathbf K_{\\star \\mathbf f}\\mathbf K^{-1}\\mathbf K_{\\mathbf f \\star})\\mathcal N (\\mathbf f\\,|\\, \\mathbf K \\boldsymbol\\alpha, \\boldsymbol \\Sigma)\\,\\textrm d \\mathbf f\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "q(f^\\star \\,|\\,\\mathbf y) = \\mathcal N\\left(f^\\star \\,|\\, \\mathbf K_{\\star \\mathbf f}\\boldsymbol \\alpha,\\, \\mathbf K_{\\star \\star} - \\mathbf K_{\\star \\mathbf f}(\\mathbf K^{-1} - \\mathbf K^{-1}\\boldsymbol \\Sigma\\mathbf K^{-1})\\mathbf K_{\\mathbf f \\star}\\right)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Where the notation $\\mathbf K_{\\star \\mathbf f}$ means the covariance between the prediction points and the data points, and the matrix $\\mathbf K$ is shorthand for $\\mathbf K_{\\mathbf{ff}}$.\n",
    "\n",
    "The matrix $\\mathbf K^{-1} - \\mathbf K^{-1}\\boldsymbol \\Sigma\\mathbf K^{-1}$ can be expanded:\n",
    "$$\n",
    "\\mathbf K^{-1} - \\mathbf K^{-1}\\boldsymbol \\Sigma\\mathbf K^{-1} = \\mathbf K^{-1} - \\mathbf K^{-1}[\\mathbf K^{-1} + \\boldsymbol\\Lambda^2]^{-1}\\mathbf K^{-1}\\,,\n",
    "$$\n",
    "and simplified by recognising the form of the matrix inversion lemma:\n",
    "$$\n",
    "\\mathbf K^{-1} - \\mathbf K^{-1}\\boldsymbol \\Sigma\\mathbf K^{-1} = [\\mathbf K +  \\boldsymbol\\Lambda^{-2}]^{-1}\\,.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This leads to the final expression for the prediction\n",
    "$$\n",
    "q(f^\\star \\,|\\,\\mathbf y) = \\mathcal N\\left(f^\\star \\,|\\, \\mathbf K_{\\star \\mathbf f}\\boldsymbol \\alpha,\\, \\mathbf K_{\\star \\star} - \\mathbf K_{\\star \\mathbf f}[\\mathbf K + \\boldsymbol \\Lambda^{-2}]^{-1}\\mathbf K_{\\mathbf f \\star}\\right)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The VGP class has a little extra functionality to enable us to compute the marginal variance of the prediction when the full covariance matrix is not required."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
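
The expressions in the notes for the marginals of $q(\mathbf f)$, the KL term and the prediction can be checked numerically. The following is a minimal NumPy sketch, not the GPflow implementation: the function names `rbf`, `vgp_marginals_and_kl` and `vgp_predict`, the squared-exponential kernel and the toy inputs are all assumptions made for illustration. It computes everything from a single Cholesky factor of $\mathbf A = \boldsymbol \Lambda \mathbf K \boldsymbol \Lambda + \mathbf I$ and compares the results against the naive formulas with explicit inverses. The predictive covariance uses $[\mathbf K + \boldsymbol \Lambda^{-2}]^{-1}$, which is what the matrix inversion lemma gives for $\mathbf K^{-1} - \mathbf K^{-1}\boldsymbol \Sigma\mathbf K^{-1}$, so every $\lambda_n$ is assumed to be nonzero.

```python
import numpy as np


def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel on 1D inputs (an illustrative choice)."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def vgp_marginals_and_kl(K, alpha, lam):
    """Marginals of q(f) and KL[q || p] from one Cholesky factor of A."""
    N = K.shape[0]
    A = lam[:, None] * K * lam[None, :] + np.eye(N)  # A = Lambda K Lambda + I
    L = np.linalg.cholesky(A)                        # A = L L^T
    # Solve L X = I. A general solver is used for brevity; a triangular
    # solver would exploit the structure of L.
    L_inv = np.linalg.solve(L, np.eye(N))
    diag_A_inv = np.sum(L_inv ** 2, axis=0)          # diag(L^-T L^-1)
    mean = K @ alpha                                 # m = K alpha
    var = (1.0 - diag_A_inv) / lam ** 2              # diag(Lambda^-2 - Lambda^-1 A^-1 Lambda^-1)
    log_det_A = 2.0 * np.sum(np.log(np.diag(L)))
    kl = 0.5 * (log_det_A + alpha @ K @ alpha + np.sum(diag_A_inv) - N)
    return mean, var, kl


def vgp_predict(K, K_sf, K_ss, alpha, lam):
    """Predictive mean and full covariance at the test points."""
    mean = K_sf @ alpha
    cov = K_ss - K_sf @ np.linalg.solve(K + np.diag(1.0 / lam ** 2), K_sf.T)
    return mean, cov


# Toy problem (hypothetical data): 6 training inputs, 3 test inputs.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 6)
X_star = np.array([0.5, 2.5, 6.0])
K, K_sf, K_ss = rbf(X, X), rbf(X_star, X), rbf(X_star, X_star)
alpha = rng.normal(size=X.size)
lam = rng.uniform(0.5, 2.0, size=X.size)

mean, var, kl = vgp_marginals_and_kl(K, alpha, lam)
pred_mean, pred_cov = vgp_predict(K, K_sf, K_ss, alpha, lam)

# Naive reference values using explicit inverses, for comparison only.
K_inv = np.linalg.inv(K)
Sigma = np.linalg.inv(K_inv + np.diag(lam ** 2))
kl_naive = 0.5 * (np.linalg.slogdet(K)[1] - np.linalg.slogdet(Sigma)[1]
                  + mean @ K_inv @ mean + np.trace(K_inv @ Sigma) - X.size)
pred_cov_naive = K_ss - K_sf @ (K_inv - K_inv @ Sigma @ K_inv) @ K_sf.T

print(np.allclose(var, np.diag(Sigma)),
      np.isclose(kl, kl_naive),
      np.allclose(pred_cov, pred_cov_naive))
```

The naive reference is only practical here because the toy kernel matrix is small and well conditioned; the Cholesky-based route avoids forming $\mathbf K^{-1}$ at all.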