overview_1_the_basics.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Overview: The Basics\n",
"--------------------\n",
"\n",
"PyAutoFit is a Python based probabilistic programming language which enables detailed Bayesian analysis of scientific\n",
"datasets, with everything necessary to scale-up Bayesian analysis to complex models and big datasets.\n",
"\n",
"In this overview, we introduce the basic API for model-fitting with **PyAutoFit**, including how to define a likelihood\n",
"function, compose a probabilistic model, and fit it to data via a non-linear fitting algorithm (e.g.\n",
"Markov Chain Monta Carlo (MCMC), maximum likelihood estimator, nested sampling).\n",
"\n",
"If you have previous experience performing model-fitting and Bayesian inference this will all be very familiar, but\n",
"we'll highlight some benefits of using **PyAutoFit** instead of setting up the modeling manually yourself (e.g. by\n",
"wrapping an MCMC library with your likelihood function).\n",
"\n",
"The biggest benefits of using **PyAutoFit** are presented in the next two overviews after we've introduced the API,\n",
"but these can be summarized as follows:\n",
"\n",
"- The scientific workflow: streamline detailed modeling and analysis of small datasets with tools enabling the scaling\n",
" up to large datasets.\n",
"\n",
"- Statistical Inference Methods: Dedicated functionality for many advanced statical inference methods, including\n",
" Bayesian hierarchical analysis, model comparison and graphical models."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"%matplotlib inline\n",
"from pyprojroot import here\n",
"workspace_path = str(here())\n",
"%cd $workspace_path\n",
"print(f\"Working Directory has been set to `{workspace_path}`\")\n",
"\n",
"import autofit as af\n",
"import autofit.plot as aplt\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import os\n",
"from os import path"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Example Use Case__\n",
"\n",
"To illustrate **PyAutoFit** we'll use the example modeling problem of fitting a noisy 1D signal. \n",
"\n",
"We load the example 1D data containing this noisy signal below, which is included with the `autofit_workspace`\n",
"in .json files."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"dataset_path = path.join(\"dataset\", \"example_1d\", \"gaussian_x1\")\n",
"data = af.util.numpy_array_from_json(file_path=path.join(dataset_path, \"data.json\"))\n",
"noise_map = af.util.numpy_array_from_json(\n",
" file_path=path.join(dataset_path, \"noise_map.json\")\n",
")"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We plot the data containing the noisy 1D signal."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"xvalues = range(data.shape[0])\n",
"\n",
"plt.errorbar(\n",
" x=xvalues, y=data, yerr=noise_map, color=\"k\", ecolor=\"k\", elinewidth=1, capsize=2\n",
")\n",
"plt.title(\"Example Data\")\n",
"plt.xlabel(\"x values of data (pixels)\")\n",
"plt.ylabel(\"Signal Value\")\n",
"plt.show()\n",
"plt.close()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 1D signal was generated using a 1D Gaussian profile of the form:\n",
"\n",
"\\begin{equation*}\n",
"g(x, I, \\sigma) = \\frac{N}{\\sigma\\sqrt{2\\pi}} \\exp{(-0.5 (x / \\sigma)^2)}\n",
"\\end{equation*}\n",
"\n",
"Where:\n",
"\n",
"`x`: Is the x-axis coordinate where the `Gaussian` is evaluated.\n",
"\n",
"`N`: Describes the overall normalization of the Gaussian.\n",
"\n",
"$\\sigma$: Describes the size of the Gaussian (Full Width Half Maximum = $\\mathrm {FWHM}$ = $2{\\sqrt {2\\ln 2}}\\;\\sigma$)\n",
"\n",
"Our modeling task is to fit the signal with a 1D Gaussian and recover its parameters (`x`, `N`, `sigma`).\n",
"\n",
"__Model__\n",
"\n",
"We therefore need to define a 1D Gaussian as a \"model component\" in **PyAutoFit**.\n",
"\n",
"A model component is written as a Python class using the following format:\n",
"\n",
"- The name of the class is the name of the model component, in this case, \"Gaussian\".\n",
"\n",
"- The input arguments of the constructor (the `__init__` method) are the parameters of the model, in this case\n",
" `centre`, `normalization` and `sigma`.\n",
" \n",
"- The default values of the input arguments define whether a parameter is a single-valued `float` or a \n",
" multi-valued `tuple`. In this case, all 3 input parameters are floats.\n",
" \n",
"- It includes functions associated with that model component, which will be used when fitting the model to data."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"\n",
"\n",
"class Gaussian:\n",
" def __init__(\n",
" self,\n",
" centre=0.0, # <- PyAutoFit recognises these constructor arguments\n",
" normalization=0.1, # <- are the Gaussian`s model parameters.\n",
" sigma=0.01,\n",
" ):\n",
" \"\"\"\n",
" Represents a 1D `Gaussian` profile, which can be treated as a PyAutoFit model-component whose free\n",
" parameters (centre, normalization and sigma) are fitted for by a non-linear search.\n",
"\n",
" Parameters\n",
" ----------\n",
" centre\n",
" The x coordinate of the profile centre.\n",
" normalization\n",
" Overall normalization of the `Gaussian` profile.\n",
" sigma\n",
" The sigma value controlling the size of the Gaussian.\n",
" \"\"\"\n",
" self.centre = centre\n",
" self.normalization = normalization\n",
" self.sigma = sigma\n",
"\n",
" def model_data_1d_via_xvalues_from(self, xvalues: np.ndarray) -> np.ndarray:\n",
" \"\"\"\n",
" Returns the 1D Gaussian profile on a line of Cartesian x coordinates.\n",
"\n",
" The input xvalues are translated to a coordinate system centred on the Gaussian, by subtracting its centre.\n",
"\n",
" The output is referred to as the `model_data` to signify that it is a representation of the data from the\n",
" model.\n",
"\n",
" Parameters\n",
" ----------\n",
" xvalues\n",
" The x coordinates for which the Gaussian is evaluated.\n",
" \"\"\"\n",
" transformed_xvalues = xvalues - self.centre\n",
"\n",
" return np.multiply(\n",
" np.divide(self.normalization, self.sigma * np.sqrt(2.0 * np.pi)),\n",
" np.exp(-0.5 * np.square(np.divide(transformed_xvalues, self.sigma))),\n",
" )\n"
],
"outputs": [],
"execution_count": null
},
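{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside on the float-versus-tuple rule above: the class below is a hypothetical 2D `Gaussian` (it is not used\n",
"anywhere else in this overview) which sketches what a tuple-valued parameter looks like. Because the default value\n",
"of `centre` is a tuple, composing a model from this class would make `centre` two free parameters (y and x)."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"class Gaussian2D:\n",
"    def __init__(\n",
"        self,\n",
"        centre=(0.0, 0.0),  # <- A tuple default, so `centre` is a multi-valued parameter (y and x).\n",
"        normalization=0.1,  # <- Float defaults remain single-valued parameters.\n",
"        sigma=0.01,\n",
"    ):\n",
"        \"\"\"\n",
"        A hypothetical 2D `Gaussian` profile, included only to illustrate a tuple-valued model parameter.\n",
"        \"\"\"\n",
"        self.centre = centre\n",
"        self.normalization = normalization\n",
"        self.sigma = sigma"
],
"outputs": [],
"execution_count": null
},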
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compose a model using the `Gaussian` class above we use the `af.Model` object."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"model = af.Model(Gaussian)\n",
"print(\"Model `Gaussian` object: \\n\")\n",
"print(model)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model has a total of 3 parameters:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(model.total_free_parameters)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All model information is given by printing its `info` attribute.\n",
"\n",
"This shows that each model parameter has an associated prior.\n",
"\n",
"[The `info` below may not display optimally on your computer screen, for example the whitespace between parameter\n",
"names on the left and parameter priors on the right may lead them to appear across multiple lines. This is a\n",
"common issue in Jupyter notebooks.\n",
"\n",
"The`info_whitespace_length` parameter in the file `config/general.yaml` in the [output] section can be changed to \n",
"increase or decrease the amount of whitespace (The Jupyter notebook kernel will need to be reset for this change to \n",
"appear in a notebook).]"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(model.info)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The priors can be manually altered as follows, noting that these updated files will be used below when we fit the\n",
"model to data."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"model.centre = af.UniformPrior(lower_limit=0.0, upper_limit=100.0)\n",
"model.normalization = af.UniformPrior(lower_limit=0.0, upper_limit=1e2)\n",
"model.sigma = af.UniformPrior(lower_limit=0.0, upper_limit=30.0)"
],
"outputs": [],
"execution_count": null
},
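{
"cell_type": "markdown",
"metadata": {},
"source": [
"Uniform priors are not the only option. **PyAutoFit** also provides other prior types, for example Gaussian and\n",
"log-uniform priors. The sketch below (commented out, so that the uniform priors above are the ones used in this fit)\n",
"shows how we would expect them to be assigned, assuming the `af.GaussianPrior` and `af.LogUniformPrior` objects."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# A sketch of alternative prior types (not used for the fit in this overview):\n",
"\n",
"# model.centre = af.GaussianPrior(mean=50.0, sigma=20.0)\n",
"# model.normalization = af.LogUniformPrior(lower_limit=1e-2, upper_limit=1e2)"
],
"outputs": [],
"execution_count": null
},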
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Printing the `model.info` displayed these updated priors."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(model.info)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Instances__\n",
"\n",
"Instances of the model components above (created via `af.Model`) can be created, where an input `vector` of\n",
"parameters is mapped to create an instance of the Python class of the model.\n",
"\n",
"We first need to know the order of parameters in the model, so we know how to define the input `vector`. This\n",
"information is contained in the models `paths` attribute:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(model.paths)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We input values for the 3 free parameters of our model following the order of paths above:\n",
" \n",
" 1) `centre=30.0`\n",
" 2) `normalization=2.0` \n",
" 3) `sigma=3.0` \n",
" \n",
"This creates an `instance` of the Gaussian class via the model. "
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"instance = model.instance_from_vector(vector=[30.0, 2.0, 3.0])"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is an instance of the `Gaussian` class."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(\"Model Instance: \\n\")\n",
"print(instance)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It has the parameters of the `Gaussian` with the values input above."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(\"Instance Parameters \\n\")\n",
"print(\"x = \", instance.centre)\n",
"print(\"normalization = \", instance.normalization)\n",
"print(\"sigma = \", instance.sigma)"
],
"outputs": [],
"execution_count": null
},
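{
"cell_type": "markdown",
"metadata": {},
"source": [
"Non-linear searches do not sample physical parameter values directly; they sample a unit hypercube, which the priors\n",
"map to physical values. The sketch below assumes the `instance_from_unit_vector` method, which to our understanding\n",
"exposes this same mapping: a unit value of 0.5 maps to the median of the corresponding prior, e.g. `centre=50.0`\n",
"for its `UniformPrior` between 0.0 and 100.0."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"instance_unit = model.instance_from_unit_vector(unit_vector=[0.5, 0.5, 0.5])\n",
"\n",
"print(\"Instance Parameters From Unit Vector \\n\")\n",
"print(\"centre = \", instance_unit.centre)\n",
"print(\"normalization = \", instance_unit.normalization)\n",
"print(\"sigma = \", instance_unit.sigma)"
],
"outputs": [],
"execution_count": null
},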
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use functions associated with the class, specifically the `model_data_1d_via_xvalues_from` function, to \n",
"create a realization of the `Gaussian` and plot it."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"xvalues = np.arange(0.0, 100.0, 1.0)\n",
"\n",
"model_data = instance.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
"\n",
"plt.plot(xvalues, model_data, color=\"r\")\n",
"plt.title(\"1D Gaussian Model Data.\")\n",
"plt.xlabel(\"x values of profile\")\n",
"plt.ylabel(\"Gaussian Value\")\n",
"plt.show()\n",
"plt.clf()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This \"model mapping\", whereby models map to an instances of their Python classes, is integral to the core **PyAutoFit**\n",
"API for model composition and fitting.\n",
"\n",
"__Analysis__\n",
"\n",
"Now we've defined our model, we need to inform **PyAutoFit** how to fit it to data.\n",
"\n",
"We therefore define an `Analysis` class, which includes:\n",
"\n",
" - An `__init__` constructor, which takes as input the `data` and `noise_map`. This could be extended to include \n",
" anything else necessary to fit the model to the data.\n",
"\n",
" - A `log_likelihood_function`, which defines how given an `instance` of the model we fit it to the data and return a \n",
" log likelihood value.\n",
"\n",
"Read the comments and docstrings of the `Analysis` object below in detail for more insights into how this object\n",
"works."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"\n",
"\n",
"class Analysis(af.Analysis):\n",
" def __init__(self, data: np.ndarray, noise_map: np.ndarray):\n",
" \"\"\"\n",
" The `Analysis` class acts as an interface between the data and model in **PyAutoFit**.\n",
"\n",
" Its `log_likelihood_function` defines how the model is fitted to the data and it is called many times by\n",
" the non-linear search fitting algorithm.\n",
"\n",
" In this example the `Analysis` `__init__` constructor only contains the `data` and `noise-map`, but it can be\n",
" easily extended to include other quantities.\n",
"\n",
" Parameters\n",
" ----------\n",
" data\n",
" A 1D numpy array containing the data (e.g. a noisy 1D signal) fitted in the workspace examples.\n",
" noise_map\n",
" A 1D numpy array containing the noise values of the data, used for computing the goodness of fit\n",
" metric, the log likelihood.\n",
" \"\"\"\n",
" super().__init__()\n",
"\n",
" self.data = data\n",
" self.noise_map = noise_map\n",
"\n",
" def log_likelihood_function(self, instance) -> float:\n",
" \"\"\"\n",
" Returns the log likelihood of a fit of a 1D Gaussian to the dataset.\n",
"\n",
" The data is fitted using an `instance` of the `Gaussian` class where its `model_data_1d_via_xvalues_from`\n",
" is called in order to create a model data representation of the Gaussian that is fitted to the data.\n",
" \"\"\"\n",
"\n",
" \"\"\"\n",
" The `instance` that comes into this method is an instance of the `Gaussian` model above, which was created\n",
" via `af.Model()`. \n",
"\n",
" The parameter values are chosen by the non-linear search, based on where it thinks the high likelihood regions \n",
" of parameter space are.\n",
"\n",
" The lines of Python code are commented out below to prevent excessive print statements when we run the\n",
" non-linear search, but feel free to uncomment them and run the search to see the parameters of every instance\n",
" that it fits.\n",
" \"\"\"\n",
"\n",
" # print(\"Gaussian Instance:\")\n",
" # print(\"Centre = \", instance.centre)\n",
" # print(\"Normalization = \", instance.normalization)\n",
" # print(\"Sigma = \", instance.sigma)\n",
"\n",
" \"\"\"\n",
" Get the range of x-values the data is defined on, to evaluate the model of the Gaussian.\n",
" \"\"\"\n",
" xvalues = np.arange(self.data.shape[0])\n",
"\n",
" \"\"\"\n",
" Use these xvalues to create model data of our Gaussian.\n",
" \"\"\"\n",
" model_data = instance.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
"\n",
" \"\"\"\n",
" Fit the model gaussian line data to the observed data, computing the residuals, chi-squared and log likelihood.\n",
" \"\"\"\n",
" residual_map = self.data - model_data\n",
" chi_squared_map = (residual_map / self.noise_map) ** 2.0\n",
" chi_squared = sum(chi_squared_map)\n",
" noise_normalization = np.sum(np.log(2 * np.pi * self.noise_map**2.0))\n",
" log_likelihood = -0.5 * (chi_squared + noise_normalization)\n",
"\n",
" return log_likelihood\n"
],
"outputs": [],
"execution_count": null
},
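{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the `log_likelihood_function` above implements the standard Gaussian log likelihood. Writing the\n",
"data as $d_i$, the model data as $m_i$ and the noise-map values as $\\sigma_i$, the quantity it returns is:\n",
"\n",
"\\begin{equation*}\n",
"\\ln \\mathcal{L} = -\\frac{1}{2} \\left[ \\sum_i \\left( \\frac{d_i - m_i}{\\sigma_i} \\right)^2 + \\sum_i \\ln \\left( 2 \\pi \\sigma_i^2 \\right) \\right]\n",
"\\end{equation*}\n",
"\n",
"The first sum is the `chi_squared` and the second sum is the `noise_normalization` computed in the code above."
]
},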
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create an instance of the `Analysis` class by passing the `data` and `noise_map`."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"analysis = Analysis(data=data, noise_map=noise_map)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Non Linear Search__\n",
"\n",
"We have defined the model that we want to fit the data, and the analysis class that performs this fit.\n",
"\n",
"We now choose our fitting algorithm, called the \"non-linear search\", and fit the model to the data.\n",
"\n",
"For this example, we choose the nested sampling algorithm Dynesty. A wide variety of non-linear searches are \n",
"available in **PyAutoFit** (see ?)."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"search = af.DynestyStatic(\n",
" nlive=100,\n",
" sample=\"rwalk\",\n",
" number_of_cores=1,\n",
")"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Model Fit__\n",
"\n",
"We begin the non-linear search by calling its `fit` method. \n",
"\n",
"This will take a minute or so to run."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(\n",
" \"\"\"\n",
" The non-linear search has begun running.\n",
" This Jupyter notebook cell with progress once the search has completed - this could take a few minutes!\n",
" \"\"\"\n",
")\n",
"\n",
"result = search.fit(model=model, analysis=analysis)\n",
"\n",
"print(\"The search has finished run - you may now continue the notebook.\")"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Result__\n",
"\n",
"The result object returned by the fit provides information on the results of the non-linear search. \n",
"\n",
"The `info` attribute shows the result in a readable format.\n",
"\n",
"[Above, we discussed that the `info_whitespace_length` parameter in the config files could b changed to make \n",
"the `model.info` attribute display optimally on your computer. This attribute also controls the whitespace of the\n",
"`result.info` attribute.]"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(result.info)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Results are returned as instances of the model, as we illustrated above in the model mapping section.\n",
"\n",
"For example, we can print the result's maximum likelihood instance."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(result.max_log_likelihood_instance)\n",
"\n",
"print(\"\\n Model-fit Max Log-likelihood Parameter Estimates: \\n\")\n",
"print(\"Centre = \", result.max_log_likelihood_instance.centre)\n",
"print(\"Normalization = \", result.max_log_likelihood_instance.normalization)\n",
"print(\"Sigma = \", result.max_log_likelihood_instance.sigma)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A benefit of the result being an instance is that we can use any of its methods to inspect the results.\n",
"\n",
"Below, we use the maximum likelihood instance to compare the maximum likelihood `Gaussian` to the data."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"model_data = result.max_log_likelihood_instance.model_data_1d_via_xvalues_from(\n",
" xvalues=np.arange(data.shape[0])\n",
")\n",
"\n",
"plt.errorbar(\n",
" x=xvalues, y=data, yerr=noise_map, color=\"k\", ecolor=\"k\", elinewidth=1, capsize=2\n",
")\n",
"plt.plot(xvalues, model_data, color=\"r\")\n",
"plt.title(\"Dynesty model fit to 1D Gaussian dataset.\")\n",
"plt.xlabel(\"x values of profile\")\n",
"plt.ylabel(\"Profile normalization\")\n",
"plt.show()\n",
"plt.close()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Samples__\n",
"\n",
"The results object also contains a `Samples` object, which contains all information on the non-linear search.\n",
"\n",
"This includes parameter samples, log likelihood values, posterior information and results internal to the specific\n",
"algorithm (e.g. the internal dynesty samples).\n",
"\n",
"This is described fully in the results overview, below we use the samples to plot the probability density function\n",
"cornerplot of the results."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"search_plotter = aplt.DynestyPlotter(samples=result.samples)\n",
"search_plotter.cornerplot()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Extending Models__\n",
"\n",
"The model composition API is designed to make composing complex models, consisting of multiple components with many \n",
"free parameters, straightforward and scalable.\n",
"\n",
"To illustrate this, we will extend our model to include a second component, representing a symmetric 1D Exponential\n",
"profile, and fit it to data generated with both profiles.\n",
"\n",
"Lets begin by loading and plotting this data."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"dataset_path = path.join(\"dataset\", \"example_1d\", \"gaussian_x1__exponential_x1\")\n",
"data = af.util.numpy_array_from_json(file_path=path.join(dataset_path, \"data.json\"))\n",
"noise_map = af.util.numpy_array_from_json(\n",
" file_path=path.join(dataset_path, \"noise_map.json\")\n",
")\n",
"xvalues = range(data.shape[0])\n",
"plt.errorbar(\n",
" x=xvalues, y=data, yerr=noise_map, color=\"k\", ecolor=\"k\", elinewidth=1, capsize=2\n",
")\n",
"plt.title(\"Example Data With Multiple Components\")\n",
"plt.xlabel(\"x values of data (pixels)\")\n",
"plt.ylabel(\"Signal Value\")\n",
"plt.show()\n",
"plt.close()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We define a Python class for the `Exponential` model component, exactly as we did for the `Gaussian` above."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"\n",
"\n",
"class Exponential:\n",
" def __init__(\n",
" self,\n",
" centre=30.0, # <- **PyAutoFit** recognises these constructor arguments\n",
" normalization=1.0, # <- are the Exponentials`s model parameters.\n",
" rate=0.01,\n",
" ):\n",
" \"\"\"\n",
" Represents a symmetric 1D Exponential profile.\n",
"\n",
" Parameters\n",
" ----------\n",
" centre\n",
" The x coordinate of the profile centre.\n",
" normalization\n",
" Overall normalization of the profile.\n",
" ratw\n",
" The decay rate controlling has fast the Exponential declines.\n",
" \"\"\"\n",
" self.centre = centre\n",
" self.normalization = normalization\n",
" self.rate = rate\n",
"\n",
" def model_data_1d_via_xvalues_from(self, xvalues: np.ndarray):\n",
" \"\"\"\n",
" Returns the symmetric 1D Exponential on an input list of Cartesian x coordinates.\n",
"\n",
" The input xvalues are translated to a coordinate system centred on the Gaussian, via its `centre`.\n",
"\n",
" The output is referred to as the `model_data` to signify that it is a representation of the data from the\n",
" model.\n",
"\n",
" Parameters\n",
" ----------\n",
" xvalues\n",
" The x coordinates in the original reference frame of the data.\n",
" \"\"\"\n",
" transformed_xvalues = np.subtract(xvalues, self.centre)\n",
" return self.normalization * np.multiply(\n",
" self.rate, np.exp(-1.0 * self.rate * abs(transformed_xvalues))\n",
" )\n"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can easily compose a model consisting of 1 `Gaussian` object and 1 `Exponential` object using the `af.Collection`\n",
"object:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"model = af.Collection(gaussian=af.Model(Gaussian), exponential=af.Model(Exponential))"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `Collection` behaves analogous to a `Model`, but it contains a multiple model components.\n",
"\n",
"We can see this by printing its `paths` attribute, where paths to all 6 free parameters via both model components\n",
"are shown.\n",
"\n",
"The paths have the entries `.gaussian.` and `.exponential.`, which correspond to the names we input into \n",
"the `af.Collection` above. "
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(model.paths)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the paths to customize the priors of each parameter."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"model.gaussian.centre = af.UniformPrior(lower_limit=0.0, upper_limit=100.0)\n",
"model.gaussian.normalization = af.UniformPrior(lower_limit=0.0, upper_limit=1e2)\n",
"model.gaussian.sigma = af.UniformPrior(lower_limit=0.0, upper_limit=30.0)\n",
"model.exponential.centre = af.UniformPrior(lower_limit=0.0, upper_limit=100.0)\n",
"model.exponential.normalization = af.UniformPrior(lower_limit=0.0, upper_limit=1e2)\n",
"model.exponential.rate = af.UniformPrior(lower_limit=0.0, upper_limit=10.0)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All of the information about the model created via the collection can be printed at once using its `info` attribute:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(model.info)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A model instance can again be created by mapping an input `vector`, which now has 6 entries."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"instance = model.instance_from_vector(vector=[0.1, 0.2, 0.3, 0.4, 0.5, 0.01])"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This `instance` contains each of the model components we defined above. \n",
"\n",
"The argument names input into the `Collection` define the attribute names of the `instance`:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(\"Instance Parameters \\n\")\n",
"print(\"x (Gaussian) = \", instance.gaussian.centre)\n",
"print(\"normalization (Gaussian) = \", instance.gaussian.normalization)\n",
"print(\"sigma (Gaussian) = \", instance.gaussian.sigma)\n",
"print(\"x (Exponential) = \", instance.exponential.centre)\n",
"print(\"normalization (Exponential) = \", instance.exponential.normalization)\n",
"print(\"sigma (Exponential) = \", instance.exponential.rate)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Analysis` class above assumed the `instance` contained only a single model-component.\n",
"\n",
"We update its `log_likelihood_function` to use both model components in the `instance` to fit the data."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"\n",
"\n",
"class Analysis(af.Analysis):\n",
" def __init__(self, data: np.ndarray, noise_map: np.ndarray):\n",
" \"\"\"\n",
" The `Analysis` class acts as an interface between the data and model in **PyAutoFit**.\n",
"\n",
" Its `log_likelihood_function` defines how the model is fitted to the data and it is called many times by\n",
" the non-linear search fitting algorithm.\n",
"\n",
" In this example the `Analysis` `__init__` constructor only contains the `data` and `noise-map`, but it can be\n",
" easily extended to include other quantities.\n",
"\n",
" Parameters\n",
" ----------\n",
" data\n",
" A 1D numpy array containing the data (e.g. a noisy 1D signal) fitted in the workspace examples.\n",
" noise_map\n",
" A 1D numpy array containing the noise values of the data, used for computing the goodness of fit\n",
" metric, the log likelihood.\n",
" \"\"\"\n",
" super().__init__()\n",
"\n",
" self.data = data\n",
" self.noise_map = noise_map\n",
"\n",
" def log_likelihood_function(self, instance) -> float:\n",
" \"\"\"\n",
" Returns the log likelihood of a fit of a 1D Gaussian to the dataset.\n",
"\n",
" The data is fitted using an `instance` of multiple 1D profiles (e.g. a `Gaussian`, `Exponential`) where\n",
" their `model_data_1d_via_xvalues_from` methods are called and sumed in order to create a model data\n",
" representation that is fitted to the data.\n",
" \"\"\"\n",
"\n",
" \"\"\"\n",
" The `instance` that comes into this method is an instance of the `Gaussian` and `Exponential` models above, \n",
" which were created via `af.Collection()`. \n",
" \n",
" It contains instances of every class we instantiated it with, where each instance is named following the names\n",
" given to the Collection, which in this example is a `Gaussian` (with name `gaussian) and Exponential (with \n",
" name `exponential`).\n",
" \n",
" The parameter values are again chosen by the non-linear search, based on where it thinks the high likelihood \n",
" regions of parameter space are. The lines of Python code are commented out below to prevent excessive print \n",
" statements. \n",
" \"\"\"\n",
"\n",
" # print(\"Gaussian Instance:\")\n",
" # print(\"Centre = \", instance.gaussian.centre)\n",
" # print(\"Normalization = \", instance.gaussian.normalization)\n",
" # print(\"Sigma = \", instance.gaussian.sigma)\n",
"\n",
" # print(\"Exponential Instance:\")\n",
" # print(\"Centre = \", instance.exponential.centre)\n",
" # print(\"Normalization = \", instance.exponential.normalization)\n",
" # print(\"Rate = \", instance.exponential.rate)\n",
"\n",
" \"\"\"\n",
" Get the range of x-values the data is defined on, to evaluate the model of the Gaussian.\n",
" \"\"\"\n",
" xvalues = np.arange(self.data.shape[0])\n",
"\n",
" \"\"\"\n",
" Internally, the `instance` variable is a list of all model components pass to the `Collection` above.\n",
" \n",
" we can therefore iterate over them and use their `model_data_1d_via_xvalues_from` methods to create the\n",
" summed overall model data.\n",
" \"\"\"\n",
" model_data = sum(\n",
" [\n",
" profile_1d.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
" for profile_1d in instance\n",
" ]\n",
" )\n",
"\n",
" \"\"\"\n",
" Fit the model gaussian line data to the observed data, computing the residuals, chi-squared and log likelihood.\n",
" \"\"\"\n",
" residual_map = self.data - model_data\n",
" chi_squared_map = (residual_map / self.noise_map) ** 2.0\n",
" chi_squared = sum(chi_squared_map)\n",
" noise_normalization = np.sum(np.log(2 * np.pi * noise_map**2.0))\n",
" log_likelihood = -0.5 * (chi_squared + noise_normalization)\n",
"\n",
" return log_likelihood\n"
],
"outputs": [],
"execution_count": null
},
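{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the fit, we can verify the list-like behavior described in the docstrings above by iterating over\n",
"the 6-parameter `instance` we created earlier and summing the model data of its components ourselves:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"xvalues = np.arange(data.shape[0])\n",
"\n",
"model_data = sum(\n",
"    profile_1d.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
"    for profile_1d in instance\n",
")\n",
"\n",
"print(\"First 5 values of the summed model data:\", model_data[:5])"
],
"outputs": [],
"execution_count": null
},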
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now fit this model to the data using the same API we did before."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"analysis = Analysis(data=data, noise_map=noise_map)\n",
"\n",
"search = af.DynestyStatic(\n",
" nlive=100,\n",
" sample=\"rwalk\",\n",
" number_of_cores=1,\n",
")\n",
"\n",
"result = search.fit(model=model, analysis=analysis)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `info` attribute shows the result in a readable format, showing that all 6 free parameters were fitted for."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"print(result.info)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can again use the max log likelihood instance to visualize the model data of the best fit model compared to the\n",
"data."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"instance = result.max_log_likelihood_instance\n",
"\n",
"model_gaussian = instance.gaussian.model_data_1d_via_xvalues_from(\n",
" xvalues=np.arange(data.shape[0])\n",
")\n",
"model_exponential = instance.exponential.model_data_1d_via_xvalues_from(\n",
" xvalues=np.arange(data.shape[0])\n",
")\n",
"model_data = model_gaussian + model_exponential\n",
"\n",
"plt.errorbar(\n",
" x=xvalues, y=data, yerr=noise_map, color=\"k\", ecolor=\"k\", elinewidth=1, capsize=2\n",
")\n",
"plt.plot(range(data.shape[0]), model_data, color=\"r\")\n",
"plt.plot(range(data.shape[0]), model_gaussian, \"--\")\n",
"plt.plot(range(data.shape[0]), model_exponential, \"--\")\n",
"plt.title(\"Dynesty model fit to 1D Gaussian + Exponential dataset.\")\n",
"plt.xlabel(\"x values of profile\")\n",
"plt.ylabel(\"Profile normalization\")\n",
"plt.show()\n",
"plt.close()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Cookbooks__\n",
"\n",
"This overview shows the basics of model-fitting with **PyAutoFit**.\n",
"\n",
"The API is designed to be intuitive and extensible, and you should have a good feeling for how you would define\n",
"and compose your own model, fit it to data with a chosen non-linear search, and use the results to interpret the\n",
"fit.\n",
"\n",
"The following cookbooks give a concise API reference for using **PyAutoFit**, and you should use them as you define\n",
"your own model to get a fit going:\n",
"\n",
" - Model Cookbook:\n",
" - Searches Cookbook:\n",
" - Analysis Cookbook:\n",
" - Results Cookbook:\n",
"\n",
"The next overview describes how to set up a scientific workflow, where many other tasks required to perform detailed but\n",
"scalable model-fitting can be delegated to **PyAutoFit**. "
]
},
{
"cell_type": "code",
"metadata": {},
"source": [],
"outputs": [],
"execution_count": null
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}