{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Cookbook: Analysis\n",
        "==================\n",
        "\n",
        "The `Analysis` class is the interface between the data and model, whereby a `log_likelihood_function` is defined\n",
        "and called by the non-linear search fitting to fit the model.\n",
        "\n",
        "This cookbook provides an overview of how to use and extend `Analysis` objects in **PyAutoFit**.\n",
        "\n",
        "__Contents__\n",
        "\n",
        " - Example: A simple example of an analysis class which can be adapted for you use-case.\n",
        " - Customization: Customizing an analysis class with different data inputs and editing the `log_likelihood_function`.\n",
        " - Visualization: Adding a `visualize` method to the analysis so that model-specific visuals are output to hard-disk.\n",
        " - Custom Output: Add methods which output model-specific results to hard-disk in the `files` folder (e.g. as .json\n",
        "   files) to aid in the interpretation of results."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%matplotlib inline\n",
        "from pyprojroot import here\n",
        "workspace_path = str(here())\n",
        "%cd $workspace_path\n",
        "print(f\"Working Directory has been set to `{workspace_path}`\")\n",
        "\n",
        "import json\n",
        "import numpy as np\n",
        "from os import path\n",
        "\n",
        "import autofit as af\n",
        "import autofit.plot as aplt"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Example__\n",
        "\n",
        "An example simple `Analysis` class, to remind ourselves of the basic structure and inputs.\n",
        "\n",
        "This can be adapted for your use case."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "\n",
        "\n",
        "class Analysis(af.Analysis):\n",
        "    def __init__(self, data: np.ndarray, noise_map: np.ndarray):\n",
        "        \"\"\"\n",
        "        The `Analysis` class acts as an interface between the data and model in **PyAutoFit**.\n",
        "\n",
        "        Its `log_likelihood_function` defines how the model is fitted to the data and it is called many times by\n",
        "        the non-linear search fitting algorithm.\n",
        "\n",
        "        In this example the `Analysis` `__init__` constructor only contains the `data` and `noise-map`, but it can be\n",
        "        easily extended to include other quantities.\n",
        "\n",
        "        Parameters\n",
        "        ----------\n",
        "        data\n",
        "            A 1D numpy array containing the data (e.g. a noisy 1D signal) fitted in the workspace examples.\n",
        "        noise_map\n",
        "            A 1D numpy array containing the noise values of the data, used for computing the goodness of fit\n",
        "            metric, the log likelihood.\n",
        "        \"\"\"\n",
        "        super().__init__()\n",
        "\n",
        "        self.data = data\n",
        "        self.noise_map = noise_map\n",
        "\n",
        "    def log_likelihood_function(self, instance) -> float:\n",
        "        \"\"\"\n",
        "        Returns the log likelihood of a fit of a 1D Gaussian to the dataset.\n",
        "\n",
        "        The data is fitted using an `instance` of the `Gaussian` class where its `model_data_1d_via_xvalues_from`\n",
        "        is called in order to create a model data representation of the Gaussian that is fitted to the data.\n",
        "        \"\"\"\n",
        "\n",
        "        xvalues = np.arange(self.data.shape[0])\n",
        "\n",
        "        model_data = instance.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
        "\n",
        "        residual_map = self.data - model_data\n",
        "        chi_squared_map = (residual_map / self.noise_map) ** 2.0\n",
        "        chi_squared = sum(chi_squared_map)\n",
        "        noise_normalization = np.sum(np.log(2 * np.pi * self.noise_map**2.0))\n",
        "        log_likelihood = -0.5 * (chi_squared + noise_normalization)\n",
        "\n",
        "        return log_likelihood\n"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "An instance of the analysis class is created as follows."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "dataset_path = path.join(\"dataset\", \"example_1d\", \"gaussian_x1\")\n",
        "data = af.util.numpy_array_from_json(file_path=path.join(dataset_path, \"data.json\"))\n",
        "noise_map = af.util.numpy_array_from_json(\n",
        "    file_path=path.join(dataset_path, \"noise_map.json\")\n",
        ")\n",
        "\n",
        "analysis = Analysis(data=data, noise_map=noise_map)"
      ],
      "outputs": [],
      "execution_count": null
    },
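    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick sanity check, the `log_likelihood_function` can be called directly, outside of a non-linear search.\n",
        "\n",
        "The minimal sketch below assumes the `Gaussian` model component shipped with **PyAutoFit** (`af.ex.Gaussian`),\n",
        "which has the `model_data_1d_via_xvalues_from` method used by the analysis; swap in your own model component\n",
        "if needed."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# A minimal sketch, assuming the `af.ex.Gaussian` model component.\n",
        "model = af.Model(af.ex.Gaussian)\n",
        "\n",
        "# An instance created from the median of every prior, used here purely to evaluate the likelihood once.\n",
        "instance = model.instance_from_prior_medians()\n",
        "\n",
        "print(analysis.log_likelihood_function(instance=instance))"
      ],
      "outputs": [],
      "execution_count": null
    },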
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Customization__\n",
        "\n",
        "The `Analysis` class can be fully customized to be suitable for your model-fit.\n",
        "\n",
        "For example, additional inputs can be included in the `__init__` constructor and used in the `log_likelihood_function`.\n",
        "if they are required for your `log_likelihood_function` to work.\n",
        "\n",
        "The example below includes three additional inputs:\n",
        "\n",
        " - Instead of inputting a `noise_map`, a `noise_covariance_matrix` is input, which means that corrrlated noise is \n",
        "   accounted for in the `log_likelihood_function`.\n",
        " \n",
        " - A `mask` is input which masks the data such that certain data points are omitted from the log likelihood\n",
        " \n",
        " - A `kernel` is input which can account for certain blurring operations during data acquisition."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "\n",
        "\n",
        "class Analysis(af.Analysis):\n",
        "    def __init__(\n",
        "        self,\n",
        "        data: np.ndarray,\n",
        "        noise_covariance_matrix: np.ndarray,\n",
        "        mask: np.ndarray,\n",
        "        kernel: np.ndarray,\n",
        "    ):\n",
        "        \"\"\"\n",
        "        The `Analysis` class which has had its inputs edited for a different model-fit.\n",
        "\n",
        "        Parameters\n",
        "        ----------\n",
        "        data\n",
        "            A 1D numpy array containing the data (e.g. a noisy 1D signal) fitted in the workspace examples.\n",
        "        noise_covariance_matrix\n",
        "            A 2D numpy array containing the noise values and their covariances for the data, used for computing the\n",
        "            goodness of fit whilst accounting for correlated noise.\n",
        "        mask\n",
        "            A 1D numpy array containing a mask, where `True` values mean a data point is masked and is omitted from\n",
        "            the log likelihood.\n",
        "        kernel\n",
        "            A 1D numpy array containing the blurring kernel of the data, used for creating the model data.\n",
        "        \"\"\"\n",
        "        super().__init__()\n",
        "\n",
        "        self.data = data\n",
        "        self.noise_covariance_matrix = noise_covariance_matrix\n",
        "        self.mask = mask\n",
        "        self.kernel = kernel\n",
        "\n",
        "    def log_likelihood_function(self, instance) -> float:\n",
        "        \"\"\"\n",
        "        The `log_likelihood_function` now has access to the  `noise_covariance_matrix`, `mask` and `kernel`\n",
        "        input above.\n",
        "        \"\"\"\n",
        "        print(self.noise_covariance_matrix)\n",
        "        print(self.mask)\n",
        "        print(self.kernel)\n",
        "\n",
        "        \"\"\"\n",
        "        We do not provide a specific example of how to use these inputs in the `log_likelihood_function` as they are\n",
        "        specific to your model fitting problem.\n",
        "        \n",
        "        The key point is that any inputs required to compute the log likelihood can be passed into the `__init__`\n",
        "        constructor of the `Analysis` class and used in the `log_likelihood_function`.\n",
        "        \"\"\"\n",
        "\n",
        "        log_likelihood = None\n",
        "\n",
        "        return log_likelihood\n"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "An instance of the analysis class is created as follows."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "dataset_path = path.join(\"dataset\", \"example_1d\", \"gaussian_x1\")\n",
        "data = af.util.numpy_array_from_json(file_path=path.join(dataset_path, \"data.json\"))\n",
        "\n",
        "noise_covariance_matrix = np.ones(shape=(data.shape[0], data.shape[0]))\n",
        "mask = np.full(fill_value=False, shape=data.shape)\n",
        "kernel = np.full(fill_value=1.0, shape=data.shape)\n",
        "\n",
        "analysis = Analysis(\n",
        "    data=data, noise_covariance_matrix=noise_covariance_matrix, mask=mask, kernel=kernel\n",
        ")"
      ],
      "outputs": [],
      "execution_count": null
    },
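    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For concreteness, the sketch below shows one way a `log_likelihood_function` could use a `noise_covariance_matrix`:\n",
        "the standard correlated-noise Gaussian log likelihood, `-0.5 * (r.T @ inv(C) @ r + log(det(2 * pi * C)))`, where\n",
        "`r` is the residual-map and `C` the covariance matrix.\n",
        "\n",
        "This is an illustrative assumption, not a prescription, and it omits the `mask` and `kernel` for brevity."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def log_likelihood_correlated(data, model_data, noise_covariance_matrix):\n",
        "    # Residuals between the data and model data.\n",
        "    residual_map = data - model_data\n",
        "\n",
        "    # Chi-squared term r.T @ inv(C) @ r, solving the linear system rather than inverting C explicitly.\n",
        "    chi_squared = residual_map @ np.linalg.solve(noise_covariance_matrix, residual_map)\n",
        "\n",
        "    # Normalization term log(det(2 * pi * C)), computed via the numerically stable slogdet.\n",
        "    n = data.shape[0]\n",
        "    noise_normalization = n * np.log(2 * np.pi) + np.linalg.slogdet(noise_covariance_matrix)[1]\n",
        "\n",
        "    return -0.5 * (chi_squared + noise_normalization)"
      ],
      "outputs": [],
      "execution_count": null
    },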
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Visualization__\n",
        "\n",
        "If a `name` is input into a non-linear search, all results are output to hard-disk in a folder.\n",
        "\n",
        "By extending the `Analysis` class with a `visualize_before_fit` and / or `visualize` function, model specific \n",
        "visualization will also be output into an `image` folder, for example as `.png` files.\n",
        "\n",
        "This uses the maximum log likelihood model of the model-fit inferred so far.\n",
        "\n",
        "Visualization of the results of the search, such as the corner plot of what is called the \"Probability Density \n",
        "Function\", are also automatically output during the model-fit on the fly."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "\n",
        "\n",
        "class Analysis(af.Analysis):\n",
        "    def __init__(self, data, noise_map):\n",
        "        \"\"\"\n",
        "        We use the simpler Analysis class above for this example.\n",
        "        \"\"\"\n",
        "        super().__init__()\n",
        "\n",
        "        self.data = data\n",
        "        self.noise_map = noise_map\n",
        "\n",
        "    def log_likelihood_function(self, instance):\n",
        "        \"\"\"\n",
        "        The `log_likelihood_function` is identical to the example above\n",
        "        \"\"\"\n",
        "        xvalues = np.arange(self.data.shape[0])\n",
        "\n",
        "        model_data = instance.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
        "        residual_map = self.data - model_data\n",
        "        chi_squared_map = (residual_map / self.noise_map) ** 2.0\n",
        "        chi_squared = sum(chi_squared_map)\n",
        "        noise_normalization = np.sum(np.log(2 * np.pi * noise_map**2.0))\n",
        "        log_likelihood = -0.5 * (chi_squared + noise_normalization)\n",
        "\n",
        "        return log_likelihood\n",
        "\n",
        "    def visualize_before_fit(\n",
        "        self, paths: af.DirectoryPaths, model: af.AbstractPriorModel\n",
        "    ):\n",
        "        \"\"\"\n",
        "        Before a model-fit, the `visualize_before_fit` method is called to perform visualization.\n",
        "\n",
        "        This can output visualization of quantities which do not change during the model-fit, for example the\n",
        "        data and noise-map.\n",
        "\n",
        "        The `paths` object contains the path to the folder where the visualization should be output, which is determined\n",
        "        by the non-linear search `name` and other inputs.\n",
        "        \"\"\"\n",
        "\n",
        "        import matplotlib.pyplot as plt\n",
        "\n",
        "        xvalues = np.arange(self.data.shape[0])\n",
        "\n",
        "        plt.errorbar(\n",
        "            x=xvalues,\n",
        "            y=self.data,\n",
        "            yerr=self.noise_map,\n",
        "            color=\"k\",\n",
        "            ecolor=\"k\",\n",
        "            elinewidth=1,\n",
        "            capsize=2,\n",
        "        )\n",
        "        plt.title(\"Maximum Likelihood Fit\")\n",
        "        plt.xlabel(\"x value of profile\")\n",
        "        plt.ylabel(\"Profile Normalization\")\n",
        "        plt.savefig(path.join(paths.image_path, f\"data.png\"))\n",
        "        plt.clf()\n",
        "\n",
        "    def visualize(self, paths: af.DirectoryPaths, instance, during_analysis):\n",
        "        \"\"\"\n",
        "        During a model-fit, the `visualize` method is called throughout the non-linear search.\n",
        "\n",
        "        The `instance` passed into the visualize method is maximum log likelihood solution obtained by the model-fit\n",
        "        so far and it can be used to provide on-the-fly images showing how the model-fit is going.\n",
        "\n",
        "        The `paths` object contains the path to the folder where the visualization should be output, which is determined\n",
        "        by the non-linear search `name` and other inputs.\n",
        "        \"\"\"\n",
        "        xvalues = np.arange(self.data.shape[0])\n",
        "\n",
        "        model_data = instance.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
        "        residual_map = self.data - model_data\n",
        "\n",
        "        \"\"\"\n",
        "        The visualizer now outputs images of the best-fit results to hard-disk (checkout `visualizer.py`).\n",
        "        \"\"\"\n",
        "        import matplotlib.pyplot as plt\n",
        "\n",
        "        plt.errorbar(\n",
        "            x=xvalues,\n",
        "            y=self.data,\n",
        "            yerr=self.noise_map,\n",
        "            color=\"k\",\n",
        "            ecolor=\"k\",\n",
        "            elinewidth=1,\n",
        "            capsize=2,\n",
        "        )\n",
        "        plt.plot(xvalues, model_data, color=\"r\")\n",
        "        plt.title(\"Maximum Likelihood Fit\")\n",
        "        plt.xlabel(\"x value of profile\")\n",
        "        plt.ylabel(\"Profile Normalization\")\n",
        "        plt.savefig(path.join(paths.image_path, f\"model_fit.png\"))\n",
        "        plt.clf()\n",
        "\n",
        "        plt.errorbar(\n",
        "            x=xvalues,\n",
        "            y=residual_map,\n",
        "            yerr=self.noise_map,\n",
        "            color=\"k\",\n",
        "            ecolor=\"k\",\n",
        "            elinewidth=1,\n",
        "            capsize=2,\n",
        "        )\n",
        "        plt.title(\"Residuals of Maximum Likelihood Fit\")\n",
        "        plt.xlabel(\"x value of profile\")\n",
        "        plt.ylabel(\"Residual\")\n",
        "        plt.savefig(path.join(paths.image_path, f\"model_fit.png\"))\n",
        "        plt.clf()\n"
      ],
      "outputs": [],
      "execution_count": null
    },
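    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The minimal sketch below shows how this analysis could be hooked into a model-fit so the visualization is output,\n",
        "assuming the `af.ex.Gaussian` model component and the `DynestyStatic` search. Because a `name` is passed to the\n",
        "search, the `visualize_before_fit` and `visualize` images are written to the `image` folder of the output directory."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# A minimal sketch, assuming the `af.ex.Gaussian` model component (swap in your own model component if needed).\n",
        "analysis = Analysis(data=data, noise_map=noise_map)\n",
        "\n",
        "model = af.Model(af.ex.Gaussian)\n",
        "\n",
        "search = af.DynestyStatic(name=\"cookbook_analysis_visualize\")\n",
        "\n",
        "# Uncomment to run the fit, which outputs visualization on the fly:\n",
        "# result = search.fit(model=model, analysis=analysis)"
      ],
      "outputs": [],
      "execution_count": null
    },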
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Custom Output__\n",
        "\n",
        "When performing fits which output results to hard-disc, a `files` folder is created containing .json / .csv files of \n",
        "the model, samples, search, etc.\n",
        "\n",
        "These files are human readable and help one quickly inspect and interpret results. \n",
        "\n",
        "By extending an `Analysis` class with the methods `save_attributes_for_aggregator` and `save_results_for_aggregator`, \n",
        "custom files can be written to the `files` folder to further aid this inspection. \n",
        "\n",
        "These files can then also be loaded via the database, as described in the database cookbook."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "\n",
        "\n",
        "class Analysis(af.Analysis):\n",
        "    def __init__(self, data: np.ndarray, noise_map: np.ndarray):\n",
        "        \"\"\"\n",
        "        Standard Analysis class example used throughout PyAutoFit examples.\n",
        "        \"\"\"\n",
        "        super().__init__()\n",
        "\n",
        "        self.data = data\n",
        "        self.noise_map = noise_map\n",
        "\n",
        "    def log_likelihood_function(self, instance) -> float:\n",
        "        \"\"\"\n",
        "        Standard log likelihood function used throughout PyAutoFit examples.\n",
        "        \"\"\"\n",
        "\n",
        "        xvalues = np.arange(self.data.shape[0])\n",
        "\n",
        "        model_data = instance.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
        "\n",
        "        residual_map = self.data - model_data\n",
        "        chi_squared_map = (residual_map / self.noise_map) ** 2.0\n",
        "        chi_squared = sum(chi_squared_map)\n",
        "        noise_normalization = np.sum(np.log(2 * np.pi * self.noise_map**2.0))\n",
        "        log_likelihood = -0.5 * (chi_squared + noise_normalization)\n",
        "\n",
        "        return log_likelihood\n",
        "\n",
        "    def save_attributes_for_aggregator(self, paths: af.DirectoryPaths):\n",
        "        \"\"\"\n",
        "        Before the non-linear search begins, this routine saves attributes of the `Analysis` object to the `files`\n",
        "        folder such that they can be loaded after the analysis using PyAutoFit's database and aggregator tools.\n",
        "\n",
        "        For this analysis, it uses the `AnalysisDataset` object's method to output the following:\n",
        "\n",
        "        - The dataset's data as a .json file.\n",
        "        - The dataset's noise-map as a .json file.\n",
        "\n",
        "        These are accessed using the aggregator via `agg.values(\"data\")` and `agg.values(\"noise_map\")`.\n",
        "\n",
        "        Parameters\n",
        "        ----------\n",
        "        paths\n",
        "            The PyAutoFit paths object which manages all paths, e.g. where the non-linear search outputs are stored,\n",
        "            visualization, and the pickled objects used by the aggregator output by this function.\n",
        "        \"\"\"\n",
        "        # The path where data.json is saved, e.g. output/dataset_name/unique_id/files/data.json\n",
        "\n",
        "        file_path = (path.join(paths._json_path, \"data.json\"),)\n",
        "\n",
        "        with open(file_path, \"w+\") as f:\n",
        "            json.dump(self.data, f, indent=4)\n",
        "\n",
        "        # The path where noise_map.json is saved, e.g. output/noise_mapset_name/unique_id/files/noise_map.json\n",
        "\n",
        "        file_path = (path.join(paths._json_path, \"noise_map.json\"),)\n",
        "\n",
        "        with open(file_path, \"w+\") as f:\n",
        "            json.dump(self.noise_map, f, indent=4)\n",
        "\n",
        "    def save_results_for_aggregator(self, paths: af.DirectoryPaths, result: af.Result):\n",
        "        \"\"\"\n",
        "        At the end of a model-fit,  this routine saves attributes of the `Analysis` object to the `files`\n",
        "        folder such that they can be loaded after the analysis using PyAutoFit's database and aggregator tools.\n",
        "\n",
        "        For this analysis it outputs the following:\n",
        "\n",
        "        - The maximum log likelihood model data as a .json file.\n",
        "\n",
        "        This is accessed using the aggregator via `agg.values(\"model_data\")`.\n",
        "\n",
        "        Parameters\n",
        "        ----------\n",
        "        paths\n",
        "            The PyAutoFit paths object which manages all paths, e.g. where the non-linear search outputs are stored,\n",
        "            visualization and the pickled objects used by the aggregator output by this function.\n",
        "        result\n",
        "            The result of a model fit, including the non-linear search, samples and maximum likelihood model.\n",
        "        \"\"\"\n",
        "        xvalues = np.arange(self.data.shape[0])\n",
        "\n",
        "        instance = result.max_log_likelihood_instance\n",
        "\n",
        "        model_data = instance.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
        "\n",
        "        # The path where model_data.json is saved, e.g. output/dataset_name/unique_id/files/model_data.json\n",
        "\n",
        "        file_path = (path.join(paths._json_path, \"model_data.json\"),)\n",
        "\n",
        "        with open(file_path, \"w+\") as f:\n",
        "            json.dump(model_data, f, indent=4)\n"
      ],
      "outputs": [],
      "execution_count": null
    },
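    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The minimal sketch below shows how the custom files written above could be loaded back, assuming a database\n",
        "`.sqlite` file built as described in the database cookbook. The `agg.values` calls mirror those referenced in the\n",
        "docstrings above."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# A minimal sketch, assuming a `database.sqlite` file built as per the database cookbook.\n",
        "agg = af.Aggregator.from_database(\"database.sqlite\")\n",
        "\n",
        "# The .json files written by the methods above are loaded via the names they were saved with.\n",
        "for data, model_data in zip(agg.values(\"data\"), agg.values(\"model_data\")):\n",
        "    print(data)\n",
        "    print(model_data)"
      ],
      "outputs": [],
      "execution_count": null
    },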
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finish."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [],
      "outputs": [],
      "execution_count": null
    }
  ],
  "metadata": {
    "anaconda-cloud": {},
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.1"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}
