multiple_datasets.ipynb
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Cookbook: Multiple Datasets\n",
        "===========================\n",
        "\n",
        "This cookbook illustrates how to fit multiple datasets simultaneously, where each dataset is fitted by a different\n",
        "`Analysis` class.\n",
        "\n",
        "The `Analysis` classes are summed together to give an overall log likelihood function that is the sum of the\n",
        "individual log likelihood functions, which is sampled by the non-linear search.\n",
        "\n",
        "If one has multiple observations of the same signal, it is often desirable to fit them simultaneously. This ensures\n",
        "that better constraints are placed on the model, as the full amount of information in the datasets is used.\n",
        "\n",
        "In some scenarios, the signal may vary across the datasets in a way that requires that the model is updated\n",
        "accordingly. **PyAutoFit** provides tools to customize the model composition such that specific parameters of the model\n",
        "vary across the datasets.\n",
        "\n",
        "This cookbook illustrates using observations of 3 1D Gaussians, which have the same `centre` (which is the same\n",
        "for the model fitted to each dataset) but different `normalization and `sigma` values (which vary for the model\n",
        "fitted to each dataset).\n",
        "\n",
        "It is common for each individual dataset to only constrain specific aspects of a model. The high level of model\n",
        "customizaiton provided by **PyAutoFit** ensures that composing a model that is appropriate for fitting large and diverse\n",
        "datasets is straight forward. This is because different `Analysis` classes can be written for each dataset and summed.\n",
        "\n",
        "__Contents__\n",
        "\n",
        " - Model-Fit: Setup a model-fit to 3 datasets to illustrate multi-dataset fitting.\n",
        " - Analysis Summing: Sum multiple `Analysis` classes to create a single `Analysis` class that fits all 3 datasets\n",
        "   simultaneously, including summing their individual log likelihood functions.\n",
        " - Result List: Use the output of fits to multiple datasets which are a list of `Result` objects.\n",
        " - Variable Model Across Datasets: Fit a model where certain parameters vary across the datasets whereas others\n",
        "   stay fixed.\n",
        " - Relational Model: Fit models where certain parameters vary across the dataset as a user\n",
        "   defined relation (e.g. `y = mx + c`).\n",
        " - Different Analysis Classes: Fit multiple datasets where each dataset is fitted by a different `Analysis` class,\n",
        "   meaning that datasets with different formats can be fitted simultaneously.\n",
        " - Interpolation: Fit multiple datasets with a model one-by-one and interpolation over a smoothly varying parameter\n",
        "   (e.g. time) to infer the model between datasets.\n",
        " - Individual Sequential Searches: Fit multiple datasets where each dataset is fitted one-by-one sequentially.\n",
        " - Hierarchical / Graphical Models: Use hierarchical / graphical models to fit multiple datasets simultaneously,\n",
        "   which fit for global trends in the model across the datasets."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%matplotlib inline\n",
        "from pyprojroot import here\n",
        "workspace_path = str(here())\n",
        "%cd $workspace_path\n",
        "print(f\"Working Directory has been set to `{workspace_path}`\")\n",
        "\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "from os import path\n",
        "\n",
        "import autofit as af\n",
        "import autofit.plot as aplt"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Model Fit__\n",
        "\n",
        "Load 3 1D Gaussian datasets from .json files in the directory `autofit_workspace/dataset/`.\n",
        "\n",
        "All three datasets contain an identical signal, therefore fitting the same model to all three datasets simultaneously\n",
        "is appropriate.\n",
        "\n",
        "Each dataset has a different noise realization, therefore fitting them simultaneously will offer improved constraints \n",
        "over individual fits.\n",
        "\n",
        "__Example Source Code (`af.ex`)__\n",
        "\n",
        "The **PyAutoFit** source code has the following example objects (accessed via `af.ex`) used in this tutorial:\n",
        "\n",
        " - `Analysis`: an analysis object which fits noisy 1D datasets, including `log_likelihood_function` and\n",
        " `visualize` functions.\n",
        "\n",
        " - `Gaussian`: a model component representing a 1D Gaussian profile.\n",
        "\n",
        "These are functionally identical to the `Analysis` and `Gaussian` objects you have seen elsewhere in the workspace."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "dataset_size = 3\n",
        "\n",
        "data_list = []\n",
        "noise_map_list = []\n",
        "\n",
        "for dataset_index in range(dataset_size):\n",
        "    dataset_path = path.join(\n",
        "        \"dataset\", \"example_1d\", f\"gaussian_x1_identical_{dataset_index}\"\n",
        "    )\n",
        "\n",
        "    data = af.util.numpy_array_from_json(file_path=path.join(dataset_path, \"data.json\"))\n",
        "    data_list.append(data)\n",
        "\n",
        "    noise_map = af.util.numpy_array_from_json(\n",
        "        file_path=path.join(dataset_path, \"noise_map.json\")\n",
        "    )\n",
        "    noise_map_list.append(noise_map)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Plot all 3 datasets, including their error bars. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "for data, noise_map in zip(data_list, noise_map_list):\n",
        "    xvalues = range(data.shape[0])\n",
        "\n",
        "    plt.errorbar(\n",
        "        x=xvalues,\n",
        "        y=data,\n",
        "        yerr=noise_map,\n",
        "        color=\"k\",\n",
        "        ecolor=\"k\",\n",
        "        linestyle=\" \",\n",
        "        elinewidth=1,\n",
        "        capsize=2,\n",
        "    )\n",
        "    plt.show()\n",
        "    plt.close()"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create our model corresponding to a single 1D Gaussian that is fitted to all 3 datasets simultaneously."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "model = af.Model(af.ex.Gaussian)\n",
        "\n",
        "model.centre = af.UniformPrior(lower_limit=0.0, upper_limit=100.0)\n",
        "model.normalization = af.LogUniformPrior(lower_limit=1e-2, upper_limit=1e2)\n",
        "model.sigma = af.GaussianPrior(\n",
        "    mean=10.0, sigma=5.0, lower_limit=0.0, upper_limit=np.inf\n",
        ")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Analysis Summing__\n",
        "\n",
        "Set up three instances of the `Analysis` class which fit 1D Gaussian.\n",
        "\n",
        "We set up an `Analysis` for each dataset one-by-one, using a for loop."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "analysis_list = []\n",
        "\n",
        "for data, noise_map in zip(data_list, noise_map_list):\n",
        "    analysis = af.ex.Analysis(data=data, noise_map=noise_map)\n",
        "    analysis_list.append(analysis)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now sum together every analysis in the list, to produce an overall analysis class which we fit with the non-linear\n",
        "search.\n",
        "\n",
        "By summing analysis objects the following happen:\n",
        "\n",
        " - The log likelihood values computed by the `log_likelihood_function` of each individual analysis class are summed to\n",
        "   give an overall log likelihood value that the non-linear search samples when model-fitting.\n",
        "\n",
        " - The output path structure of the results goes to a single folder, which includes sub-folders for the visualization\n",
        "   of every individual analysis object based on the `Analysis` object's `visualize` method."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "analysis = analysis_list[0] + analysis_list[1] + analysis_list[2]"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can alternatively sum a list of analysis objects as follows:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "analysis = sum(analysis_list)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `log_likelihood_function`'s can be called in parallel over multiple cores by changing the `n_cores` parameter.\n",
        "\n",
        "This is beneficial when the model-fitting procedure is slow and the likelihood evaluation time of the different\n",
        "is roughly consistent."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "analysis.n_cores = 1"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To fit multiple datasets via a non-linear search we use this summed analysis object:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "search = af.DynestyStatic(\n",
        "    path_prefix=\"features\", sample=\"rwalk\", name=\"multiple_datasets_simple\"\n",
        ")\n",
        "\n",
        "result_list = search.fit(model=model, analysis=analysis)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Result List__\n",
        "\n",
        "The result object returned by the fit is a list of the `Result` objects, which is described in the result cookbook.\n",
        "\n",
        "Each `Result` in the list corresponds to each `Analysis` object in the `analysis_list` we passed to the fit.\n",
        "\n",
        "The same model was fitted across all analyses, thus every `Result` in the `result_list` contains the same information \n",
        "on the samples and the same `max_log_likelihood_instance`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "print(result_list[0].max_log_likelihood_instance.centre)\n",
        "print(result_list[0].max_log_likelihood_instance.normalization)\n",
        "print(result_list[0].max_log_likelihood_instance.sigma)\n",
        "\n",
        "print(result_list[1].max_log_likelihood_instance.centre)\n",
        "print(result_list[1].max_log_likelihood_instance.normalization)\n",
        "print(result_list[1].max_log_likelihood_instance.sigma)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can plot the model-fit to each dataset by iterating over the results:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "for data, result in zip(data_list, result_list):\n",
        "    instance = result.max_log_likelihood_instance\n",
        "\n",
        "    model_data = instance.model_data_1d_via_xvalues_from(\n",
        "        xvalues=np.arange(data.shape[0])\n",
        "    )\n",
        "\n",
        "    plt.errorbar(\n",
        "        x=xvalues,\n",
        "        y=data,\n",
        "        yerr=noise_map,\n",
        "        color=\"k\",\n",
        "        ecolor=\"k\",\n",
        "        elinewidth=1,\n",
        "        capsize=2,\n",
        "    )\n",
        "    plt.plot(xvalues, model_data, color=\"r\")\n",
        "    plt.title(\"Dynesty model fit to 1D Gaussian dataset.\")\n",
        "    plt.xlabel(\"x values of profile\")\n",
        "    plt.ylabel(\"Profile normalization\")\n",
        "    plt.show()\n",
        "    plt.close()"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Variable Model Across Datasets__\n",
        "\n",
        "The same model was fitted to every dataset simultaneously because all 3 datasets contained an identical signal with \n",
        "only the noise varying across the datasets.\n",
        "\n",
        "If the signal varied across the datasets, we would instead want to fit a different model to each dataset. The model\n",
        "composition can be updated using the summed `Analysis` object to do this.\n",
        "\n",
        "We will use an example of 3 1D Gaussians which have the same `centre` but the `normalization` and `sigma` vary across \n",
        "datasets:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "dataset_path = path.join(\"dataset\", \"example_1d\", \"gaussian_x1_variable\")\n",
        "\n",
        "dataset_name_list = [\"sigma_0\", \"sigma_1\", \"sigma_2\"]\n",
        "\n",
        "data_list = []\n",
        "noise_map_list = []\n",
        "\n",
        "for dataset_name in dataset_name_list:\n",
        "    dataset_time_path = path.join(dataset_path, dataset_name)\n",
        "\n",
        "    data = af.util.numpy_array_from_json(\n",
        "        file_path=path.join(dataset_time_path, \"data.json\")\n",
        "    )\n",
        "    noise_map = af.util.numpy_array_from_json(\n",
        "        file_path=path.join(dataset_time_path, \"noise_map.json\")\n",
        "    )\n",
        "\n",
        "    data_list.append(data)\n",
        "    noise_map_list.append(noise_map)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Plotting these datasets shows that the `normalization` and` `sigma` of each Gaussian vary."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "for data, noise_map in zip(data_list, noise_map_list):\n",
        "    xvalues = range(data.shape[0])\n",
        "\n",
        "    af.ex.plot_profile_1d(xvalues=xvalues, profile_1d=data)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `centre` of all three 1D Gaussians are the same in each dataset, but their `normalization` and `sigma` values \n",
        "are decreasing.\n",
        "\n",
        "We will therefore fit a model to all three datasets simultaneously, whose `centre` is the same for all 3 datasets but\n",
        "the `normalization` and `sigma` vary.\n",
        "\n",
        "To do that, we use a summed list of `Analysis` objects, where each `Analysis` object contains a different dataset."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "analysis_list = []\n",
        "\n",
        "for data, noise_map in zip(data_list, noise_map_list):\n",
        "    analysis = af.ex.Analysis(data=data, noise_map=noise_map)\n",
        "    analysis_list.append(analysis)\n",
        "\n",
        "analysis = sum(analysis_list)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We next compose a model of a 1D Gaussian."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "model = af.Collection(gaussian=af.Model(af.ex.Gaussian))"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now update the model using the summed `Analysis `objects to compose a model where: \n",
        "\n",
        " - The `centre` values of the Gaussian fitted to every dataset in every `Analysis` object are identical. \n",
        "\n",
        " - The`normalization` and `sigma` value of the every Gaussian fitted to every dataset in every `Analysis` object \n",
        "   are different.\n",
        "\n",
        "The model has 7 free parameters in total, x1 shared `centre`, x3 unique `normalization`'s and x3 unique `sigma`'s."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "analysis = analysis.with_free_parameters(\n",
        "    model.gaussian.normalization, model.gaussian.sigma\n",
        ")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To inspect this new model, with extra parameters for each dataset created, we extract a modified version of this \n",
        "model from the summed `Analysis` object.\n",
        "\n",
        "This model modiciation occurs automatically when a non-linear search begins, therefore the normal model we created \n",
        "above is input to the `search.fit()` method."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "\n",
        "model_updated = analysis.modify_model(model)\n",
        "\n",
        "print(model_updated.info)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Fit this model to the data using dynesty."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "search = af.DynestyStatic(\n",
        "    path_prefix=\"features\", sample=\"rwalk\", name=\"multiple_datasets_free_sigma\"\n",
        ")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `normalization` and `sigma` values of the maximum log likelihood models fitted to each dataset are different, \n",
        "which is shown by printing the `sigma` values of the maximum log likelihood instances of each result.\n",
        "\n",
        "The `centre` values of the maximum log likelihood models fitted to each dataset are the same."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "result_list = search.fit(model=model, analysis=analysis)\n",
        "\n",
        "for result in result_list:\n",
        "    instance = result.max_log_likelihood_instance\n",
        "\n",
        "    print(\"Max Log Likelihood Model:\")\n",
        "    print(\"Centre = \", instance.gaussian.centre)\n",
        "    print(\"Normalization = \", instance.gaussian.normalization)\n",
        "    print(\"Sigma = \", instance.gaussian.sigma)\n",
        "    print()\n"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Relational Model__\n",
        "\n",
        "In the model above, two extra free parameters (`normalization and `sigma`) were added for every dataset. \n",
        "\n",
        "For just 3 datasets the model stays low dimensional and this is not a problem. However, for 30+ datasets the model\n",
        "will become complex and difficult to fit.\n",
        "\n",
        "In these circumstances, one can instead compose a model where the parameters vary smoothly across the datasets\n",
        "via a user defined relation.\n",
        "\n",
        "Below, we compose a model where the `sigma` value fitted to each dataset is computed according to:\n",
        "\n",
        " `y = m * x + c` : `sigma` = sigma_m * x + sigma_c`\n",
        "\n",
        "Where x is an integer number specifying the index of the dataset (e.g. 1, 2 and 3).\n",
        "\n",
        "By defining a relation of this form, `sigma_m` and `sigma_c` are the only free parameters of the model which vary\n",
        "across the datasets. \n",
        "\n",
        "Of more datasets are added the number of model parameters therefore does not increase."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "normalization_m = af.UniformPrior(lower_limit=-10.0, upper_limit=10.0)\n",
        "normalization_c = af.UniformPrior(lower_limit=-10.0, upper_limit=10.0)\n",
        "\n",
        "sigma_m = af.UniformPrior(lower_limit=-10.0, upper_limit=10.0)\n",
        "sigma_c = af.UniformPrior(lower_limit=-10.0, upper_limit=10.0)\n",
        "\n",
        "x_list = [1.0, 2.0, 3.0]\n",
        "\n",
        "analysis_with_relation_list = []\n",
        "\n",
        "for x, analysis in zip(x_list, analysis_list):\n",
        "    normalization_relation = (normalization_m * x) + normalization_c\n",
        "    sigma_relation = (sigma_m * x) + sigma_c\n",
        "\n",
        "    analysis_with_relation = analysis.with_model(\n",
        "        model.replacing(\n",
        "            {\n",
        "                model.gaussian.normalization: normalization_relation,\n",
        "                model.gaussian.sigma: sigma_relation,\n",
        "            }\n",
        "        ),\n",
        "    )\n",
        "\n",
        "    analysis_with_relation_list.append(analysis_with_relation)\n"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can use division, subtraction and logorithms to create more complex relations and apply them to different parameters, \n",
        "for example:\n",
        "\n",
        " `y = m * log10(x) - log(z) + c` : `sigma` = sigma_m * log10(x) - log(z) + sigma_c` \n",
        " `y = m * (x / z)` : `centre` = centre_m * (x / z)`"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "model = af.Collection(gaussian=af.Model(af.ex.Gaussian))\n",
        "\n",
        "sigma_m = af.UniformPrior(lower_limit=-0.1, upper_limit=0.1)\n",
        "sigma_c = af.UniformPrior(lower_limit=-10.0, upper_limit=10.0)\n",
        "\n",
        "centre_m = af.UniformPrior(lower_limit=-0.1, upper_limit=0.1)\n",
        "centre_c = af.UniformPrior(lower_limit=-10.0, upper_limit=10.0)\n",
        "\n",
        "x_list = [1.0, 10.0, 30.0]\n",
        "z_list = [2.0, 4.0, 6.0]\n",
        "\n",
        "analysis_with_relation_list = []\n",
        "\n",
        "for x, z, analysis in zip(x_list, z_list, analysis_list):\n",
        "    sigma_relation = (sigma_m * af.Log10(x) - af.Log(z)) + sigma_c\n",
        "    centre_relation = centre_m * (x / z)\n",
        "\n",
        "    analysis_with_relation = analysis.with_model(\n",
        "        model.replacing(\n",
        "            {\n",
        "                model.gaussian.sigma: sigma_relation,\n",
        "                model.gaussian.centre: centre_relation,\n",
        "            }\n",
        "        )\n",
        "    )\n",
        "\n",
        "    analysis_with_relation_list.append(analysis_with_relation)\n",
        "\n",
        "analysis_with_relation = sum(analysis_with_relation_list)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Analysis summing is performed after the model relations have been created."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "analysis_with_relation = sum(analysis_with_relation_list)\n"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The modified model's `info` attribute shows the model has been composed using this relation."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "\n",
        "model_updated = analysis_with_relation.modify_model(model)\n",
        "\n",
        "print(model_updated.info)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can fit the model as per usual."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "\n",
        "search = af.DynestyStatic(\n",
        "    path_prefix=\"features\", sample=\"rwalk\", name=\"multiple_datasets_relation\"\n",
        ")\n",
        "\n",
        "# result_list = search.fit(model=model, analysis=analysis_with_relation)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `normalization` and `sigma` values of the maximum log likelihood models fitted to each dataset are different, \n",
        "which is shown by printing the `sigma` values of the maximum log likelihood instances of each result.\n",
        "\n",
        "They now follow the relation we defined above.\n",
        "\n",
        "The `centre` values of the maximum log likelihood models fitted to each dataset are the same."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# for result in result_list:\n",
        "#     instance = result.max_log_likelihood_instance\n",
        "#\n",
        "#     print(\"Max Log Likelihood Model:\")\n",
        "#     print(\"Centre = \", instance.gaussian.centre)\n",
        "#     print(\"Normalization = \", instance.gaussian.normalization)\n",
        "#     print(\"Sigma = \", instance.gaussian.sigma)\n",
        "#     print()"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Different Analysis Objects__\n",
        "\n",
        "For simplicity, this example summed together a single `Analysis` class which fitted 1D Gaussian's to 1D data.\n",
        "\n",
        "For many problems one may have multiple datasets which are quite different in their format and structure In this \n",
        "situation, one can simply define unique `Analysis` objects for each type of dataset, which will contain a \n",
        "unique `log_likelihood_function` and methods for visualization.\n",
        "\n",
        "The analysis summing API illustrated here can then be used to fit this large variety of datasets, noting that the \n",
        "the model can also be customized as necessary for fitting models to multiple datasets that are different in their \n",
        "format and structure. \n",
        "\n",
        "__Interpolation__\n",
        "\n",
        "One may have many datasets which vary according to a smooth function, for example a dataset taken over time where\n",
        "the signal varies smoothly as a function of time.\n",
        "\n",
        "This could be fitted using the tools above, all at once. However, in many use cases this is not possible due to the\n",
        "model complexity, number of datasets or computational time.\n",
        "\n",
        "An alternative approach is to fit each dataset individually, and then interpolate the results over the smoothly\n",
        "varying parameter (e.g. time) to estimate the model parameters at any point.\n",
        "\n",
        "**PyAutoFit** has interpolation tools to do exactly this. These have not been documented yet, but if they sound\n",
        "useful to you please contact us on SLACK and we'll be happy to explain how they work.\n",
        "\n",
        "__Individual Sequential Searches__\n",
        "\n",
        "The API above is used to create a model with free parameters across ``Analysis`` objects, which are all fit\n",
        "simultaneously using a summed ``log_likelihood_function`` and single non-linear search.\n",
        "\n",
        "Each ``Analysis`` can be fitted one-by-one, using a series of multiple non-linear searches, using\n",
        "the ``fit_sequential`` method."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "search = af.DynestyStatic(\n",
        "    path_prefix=\"features\",\n",
        "    sample=\"rwalk\",\n",
        "    name=\"multiple_datasets_free_sigma__sequential\",\n",
        ")\n",
        "\n",
        "analysis = sum(analysis_list)\n",
        "\n",
        "result_list = search.fit_sequential(model=model, analysis=analysis)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The benefit of this method is for complex high dimensionality models (e.g. when many parameters are passed\n",
        "to `` analysis.with_free_parameters``, it breaks the fit down into a series of lower dimensionality non-linear\n",
        "searches that may convergence on a solution more reliably.\n",
        "\n",
        "__Hierarchical / Graphical Models__\n",
        "\n",
        "A common class of models used for fitting complex models to large datasets are hierarchical and graphical models. \n",
        "\n",
        "These models can include addition parameters not specific to individual datasets describing the overall \n",
        "relationship between different model components, thus allowing one to infer the global trends contained within a \n",
        "dataset.\n",
        "\n",
        "**PyAutoFit** has a dedicated feature set for fitting hierarchical and graphical models and interested readers should\n",
        "checkout the hierarchical and graphical modeling \n",
        "chapter of **HowToFit** (https://pyautofit.readthedocs.io/en/latest/howtofit/chapter_graphical_models.html)\n",
        "\n",
        "__Wrap Up__\n",
        "\n",
        "We have shown how **PyAutoFit** can fit large datasets simultaneously, using custom models that vary specific\n",
        "parameters across the dataset."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [],
      "outputs": [],
      "execution_count": null
    }
  ],
  "metadata": {
    "anaconda-cloud": {},
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.1"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}
