database.ipynb
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Cookbook: Database\n",
        "==================\n",
        "\n",
        "The default behaviour of model-fitting results output is to be written to hard-disc in folders. These are simple to \n",
        "navigate and manually check. \n",
        "\n",
        "For small model-fitting tasks this is sufficient, however it does not scale well when performing many model fits to \n",
        "large datasets, because manual inspection of results becomes time consuming.\n",
        "\n",
        "All results can therefore be output to an sqlite3 (https://docs.python.org/3/library/sqlite3.html) relational database,\n",
        "meaning that results can be loaded into a Jupyter notebook or Python script for inspection, analysis and interpretation. \n",
        "This database supports advanced querying, so that specific model-fits (e.g., which fit a certain model or dataset) can \n",
        "be loaded.\n",
        "\n",
        "This cookbook provides a concise reference to the database API.\n",
        "\n",
        "__Contents__\n",
        "\n",
        "Ann overview of database functionality is given in the following sections:\n",
        "\n",
        " - Unique Identifiers: How unique identifiers are used to ensure every entry of the database is unique.\n",
        " - Info: Passing an `info` dictionary to the search to include information on the model-fit that is not part of the\n",
        "   model-fit itself, which can be loaded via the database.\n",
        " - Session: Set up a database session so results are written directly to the .sqlite database.\n",
        " - Building Database via Directory: Build a database from results already written to hard-disk in an output folder.\n",
        " - Files: The files that are stored in the database that can be loaded and inspected.\n",
        " - Generators: Why the database uses Python generators to load results.\n",
        "\n",
        "The results that can be loaded via the database are described in the following sections:\n",
        "\n",
        " - Model: The model fitted by the non-linear search.\n",
        " - Search: The search used to perform the model-fit.\n",
        " - Samples: The samples of the non-linear search (e.g. all parameter values, log likelihoods, etc.).\n",
        " - Samples Summary: A summary of the samples of the non-linear search (e.g. the maximum log likelihood model) which can\n",
        "   be faster to load than the full set of samples.\n",
        " - Info: The `info` dictionary passed to the search.\n",
        " - Custom Output: Extend `Analysis` classes to output additional information which can be loaded via the database (e.g.\n",
        "   the data, maximum likelihood model data, etc.).\n",
        "\n",
        "Using queries to load specific results is described in the following sections:\n",
        "\n",
        " - Querying Datasets: Query based on the name of the dataset.\n",
        " - Querying Searches: Query based on the name of the search.\n",
        " - Querying Models: Query based on the model that is fitted.\n",
        " - Querying Results: Query based on the results of the model-fit.\n",
        " - Querying Logic: Use logic to combine queries to load specific results (e.g. AND, OR, etc.)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%matplotlib inline\n",
        "from pyprojroot import here\n",
        "workspace_path = str(here())\n",
        "%cd $workspace_path\n",
        "print(f\"Working Directory has been set to `{workspace_path}`\")\n",
        "\n",
        "import json\n",
        "from os import path\n",
        "import numpy as np\n",
        "\n",
        "import autofit as af\n",
        "import autofit.plot as aplt"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Unique Identifiers__\n",
        "\n",
        "Results output to hard-disk by **PyAutoFit** are contained in a folder named via a unique identifier (a \n",
        "random collection of characters, e.g. `8hds89fhndlsiuhnfiusdh`). The unique identifier changes if the model or \n",
        "search change, to ensure different fits to not overwrite one another on hard-disk.\n",
        "\n",
        "Each unique identifier is used to define every entry of the database as it is built. Unique identifiers therefore play \n",
        "the same vital role for the database of ensuring that every set of results written to it are unique.\n",
        "\n",
        "In this example, we fit 3 different datasets with the same search and model. Each `dataset_name` is therefore passed\n",
        "in as the search's `unique_tag` to ensure 3 separate sets of results for each model-fit are written to the .sqlite\n",
        "database.\n",
        "\n",
        "__Info__\n",
        "\n",
        "Information about the model-fit that is not part included in the model-fit itself can be made accessible via the \n",
        "database by passing an `info` dictionary. \n",
        "\n",
        "Below we write info on the dataset`s (hypothetical) data of observation and exposure time, which we will later show\n",
        "the database can access. \n",
        "\n",
        "For fits to large datasets this ensures that all relevant information for interpreting results is accessible."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "info = {\"date_of_observation\": \"01-02-18\", \"exposure_time\": 1000.0}"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Session__\n",
        "\n",
        "We now perform a simple model-fit to 3 datasets, where the results are written directly a sqlite3 database.\n",
        "\n",
        "To do this, we start a session, which includes the name of the database `.sqlite` file where results are stored."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "session = af.db.open_database(\"database.sqlite\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For each dataset we load it from hard-disc, set up a model and analysis and fit it with a non-linear search. \n",
        "\n",
        "Note how the `session` is passed to the `Dynesty` search."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "dataset_name_list = [\"gaussian_x1_0\", \"gaussian_x1_1\", \"gaussian_x1_2\"]\n",
        "\n",
        "model = af.Collection(gaussian=af.ex.Gaussian)\n",
        "\n",
        "model.gaussian.centre = af.UniformPrior(lower_limit=0.0, upper_limit=100.0)\n",
        "model.gaussian.normalization = af.LogUniformPrior(lower_limit=1e-2, upper_limit=1e2)\n",
        "model.gaussian.sigma = af.GaussianPrior(\n",
        "    mean=10.0, sigma=5.0, lower_limit=0.0, upper_limit=np.inf\n",
        ")\n",
        "\n",
        "for dataset_name in dataset_name_list:\n",
        "    dataset_path = path.join(\"dataset\", \"example_1d\", dataset_name)\n",
        "\n",
        "    data = af.util.numpy_array_from_json(file_path=path.join(dataset_path, \"data.json\"))\n",
        "    noise_map = af.util.numpy_array_from_json(\n",
        "        file_path=path.join(dataset_path, \"noise_map.json\")\n",
        "    )\n",
        "\n",
        "    analysis = af.ex.Analysis(data=data, noise_map=noise_map)\n",
        "\n",
        "    search = af.DynestyStatic(\n",
        "        name=\"database_example\",\n",
        "        path_prefix=path.join(\"features\", \"database\"),\n",
        "        unique_tag=dataset_name,  # This makes the unique identifier use the dataset name\n",
        "        session=session,  # This instructs the search to write to the .sqlite database.\n",
        "        nlive=50,\n",
        "    )\n",
        "\n",
        "    print(\n",
        "        \"\"\"\n",
        "        The non-linear search has begun running. \n",
        "        This Jupyter notebook cell with progress once search has completed, this could take a few minutes!\n",
        "        \"\"\"\n",
        "    )\n",
        "\n",
        "    result = search.fit(model=model, analysis=analysis, info=info)\n",
        "\n",
        "print(\"Search has finished run - you may now continue the notebook.\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If you inspect the `output` folder, you will see a `database.sqlite` file which contains the results.\n",
        "\n",
        "We can load the database using the `Aggregator`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "agg = af.Aggregator.from_database(\"database.sqlite\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Building Database via Directory__\n",
        "\n",
        "The fits above directly wrote the results to the .sqlite file, which we loaded above. However, you may have results\n",
        "already written to hard-disk in an output folder, which you wish to build your .sqlite file from.\n",
        "\n",
        "This can be done via the following code, which is commented out below to avoid us deleting the existing .sqlite file.\n",
        "\n",
        "Below, the `database_name` corresponds to the name of your output folder and is also the name of the `.sqlite` file\n",
        "that is created.\n",
        "\n",
        "If you are fitting a relatively small number of datasets (e.g. 10-100) having all results written\n",
        "to hard-disk (e.g. for quick visual inspection) but using the database for sample-wide analysis may be benefitial."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# database_name = \"database\"\n",
        "\n",
        "# agg = af.Aggregator.from_database(\n",
        "#    filename=f\"{database_name}.sqlite\", completed_only=False\n",
        "# )\n",
        "\n",
        "# agg.add_directory(directory=path.join(\"output\", database_name)))"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Files__\n",
        "\n",
        "When performing fits which output results to hard-disc, a `files` folder is created containing .json / .csv files of \n",
        "the model, samples, search, etc.\n",
        "\n",
        "These are the files that are written to the database, and the aggregator load them via the database in order\n",
        "to make them accessible in a Python script or Jupyter notebook.\n",
        "\n",
        "Below, we will access these results using the aggregator's `values` method. A full list of what can be loaded is\n",
        "as follows:\n",
        "\n",
        " - model: The `model` defined above and used in the model-fit (`model.json`).\n",
        " - search: The non-linear search settings of the fit (`search.json`).\n",
        " - samples: The non-linear search samples of the fit (`samples.csv`).\n",
        " - samples_summary: A summary of the samples results of the fit (`samples_summary.json`).\n",
        " - info: The info dictionary passed to the search (`info.json`).\n",
        " - covariance: The covariance matrix of the fit (`covariance.csv`).\n",
        " \n",
        "The `samples` and `samples_summary` results contain a lot of repeated information. The `samples` result contains\n",
        "the full non-linear search samples, for example every parameter sample and its log likelihood. The `samples_summary`\n",
        "contains a summary of the results, for example the maximum log likelihood model and error estimates on parameters\n",
        "at 1 and 3 sigma confidence.\n",
        "\n",
        "Accessing results via the `samples_summary` is therefore a lot faster, as it does reperform calculations using the\n",
        "full list of samples. Therefore, if the result you want is accessible via the `samples_summary` you should use it\n",
        "but if not you can revert to the `samples.\n",
        "\n",
        "__Generators__\n",
        "\n",
        "Before using the aggregator to inspect results, lets discuss Python generators. \n",
        "\n",
        "A generator is an object that iterates over a function when it is called. The aggregator creates all of the objects \n",
        "that it loads from the database as generators (as opposed to a list, or dictionary, or another Python type).\n",
        "\n",
        "This is because generators are memory efficient, as they do not store the entries of the database in memory \n",
        "simultaneously. This contrasts objects like lists and dictionaries, which store all entries in memory all at once. \n",
        "If you fit a large number of datasets, lists and dictionaries will use a lot of memory and could crash your computer!\n",
        "\n",
        "Once we use a generator in the Python code, it cannot be used again. To perform the same task twice, the \n",
        "generator must be remade it. This cookbook therefore rarely stores generators as variables and instead uses the \n",
        "aggregator to create each generator at the point of use.\n",
        "\n",
        "To create a generator of a specific set of results, we use the `values` method. This takes the `name` of the\n",
        "object we want to create a generator of, for example inputting `name=samples` will return the results `Samples`\n",
        "object."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "samples_gen = agg.values(\"samples\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "By converting this generator to a list and printing it, it is a list of 3 `SamplesDynesty` objects, corresponding to \n",
        "the 3 model-fits performed above."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "print(\"Dynesty Samples:\\n\")\n",
        "print(samples_gen)\n",
        "print(\"Total Samples Objects = \", len(agg), \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },
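    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Note that converting the generator to a list in the cell above consumed it: iterating it a second time yields no \n",
        "entries. To use the results again, the generator must be remade via the aggregator. A minimal sketch of this, \n",
        "assuming the cells above have been run in order, is given below."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# The generator above has already been consumed, so a second iteration yields no entries.\n",
        "print(\"Entries on a second iteration = \", len(list(samples_gen)), \"\\n\")\n",
        "\n",
        "# Remake the generator via the aggregator to iterate over the results again.\n",
        "samples_gen = agg.values(\"samples\")\n",
        "print(\"Entries after remaking the generator = \", len(list(samples_gen)), \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },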
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Model__\n",
        "\n",
        "The model used to perform the model fit for each of the 3 datasets can be loaded via the aggregator and printed."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "model_gen = agg.values(\"model\")\n",
        "\n",
        "for model in model_gen:\n",
        "    print(model.info)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Search__\n",
        "\n",
        "The non-linear search used to perform the model fit can be loaded via the aggregator and printed."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "search_gen = agg.values(\"search\")\n",
        "\n",
        "for search in search_gen:\n",
        "    print(search)"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Samples__\n",
        "\n",
        "The `Samples` class contains all information on the non-linear search samples, for example the value of every parameter\n",
        "sampled using the fit or an instance of the maximum likelihood model.\n",
        "\n",
        "The `Samples` class is described fully in the results cookbook."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "for samples in agg.values(\"samples\"):\n",
        "    print(\"The tenth sample`s third parameter\")\n",
        "    print(samples.parameter_lists[9][2], \"\\n\")\n",
        "\n",
        "    instance = samples.max_log_likelihood()\n",
        "\n",
        "    print(\"Max Log Likelihood `Gaussian` Instance:\")\n",
        "    print(\"Centre = \", instance.gaussian.centre)\n",
        "    print(\"Normalization = \", instance.gaussian.normalization)\n",
        "    print(\"Sigma = \", instance.gaussian.sigma, \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Samples Summary__\n",
        "\n",
        "The samples summary contains a subset of results access via the `Samples`, for example the maximum likelihood model\n",
        "and parameter error estimates.\n",
        "\n",
        "Using the samples method above can be slow, as the quantities have to be computed from all non-linear search samples\n",
        "(e.g. computing errors requires that all samples are marginalized over). This information is stored directly in the\n",
        "samples summary and can therefore be accessed instantly."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# for samples_summary in agg.values(\"samples_summary\"):\n",
        "#\n",
        "#     instance = samples_summary.max_log_likelihood()\n",
        "#\n",
        "#     print(\"Max Log Likelihood `Gaussian` Instance:\")\n",
        "#     print(\"Centre = \", instance.centre)\n",
        "#     print(\"Normalization = \", instance.normalization)\n",
        "#     print(\"Sigma = \", instance.sigma, \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Info__\n",
        "\n",
        "The info dictionary passed to the search, discussed earlier in this cookbook, is accessible."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "for info in agg.values(\"info\"):\n",
        "    print(info[\"date_of_observation\"])\n",
        "    print(info[\"exposure_time\"])"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The API for querying is fairly self explanatory. Through the combination of info based queries, model based\n",
        "queries and result based queries a user has all the tools they need to fit extremely large datasets with many different\n",
        "models and load only the results they are interested in for inspection and analysis.\n",
        "\n",
        "__Custom Output__\n",
        "\n",
        "The results accessible via the database (e.g. `model`, `samples`) are those contained in the `files` folder.\n",
        "\n",
        "By extending an `Analysis` class with the methods `save_attributes_for_aggregator` and `save_results_for_aggregator`, \n",
        "custom files can be written to the `files` folder and become accessible via the database."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "\n",
        "\n",
        "class Analysis(af.Analysis):\n",
        "    def __init__(self, data: np.ndarray, noise_map: np.ndarray):\n",
        "        \"\"\"\n",
        "        Standard Analysis class example used throughout PyAutoFit examples.\n",
        "        \"\"\"\n",
        "        super().__init__()\n",
        "\n",
        "        self.data = data\n",
        "        self.noise_map = noise_map\n",
        "\n",
        "    def log_likelihood_function(self, instance) -> float:\n",
        "        \"\"\"\n",
        "        Standard log likelihood function used throughout PyAutoFit examples.\n",
        "        \"\"\"\n",
        "\n",
        "        xvalues = np.arange(self.data.shape[0])\n",
        "\n",
        "        model_data = instance.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
        "\n",
        "        residual_map = self.data - model_data\n",
        "        chi_squared_map = (residual_map / self.noise_map) ** 2.0\n",
        "        chi_squared = sum(chi_squared_map)\n",
        "        noise_normalization = np.sum(np.log(2 * np.pi * self.noise_map**2.0))\n",
        "        log_likelihood = -0.5 * (chi_squared + noise_normalization)\n",
        "\n",
        "        return log_likelihood\n",
        "\n",
        "    def save_attributes_for_aggregator(self, paths: af.DirectoryPaths):\n",
        "        \"\"\"\n",
        "        Before the non-linear search begins, this routine saves attributes of the `Analysis` object to the `files`\n",
        "        folder such that they can be loaded after the analysis using PyAutoFit's database and aggregator tools.\n",
        "\n",
        "        For this analysis, it uses the `AnalysisDataset` object's method to output the following:\n",
        "\n",
        "        - The dataset's data as a .json file.\n",
        "        - The dataset's noise-map as a .json file.\n",
        "\n",
        "        These are accessed using the aggregator via `agg.values(\"data\")` and `agg.values(\"noise_map\")`.\n",
        "\n",
        "        Parameters\n",
        "        ----------\n",
        "        paths\n",
        "            The PyAutoFit paths object which manages all paths, e.g. where the non-linear search outputs are stored,\n",
        "            visualization, and the pickled objects used by the aggregator output by this function.\n",
        "        \"\"\"\n",
        "        # The path where data.json is saved, e.g. output/dataset_name/unique_id/files/data.json\n",
        "\n",
        "        file_path = (path.join(paths._json_path, \"data.json\"),)\n",
        "\n",
        "        with open(file_path, \"w+\") as f:\n",
        "            json.dump(self.data, f, indent=4)\n",
        "\n",
        "        # The path where noise_map.json is saved, e.g. output/noise_mapset_name/unique_id/files/noise_map.json\n",
        "\n",
        "        file_path = (path.join(paths._json_path, \"noise_map.json\"),)\n",
        "\n",
        "        with open(file_path, \"w+\") as f:\n",
        "            json.dump(self.noise_map, f, indent=4)\n",
        "\n",
        "    def save_results_for_aggregator(self, paths: af.DirectoryPaths, result: af.Result):\n",
        "        \"\"\"\n",
        "        At the end of a model-fit,  this routine saves attributes of the `Analysis` object to the `files`\n",
        "        folder such that they can be loaded after the analysis using PyAutoFit's database and aggregator tools.\n",
        "\n",
        "        For this analysis it outputs the following:\n",
        "\n",
        "        - The maximum log likelihood model data as a .json file.\n",
        "\n",
        "        This is accessed using the aggregator via `agg.values(\"model_data\")`.\n",
        "\n",
        "        Parameters\n",
        "        ----------\n",
        "        paths\n",
        "            The PyAutoFit paths object which manages all paths, e.g. where the non-linear search outputs are stored,\n",
        "            visualization and the pickled objects used by the aggregator output by this function.\n",
        "        result\n",
        "            The result of a model fit, including the non-linear search, samples and maximum likelihood model.\n",
        "        \"\"\"\n",
        "        xvalues = np.arange(self.data.shape[0])\n",
        "\n",
        "        instance = result.max_log_likelihood_instance\n",
        "\n",
        "        model_data = instance.model_data_1d_via_xvalues_from(xvalues=xvalues)\n",
        "\n",
        "        # The path where model_data.json is saved, e.g. output/dataset_name/unique_id/files/model_data.json\n",
        "\n",
        "        file_path = (path.join(paths._json_path, \"model_data.json\"),)\n",
        "\n",
        "        with open(file_path, \"w+\") as f:\n",
        "            json.dump(model_data, f, indent=4)\n"
      ],
      "outputs": [],
      "execution_count": null
    },
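    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As the docstrings above note, the custom files are then accessible via the aggregator, for example via \n",
        "`agg.values(\"data\")`, `agg.values(\"noise_map\")` and `agg.values(\"model_data\")`. The sketch below shows how they \n",
        "could be loaded; it is commented out because the fits performed earlier in this cookbook did not use this extended \n",
        "`Analysis` class, so these entries are not present in the database built above."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# for data, model_data in zip(agg.values(\"data\"), agg.values(\"model_data\")):\n",
        "#     print(\"Data = \", data)\n",
        "#     print(\"Maximum log likelihood model data = \", model_data, \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },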
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Querying Datasets__\n",
        "\n",
        "The aggregator can query the database, returning only specific fits of interested. \n",
        "\n",
        "We can query using the `dataset_name` string we input into the model-fit above, in order to get the results\n",
        "of a fit to a specific dataset. \n",
        "\n",
        "For example, querying using the string `gaussian_x1_1` returns results for only the fit using the \n",
        "second `Gaussian` dataset."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "unique_tag = agg.search.unique_tag\n",
        "agg_query = agg.query(unique_tag == \"gaussian_x1_1\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As expected, this list has only 1 `SamplesDynesty` corresponding to the second dataset."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "print(agg_query.values(\"samples\"))\n",
        "print(\"Total Samples Objects via dataset_name Query = \", len(agg_query), \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If we query using an incorrect dataset name we get no results."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "unique_tag = agg.search.unique_tag\n",
        "agg_query = agg.query(unique_tag == \"incorrect_name\")\n",
        "samples_gen = agg_query.values(\"samples\")"
      ],
      "outputs": [],
      "execution_count": null
    },
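    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Checking the number of entries in the queried aggregator confirms that no results were returned."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "print(\"Total Samples Objects via incorrect_name Query = \", len(agg_query), \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },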
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Querying Searches__\n",
        "\n",
        "We can query using the `name` of the non-linear search used to fit the model. \n",
        "\n",
        "In this cookbook, all three fits used the same search, named `database_example`. Query based on search name in this \n",
        "example is therefore somewhat pointless. \n",
        "\n",
        "However, querying based on the search name is useful for model-fits which use a range of searches, for example\n",
        "if different non-linear searches are used multiple times.\n",
        "\n",
        "As expected, the query using search name below contains all 3 results."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "name = agg.search.name\n",
        "agg_query = agg.query(name == \"database_example\")\n",
        "\n",
        "print(agg_query.values(\"samples\"))\n",
        "print(\"Total Samples Objects via name Query = \", len(agg_query), \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Querying Models__\n",
        "\n",
        "We can query based on the model fitted. \n",
        "\n",
        "For example, we can load all results which fitted a `Gaussian` model-component, which in this simple example is all\n",
        "3 model-fits.\n",
        " \n",
        "Querying via the model is useful for loading results after performing many model-fits with many different model \n",
        "parameterizations to large (e.g. Bayesian model comparison).  \n",
        "\n",
        "[Note: the code `agg.model.gaussian` corresponds to the fact that in the `Collection` above, we named the model\n",
        "component `gaussian`. If this `Collection` had used a different name the code below would change \n",
        "correspondingly. Models with multiple model components (e.g., `gaussian` and `exponential`) are therefore also easily \n",
        "accessed via the database.]"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "gaussian = agg.model.gaussian\n",
        "agg_query = agg.query(gaussian == af.ex.Gaussian)\n",
        "print(\"Total Samples Objects via `Gaussian` model query = \", len(agg_query), \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Querying Results__\n",
        "\n",
        "We can query based on the results of the model-fit.\n",
        "\n",
        "Below, we query the database to find all fits where the inferred value of `sigma` for the `Gaussian` is less \n",
        "than 3.0 (which returns only the first of the three model-fits)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "gaussian = agg.model.gaussian\n",
        "agg_query = agg.query(gaussian.sigma < 3.0)\n",
        "print(\"Total Samples Objects In Query `gaussian.sigma < 3.0` = \", len(agg_query), \"\\n\")"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__Querying with Logic__\n",
        "\n",
        "Advanced queries can be constructed using logic. \n",
        "\n",
        "Below, we combine the two queries above to find all results which fitted a `Gaussian` AND (using the & symbol) \n",
        "inferred a value of sigma less than 3.0. \n",
        "\n",
        "The OR logical clause is also supported via the symbol |."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "gaussian = agg.model.gaussian\n",
        "agg_query = agg.query((gaussian == af.ex.Gaussian) & (gaussian.sigma < 3.0))\n",
        "print(\n",
        "    \"Total Samples Objects In Query `Gaussian & sigma < 3.0` = \", len(agg_query), \"\\n\"\n",
        ")"
      ],
      "outputs": [],
      "execution_count": null
    },
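    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a brief sketch of an OR query (the threshold values below are arbitrary and chosen only for illustration), the \n",
        "following loads all results where `sigma` was inferred to be either less than 1.0 or greater than 3.0."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "gaussian = agg.model.gaussian\n",
        "agg_query = agg.query((gaussian.sigma < 1.0) | (gaussian.sigma > 3.0))\n",
        "print(\n",
        "    \"Total Samples Objects In Query `sigma < 1.0 | sigma > 3.0` = \", len(agg_query), \"\\n\"\n",
        ")"
      ],
      "outputs": [],
      "execution_count": null
    },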
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "__HowToFit__\n",
        "\n",
        "The Database chapter of the **HowToFit** Jupyter notebooks give a full description of the database feature, including \n",
        "examples of advanced queries and how to load and plot the results of a model-fit in more detail."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [],
      "outputs": [],
      "execution_count": null
    }
  ],
  "metadata": {
    "anaconda-cloud": {},
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.1"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}
