https://github.com/hamzeiehsan/leisure-walking-analysis
23 June 2025, 05:42:13 UTC
SoftWare Hash IDentifiers (SWHIDs) of the archived objects:

  • content: swh:1:cnt:9947020a76bc16e758a95f34691535f948c8cfca
  • directory: swh:1:dir:cddb6d133e212246c2e458e4ea46f1358cd27927
  • revision: swh:1:rev:22c33b5a1a32915a600817db808f716df265b70b
  • snapshot: swh:1:snp:24de26e43def854424cacc232f9289821b17290a

Tip revision: 22c33b5a1a32915a600817db808f716df265b70b authored by hamzeiehsan on 27 April 2025, 10:08:03 UTC
add link to annotation tool
2_POIs_From_OSM.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3553dd11-da3f-4ce6-9acc-b0180d0125bb",
   "metadata": {},
   "source": [
    "# POIs from OSM\n",
    "\n",
    "The aim is to download all candidate POIs from OSM. This set is used to provide suggestions for manual annotation and later to build a candidate set of not-recommended POIs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63d48b04-006f-4436-a128-fe167e883a17",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "Loading libraries and models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b8f28900-d441-4155-900a-6e0dcdc30d39",
   "metadata": {},
   "outputs": [],
   "source": [
    "# working with files\n",
    "import os.path\n",
    "# stderr handle for the logger\n",
    "import sys\n",
    "\n",
    "# suppressing warnings\n",
    "import warnings\n",
    "# IO\n",
    "import json\n",
    "# calling web services\n",
    "import requests\n",
    "# pauses between requests for polite crawling\n",
    "import time\n",
    "\n",
    "# dimensionality reduction (seeded below via random_state for reproducible results)\n",
    "from umap import UMAP\n",
    "\n",
    "# dataframe \n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import geopandas as gpd\n",
    "\n",
    "# geocoding\n",
    "from geopy.geocoders import Nominatim\n",
    "\n",
    "# getting data from OSM\n",
    "import osmnx as ox\n",
    "\n",
    "# topic modelling\n",
    "from bertopic import BERTopic\n",
    "from bertopic.vectorizers import ClassTfidfTransformer\n",
    "from bertopic.representation import KeyBERTInspired\n",
    "\n",
    "# nlp\n",
    "from sentence_transformers import SentenceTransformer, util\n",
    "import spacy\n",
    "from nltk.corpus import stopwords\n",
    "\n",
    "# visualization\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# logging\n",
    "from loguru import logger\n",
    "\n",
    "# set logger level\n",
    "logger.remove(0)\n",
    "logger.add(sys.stderr, level=\"INFO\")\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "# en_core_web_lg must be downloaded, if not run: 'python -m spacy download en_core_web_lg' first!\n",
    "nlp = spacy.load('en_core_web_lg')\n",
    "\n",
    "umap_model = UMAP(random_state=42)\n",
    "\n",
    "stopword_removal = False"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2de8af2-fc1e-4470-9496-92e1e2c9d1e1",
   "metadata": {},
   "source": [
    "## Dataset\n",
    "\n",
    "Reading the dataset crawled from the WalkingMap website."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6595f7ec-753e-47b4-a58d-1b48db1e2af6",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('dataset/walkingmaps.json', 'r', encoding='utf-8') as fp:\n",
    "    dataset = json.load(fp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d81d69ea-afd1-46d1-8ba5-927278c6f093",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:25:12.429\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m2\u001b[0m - \u001b[1mstructure of records in dataset: dict_keys(['markers', 'pathDetails', 'pois', 'title', 'description'])\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "logger.debug(f'an example record in dataset: {dataset[1]}')\n",
    "logger.info(f'structure of records in dataset: {dataset[1].keys()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aba3a1d9-08a2-4dc1-ac86-03c65144df0c",
   "metadata": {},
   "source": [
    "### Dataset Transformation\n",
    "\n",
    "Aim: Transforming the dataset into pandas and geopandas dataframes, with a focus on POIs.\n",
    "\n",
    "A basic preprocessing step to create a dataset of POI descriptions, including a preliminary analysis of their locations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "efd8dc1c-73ff-44b2-9262-36c3afa0bdc0",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:25:14.304\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m7\u001b[0m - \u001b[1mrecords: 386 total POIs: 4392 - average per record: 11\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "counter = 0\n",
    "total = 0\n",
    "for record in dataset:\n",
    "    if 'pois' in record.keys() and len(record['pois']) > 0:\n",
    "        counter+=1\n",
    "        total += len(record['pois'])\n",
    "logger.info('records: {0} total POIs: {1} - average per record: {2}'.format(counter, total, round(total/counter)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f6e30398-9994-466c-bdf7-06c9c08b627c",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:25:18.824\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m14\u001b[0m - \u001b[1mword count verbal description \n",
      "\t- average: 181 - median: 130.0 - min: 7 - max: 540\u001b[0m\n",
      "\u001b[32m2025-01-25 20:25:18.836\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m16\u001b[0m - \u001b[1mword count POI verbal description \n",
      "\t- average: 22 - median: 23.0 - min: 2 - max: 116\u001b[0m\n",
      "\u001b[32m2025-01-25 20:25:18.837\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m18\u001b[0m - \u001b[1mcharacter count verbal description \n",
      "\t- average: 1062 - median: 764.0 - min: 43 - max: 3052\u001b[0m\n",
      "\u001b[32m2025-01-25 20:25:18.840\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m20\u001b[0m - \u001b[1mcharacter count POI verbal description \n",
      "\t- average: 130 - median: 127.0 - min: 11 - max: 296\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "import statistics \n",
    "\n",
    "verbal_descriptions = []\n",
    "poi_descriptions = []\n",
    "for record in dataset:\n",
    "    if 'pois' in record.keys() and len(record['pois']) > 0:\n",
    "        vb = record['title']+' '+record['description']\n",
    "        verbal_descriptions.append(vb)\n",
    "        for poi in record['pois']:\n",
    "            pvb = poi['title']+' '+poi['summary']\n",
    "            poi_descriptions.append(pvb)\n",
    "\n",
    "wc_vb = [len(vb.split(' ')) for vb in verbal_descriptions]\n",
    "logger.info(f'word count verbal description \\n\\t- average: {int(statistics.mean(wc_vb))} - median: {statistics.median(wc_vb)} - min: {min(wc_vb)} - max: {max(wc_vb)}')\n",
    "wc_pvb = [len(pvb.split(' ')) for pvb in poi_descriptions]\n",
    "logger.info(f'word count POI verbal description \\n\\t- average: {int(statistics.mean(wc_pvb))} - median: {statistics.median(wc_pvb)} - min: {min(wc_pvb)} - max: {max(wc_pvb)}')\n",
    "cc_vb = [len(vb) for vb in verbal_descriptions]\n",
    "logger.info(f'character count verbal description \\n\\t- average: {int(statistics.mean(cc_vb))} - median: {statistics.median(cc_vb)} - min: {min(cc_vb)} - max: {max(cc_vb)}')\n",
    "cc_pvb = [len(pvb) for pvb in poi_descriptions]\n",
    "logger.info(f'character count POI verbal description \\n\\t- average: {int(statistics.mean(cc_pvb))} - median: {statistics.median(cc_pvb)} - min: {min(cc_pvb)} - max: {max(cc_pvb)}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "92b40be0-05f3-41d6-9ea5-42eb9323a8d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_structure = {'record_title':[], 'record_description':[], 'poi_title':[], 'poi_summary':[], 'latitude': [], 'longitude': []}\n",
    "for record in dataset:\n",
    "    if 'pois' in record.keys() and len(record['pois']) > 0:\n",
    "        for poi in record['pois']:\n",
    "            data_structure['record_title'].append(record['title'])\n",
    "            data_structure['record_description'].append(record['description'])\n",
    "            data_structure['poi_title'].append(poi['title'])\n",
    "            data_structure['poi_summary'].append(poi['summary'])\n",
    "            data_structure['latitude'].append(poi['lat'])\n",
    "            data_structure['longitude'].append(poi['lng'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "038c85cf-13b3-497f-9e0c-0302694f02d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(data_structure)\n",
    "\n",
    "gdf = gpd.GeoDataFrame(df[['poi_title', 'poi_summary', 'latitude', 'longitude']], geometry=gpd.points_from_xy(df.longitude, df.latitude), crs=\"EPSG:4326\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9e55d20b-785b-4602-b7cd-f3585b1fa4e0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>poi_title</th>\n",
       "      <th>poi_summary</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Fairhaven Surf Life Saving Club</td>\n",
       "      <td>Fairhaven is a well known surf beach. The beac...</td>\n",
       "      <td>-38.468759</td>\n",
       "      <td>144.084459</td>\n",
       "      <td>POINT (144.08446 -38.46876)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beach walk</td>\n",
       "      <td>From Sprout Creek, Eastern View, Moggs Creek, ...</td>\n",
       "      <td>-38.468542</td>\n",
       "      <td>144.089693</td>\n",
       "      <td>POINT (144.08969 -38.46854)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Rock pools</td>\n",
       "      <td>See what sort of shells and stones you can col...</td>\n",
       "      <td>-38.468459</td>\n",
       "      <td>144.092420</td>\n",
       "      <td>POINT (144.09242 -38.46846)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Sand dunes</td>\n",
       "      <td>The beautiful rolling sand dunes shape the bea...</td>\n",
       "      <td>-38.468418</td>\n",
       "      <td>144.095318</td>\n",
       "      <td>POINT (144.09532 -38.46842)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Painkalac Creek</td>\n",
       "      <td>The creek separates Aireys Inlet from Fairhave...</td>\n",
       "      <td>-38.468390</td>\n",
       "      <td>144.097312</td>\n",
       "      <td>POINT (144.09731 -38.46839)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         poi_title  \\\n",
       "0  Fairhaven Surf Life Saving Club   \n",
       "1                       Beach walk   \n",
       "2                       Rock pools   \n",
       "3                       Sand dunes   \n",
       "4                  Painkalac Creek   \n",
       "\n",
       "                                         poi_summary   latitude   longitude  \\\n",
       "0  Fairhaven is a well known surf beach. The beac... -38.468759  144.084459   \n",
       "1  From Sprout Creek, Eastern View, Moggs Creek, ... -38.468542  144.089693   \n",
       "2  See what sort of shells and stones you can col... -38.468459  144.092420   \n",
       "3  The beautiful rolling sand dunes shape the bea... -38.468418  144.095318   \n",
       "4  The creek separates Aireys Inlet from Fairhave... -38.468390  144.097312   \n",
       "\n",
       "                      geometry  \n",
       "0  POINT (144.08446 -38.46876)  \n",
       "1  POINT (144.08969 -38.46854)  \n",
       "2  POINT (144.09242 -38.46846)  \n",
       "3  POINT (144.09532 -38.46842)  \n",
       "4  POINT (144.09731 -38.46839)  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gdf.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "af6e7efd-916e-43ab-8340-4f8b0548f10a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>record_title</th>\n",
       "      <th>record_description</th>\n",
       "      <th>poi_title</th>\n",
       "      <th>poi_summary</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Fairhaven to Aireys Inlet Walk created by tedm...</td>\n",
       "      <td>Apart from the points of interested listed, he...</td>\n",
       "      <td>Fairhaven Surf Life Saving Club</td>\n",
       "      <td>Fairhaven is a well known surf beach. The beac...</td>\n",
       "      <td>-38.468759</td>\n",
       "      <td>144.084459</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Fairhaven to Aireys Inlet Walk created by tedm...</td>\n",
       "      <td>Apart from the points of interested listed, he...</td>\n",
       "      <td>Beach walk</td>\n",
       "      <td>From Sprout Creek, Eastern View, Moggs Creek, ...</td>\n",
       "      <td>-38.468542</td>\n",
       "      <td>144.089693</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Fairhaven to Aireys Inlet Walk created by tedm...</td>\n",
       "      <td>Apart from the points of interested listed, he...</td>\n",
       "      <td>Rock pools</td>\n",
       "      <td>See what sort of shells and stones you can col...</td>\n",
       "      <td>-38.468459</td>\n",
       "      <td>144.092420</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Fairhaven to Aireys Inlet Walk created by tedm...</td>\n",
       "      <td>Apart from the points of interested listed, he...</td>\n",
       "      <td>Sand dunes</td>\n",
       "      <td>The beautiful rolling sand dunes shape the bea...</td>\n",
       "      <td>-38.468418</td>\n",
       "      <td>144.095318</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Fairhaven to Aireys Inlet Walk created by tedm...</td>\n",
       "      <td>Apart from the points of interested listed, he...</td>\n",
       "      <td>Painkalac Creek</td>\n",
       "      <td>The creek separates Aireys Inlet from Fairhave...</td>\n",
       "      <td>-38.468390</td>\n",
       "      <td>144.097312</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        record_title  \\\n",
       "0  Fairhaven to Aireys Inlet Walk created by tedm...   \n",
       "1  Fairhaven to Aireys Inlet Walk created by tedm...   \n",
       "2  Fairhaven to Aireys Inlet Walk created by tedm...   \n",
       "3  Fairhaven to Aireys Inlet Walk created by tedm...   \n",
       "4  Fairhaven to Aireys Inlet Walk created by tedm...   \n",
       "\n",
       "                                  record_description  \\\n",
       "0  Apart from the points of interested listed, he...   \n",
       "1  Apart from the points of interested listed, he...   \n",
       "2  Apart from the points of interested listed, he...   \n",
       "3  Apart from the points of interested listed, he...   \n",
       "4  Apart from the points of interested listed, he...   \n",
       "\n",
       "                         poi_title  \\\n",
       "0  Fairhaven Surf Life Saving Club   \n",
       "1                       Beach walk   \n",
       "2                       Rock pools   \n",
       "3                       Sand dunes   \n",
       "4                  Painkalac Creek   \n",
       "\n",
       "                                         poi_summary   latitude   longitude  \n",
       "0  Fairhaven is a well known surf beach. The beac... -38.468759  144.084459  \n",
       "1  From Sprout Creek, Eastern View, Moggs Creek, ... -38.468542  144.089693  \n",
       "2  See what sort of shells and stones you can col... -38.468459  144.092420  \n",
       "3  The beautiful rolling sand dunes shape the bea... -38.468418  144.095318  \n",
       "4  The creek separates Aireys Inlet from Fairhave... -38.468390  144.097312  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ef45828-36e9-4ec4-a517-bd2700d67044",
   "metadata": {},
   "source": [
    "## OSM Points of Interest\n",
    "\n",
    "**Aim**: Collect rich OSM POI information in the bounding box area of the leisure walk.\n",
    "\n",
    "**Approach**: Using OSM tags for `{'amenity': True, 'natural': True, 'animal': True, 'leisure': True}` to collect information inside the bounding boxes of leisure walks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "0a78e99b-da4d-413e-a7f9-867de45142ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "# bounding box of each path\n",
    "paths = []\n",
    "for record in dataset:\n",
    "    min_lat = 90\n",
    "    max_lat = -90\n",
    "    min_lng = 180\n",
    "    max_lng = -180\n",
    "    for latlng in record['pathDetails']:\n",
    "        if min_lat > latlng['lat']:\n",
    "            min_lat = latlng['lat']\n",
    "        if max_lat < latlng['lat']:\n",
    "            max_lat = latlng['lat']\n",
    "        if min_lng > latlng['lng']:\n",
    "            min_lng = latlng['lng']\n",
    "        if max_lng < latlng['lng']:\n",
    "            max_lng = latlng['lng']\n",
    "    paths.append({'min_lat': min_lat, 'max_lat': max_lat, 'min_lng': min_lng, 'max_lng': max_lng})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "f8f21a75-2721-4185-b858-75e8c31469ce",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:25:55.110\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m2\u001b[0m - \u001b[1mexample path bounding boxes: {'min_lat': -37.82326007, 'max_lat': -37.81401352, 'min_lng': 144.96751249, 'max_lng': 144.97828424}\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "logger.debug(f'all path bounding boxes: {paths}')  # bounding box information leisure walks\n",
    "logger.info(f'example path bounding boxes: {paths[0]}')  # bounding box information leisure walks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "0591eb39-11c1-4355-88af-700b26a7b05c",
   "metadata": {},
   "outputs": [],
   "source": [
    "tags = {'amenity': True, 'natural': True, 'animal': True, 'leisure': True}"
   ]
  },
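  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "3b8d1e21-32ae-421f-a307-4ac15bc3926b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# query OSM features for the first path only, as an example\n",
    "# note: the north/south/east/west keywords follow the OSMnx 1.x signature;\n",
    "# newer OSMnx releases (2.x) instead expect bbox=(left, bottom, right, top)\n",
    "path = paths[0]\n",
    "feature_gdf = ox.features_from_bbox(north=path['max_lat'], south=path['min_lat'], east=path['max_lng'], west=path['min_lng'], tags=tags)"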
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "a7401570-ef88-49df-85d6-ae91ca0bd633",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>addr:city</th>\n",
       "      <th>addr:housenumber</th>\n",
       "      <th>addr:postcode</th>\n",
       "      <th>addr:street</th>\n",
       "      <th>amenity</th>\n",
       "      <th>name</th>\n",
       "      <th>operator</th>\n",
       "      <th>website</th>\n",
       "      <th>wikidata</th>\n",
       "      <th>geometry</th>\n",
       "      <th>...</th>\n",
       "      <th>motor_vehicle</th>\n",
       "      <th>contact:instagram</th>\n",
       "      <th>building:part</th>\n",
       "      <th>not:operator:wikidata</th>\n",
       "      <th>water</th>\n",
       "      <th>unisex</th>\n",
       "      <th>ways</th>\n",
       "      <th>type</th>\n",
       "      <th>intermittent</th>\n",
       "      <th>salt</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">node</th>\n",
       "      <th>176729780</th>\n",
       "      <td>Melbourne</td>\n",
       "      <td>191</td>\n",
       "      <td>3000</td>\n",
       "      <td>Collins Street</td>\n",
       "      <td>theatre</td>\n",
       "      <td>Regent Theatre</td>\n",
       "      <td>Marriner Group</td>\n",
       "      <td>https://www.marrinergroup.com.au/theatre-regen...</td>\n",
       "      <td>Q7308110</td>\n",
       "      <td>POINT (144.96760 -37.81550)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247024808</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>parking_entrance</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POINT (144.97019 -37.81548)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247689970</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>parking_entrance</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POINT (144.97070 -37.81789)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>266733834</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>parking</td>\n",
       "      <td>Sofitel Hotel Carpark</td>\n",
       "      <td>Wilson Parking</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POINT (144.97302 -37.81451)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>304169365</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>theatre</td>\n",
       "      <td>Playhouse Theatre</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POINT (144.96840 -37.82172)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 139 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                        addr:city addr:housenumber addr:postcode  \\\n",
       "element_type osmid                                                 \n",
       "node         176729780  Melbourne              191          3000   \n",
       "             247024808        NaN              NaN           NaN   \n",
       "             247689970        NaN              NaN           NaN   \n",
       "             266733834        NaN              NaN           NaN   \n",
       "             304169365        NaN              NaN           NaN   \n",
       "\n",
       "                           addr:street           amenity  \\\n",
       "element_type osmid                                         \n",
       "node         176729780  Collins Street           theatre   \n",
       "             247024808             NaN  parking_entrance   \n",
       "             247689970             NaN  parking_entrance   \n",
       "             266733834             NaN           parking   \n",
       "             304169365             NaN           theatre   \n",
       "\n",
       "                                         name        operator  \\\n",
       "element_type osmid                                              \n",
       "node         176729780         Regent Theatre  Marriner Group   \n",
       "             247024808                    NaN             NaN   \n",
       "             247689970                    NaN             NaN   \n",
       "             266733834  Sofitel Hotel Carpark  Wilson Parking   \n",
       "             304169365      Playhouse Theatre             NaN   \n",
       "\n",
       "                                                                  website  \\\n",
       "element_type osmid                                                          \n",
       "node         176729780  https://www.marrinergroup.com.au/theatre-regen...   \n",
       "             247024808                                                NaN   \n",
       "             247689970                                                NaN   \n",
       "             266733834                                                NaN   \n",
       "             304169365                                                NaN   \n",
       "\n",
       "                        wikidata                     geometry  ...  \\\n",
       "element_type osmid                                             ...   \n",
       "node         176729780  Q7308110  POINT (144.96760 -37.81550)  ...   \n",
       "             247024808       NaN  POINT (144.97019 -37.81548)  ...   \n",
       "             247689970       NaN  POINT (144.97070 -37.81789)  ...   \n",
       "             266733834       NaN  POINT (144.97302 -37.81451)  ...   \n",
       "             304169365       NaN  POINT (144.96840 -37.82172)  ...   \n",
       "\n",
       "                       motor_vehicle contact:instagram building:part  \\\n",
       "element_type osmid                                                     \n",
       "node         176729780           NaN               NaN           NaN   \n",
       "             247024808           NaN               NaN           NaN   \n",
       "             247689970           NaN               NaN           NaN   \n",
       "             266733834           NaN               NaN           NaN   \n",
       "             304169365           NaN               NaN           NaN   \n",
       "\n",
       "                       not:operator:wikidata water unisex ways type  \\\n",
       "element_type osmid                                                    \n",
       "node         176729780                   NaN   NaN    NaN  NaN  NaN   \n",
       "             247024808                   NaN   NaN    NaN  NaN  NaN   \n",
       "             247689970                   NaN   NaN    NaN  NaN  NaN   \n",
       "             266733834                   NaN   NaN    NaN  NaN  NaN   \n",
       "             304169365                   NaN   NaN    NaN  NaN  NaN   \n",
       "\n",
       "                       intermittent salt  \n",
       "element_type osmid                        \n",
       "node         176729780          NaN  NaN  \n",
       "             247024808          NaN  NaN  \n",
       "             247689970          NaN  NaN  \n",
       "             266733834          NaN  NaN  \n",
       "             304169365          NaN  NaN  \n",
       "\n",
       "[5 rows x 139 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "feature_gdf.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "61655466-9d8f-421d-9eb6-b7d02e293140",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:26:01.589\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m2\u001b[0m - \u001b[1mnumber of feature columns in feature gdf: 139\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "logger.debug(f'feature columns: {feature_gdf.columns}')\n",
    "logger.info(f'number of feature columns in feature gdf: {len(feature_gdf.columns)}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "53afd211-74a1-4a33-8fa1-b035cc2ce3f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "feature_gdf.amenity = feature_gdf.amenity.astype(str)\n",
    "feature_gdf.natural = feature_gdf.natural.astype(str)\n",
    "feature_gdf.leisure = feature_gdf.leisure.astype(str)\n",
    "feature_gdf.name = feature_gdf.name.astype(str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "f5673bad-11e1-4966-a32e-cd71ff7a8cc6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>amenity</th>\n",
       "      <th>natural</th>\n",
       "      <th>leisure</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">node</th>\n",
       "      <th>176729780</th>\n",
       "      <td>Regent Theatre</td>\n",
       "      <td>theatre</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.96760 -37.81550)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247024808</th>\n",
       "      <td>nan</td>\n",
       "      <td>parking_entrance</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.97019 -37.81548)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247689970</th>\n",
       "      <td>nan</td>\n",
       "      <td>parking_entrance</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.97070 -37.81789)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>266733834</th>\n",
       "      <td>Sofitel Hotel Carpark</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.97302 -37.81451)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>304169365</th>\n",
       "      <td>Playhouse Theatre</td>\n",
       "      <td>theatre</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.96840 -37.82172)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                         name           amenity natural  \\\n",
       "element_type osmid                                                        \n",
       "node         176729780         Regent Theatre           theatre     nan   \n",
       "             247024808                    nan  parking_entrance     nan   \n",
       "             247689970                    nan  parking_entrance     nan   \n",
       "             266733834  Sofitel Hotel Carpark           parking     nan   \n",
       "             304169365      Playhouse Theatre           theatre     nan   \n",
       "\n",
       "                       leisure                     geometry  \n",
       "element_type osmid                                           \n",
       "node         176729780     nan  POINT (144.96760 -37.81550)  \n",
       "             247024808     nan  POINT (144.97019 -37.81548)  \n",
       "             247689970     nan  POINT (144.97070 -37.81789)  \n",
       "             266733834     nan  POINT (144.97302 -37.81451)  \n",
       "             304169365     nan  POINT (144.96840 -37.82172)  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "feature_gdf = feature_gdf[['name', 'amenity', 'natural', 'leisure', 'geometry']].dropna(how='all')\n",
    "feature_gdf.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "44bd6a84-52b1-4dea-a9e2-b0e14a81009d",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:26:05.802\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m27\u001b[0m - \u001b[33m\u001b[1merror in writing path 59 out of 387...\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:05.802\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m28\u001b[0m - \u001b[33m\u001b[1mNo data elements in server response. Check log and query location/tags.\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:05.918\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m27\u001b[0m - \u001b[33m\u001b[1merror in writing path 128 out of 387...\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:05.919\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m28\u001b[0m - \u001b[33m\u001b[1mNo data elements in server response. Check log and query location/tags.\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:06.032\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m27\u001b[0m - \u001b[33m\u001b[1merror in writing path 151 out of 387...\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:06.033\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m28\u001b[0m - \u001b[33m\u001b[1mNo data elements in server response. Check log and query location/tags.\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:06.150\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m27\u001b[0m - \u001b[33m\u001b[1merror in writing path 309 out of 387...\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:06.151\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m28\u001b[0m - \u001b[33m\u001b[1mNo data elements in server response. Check log and query location/tags.\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:06.266\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m27\u001b[0m - \u001b[33m\u001b[1merror in writing path 338 out of 387...\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:06.266\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m28\u001b[0m - \u001b[33m\u001b[1mNo data elements in server response. Check log and query location/tags.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "for idx, path in enumerate(paths):\n",
    "    \n",
    "    if os.path.isfile('dataset/features-osm-{}.geojson'.format(idx)):\n",
    "        logger.debug('features for path {0} out of {1} is already loaded and saved.'.format(idx, len(paths)))\n",
    "        continue\n",
    "    try:\n",
    "        feature_gdf = ox.features_from_bbox(north=path['max_lat'], south=path['min_lat'], east=path['max_lng'], west=path['min_lng'], tags=tags)\n",
    "        cols = feature_gdf.columns\n",
    "        #preprocess\n",
    "        if 'amenity' not in cols:\n",
    "            feature_gdf['amenity'] = np.nan\n",
    "        feature_gdf.amenity = feature_gdf.amenity.astype(str)    \n",
    "        if 'natural' not in cols:\n",
    "            feature_gdf['natural'] = np.nan\n",
    "        feature_gdf.natural = feature_gdf.natural.astype(str)    \n",
    "        if 'leisure' not in cols:\n",
    "            feature_gdf['leisure'] = np.nan\n",
    "        feature_gdf.leisure = feature_gdf.leisure.astype(str)\n",
    "        if 'name' not in cols:\n",
    "            feature_gdf['name'] = np.nan\n",
    "        feature_gdf.name = feature_gdf.name.astype(str)\n",
    "        feature_gdf = feature_gdf[['name', 'amenity', 'natural', 'leisure', 'geometry']].dropna(how='all')\n",
    "    \n",
    "        feature_gdf.to_file(\"dataset/features-osm-{}.geojson\".format(idx), driver='GeoJSON')\n",
    "        logger.info('features for path {0} out of {1} is loaded from OSM and saved ...'.format(idx, len(paths)))\n",
    "    except Exception as e:\n",
    "        logger.warning('error in writing path {0} out of {1}...'.format(idx, len(paths)))\n",
    "        logger.warning(e)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "edcba713-e0cd-4547-a196-c8ea15e421b0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>amenity</th>\n",
       "      <th>natural</th>\n",
       "      <th>leisure</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">node</th>\n",
       "      <th>176729780</th>\n",
       "      <td>Regent Theatre</td>\n",
       "      <td>theatre</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.96760 -37.81550)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247024808</th>\n",
       "      <td>nan</td>\n",
       "      <td>parking_entrance</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.97019 -37.81548)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247689970</th>\n",
       "      <td>nan</td>\n",
       "      <td>parking_entrance</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.97070 -37.81789)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>266733834</th>\n",
       "      <td>Sofitel Hotel Carpark</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.97302 -37.81451)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>304169365</th>\n",
       "      <td>Playhouse Theatre</td>\n",
       "      <td>theatre</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.96840 -37.82172)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                         name           amenity natural  \\\n",
       "element_type osmid                                                        \n",
       "node         176729780         Regent Theatre           theatre     nan   \n",
       "             247024808                    nan  parking_entrance     nan   \n",
       "             247689970                    nan  parking_entrance     nan   \n",
       "             266733834  Sofitel Hotel Carpark           parking     nan   \n",
       "             304169365      Playhouse Theatre           theatre     nan   \n",
       "\n",
       "                       leisure                     geometry  \n",
       "element_type osmid                                           \n",
       "node         176729780     nan  POINT (144.96760 -37.81550)  \n",
       "             247024808     nan  POINT (144.97019 -37.81548)  \n",
       "             247689970     nan  POINT (144.97070 -37.81789)  \n",
       "             266733834     nan  POINT (144.97302 -37.81451)  \n",
       "             304169365     nan  POINT (144.96840 -37.82172)  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "feature_gdf.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a308bc1b-952f-45cd-a096-60797f04dd8a",
   "metadata": {},
   "source": [
    "## Matching POIs to OSM POIs\n",
    "\n",
    "**Aim** To match collected OSM POIs with described POIs\n",
    "\n",
    "**Approach** Textual matching of POI descriptions to OSM tags (*semantic criterion*) and spatial matching based on proximity (*spatial criterion*):\n",
    "\n",
    "- spatial criterion: defining containment\n",
    "- semantic criterion: defining semantic similarity using word embeddings"
   ]
  },
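  {
   "cell_type": "markdown",
   "id": "sketch-two-criteria",
   "metadata": {},
   "source": [
    "The two criteria above could be combined roughly as in the sketch below. This is only an illustration under stated assumptions, not the method used in this notebook: the bounding boxes are invented, `contains`, `token_overlap` and `match_poi` are hypothetical helpers, and the token-overlap score is a toy stand-in for the embedding similarity.\n",
    "\n",
    "```python\n",
    "def contains(bbox, lat, lng):\n",
    "    # Spatial criterion: is the point inside the (min_lat, min_lng, max_lat, max_lng) box?\n",
    "    min_lat, min_lng, max_lat, max_lng = bbox\n",
    "    return min_lat <= lat <= max_lat and min_lng <= lng <= max_lng\n",
    "\n",
    "\n",
    "def token_overlap(a, b):\n",
    "    # Toy stand-in for embedding similarity: Jaccard overlap of lowercase tokens.\n",
    "    ta, tb = set(a.lower().split()), set(b.lower().split())\n",
    "    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0\n",
    "\n",
    "\n",
    "def match_poi(poi, features, score_fn=token_overlap):\n",
    "    # Keep features whose box contains the POI, then rank them by semantic score.\n",
    "    nearby = [f for f in features if contains(f['bbox'], poi['lat'], poi['lng'])]\n",
    "    return sorted(nearby, key=lambda f: score_fn(poi['title'], f['name']), reverse=True)\n",
    "\n",
    "\n",
    "# Hypothetical features and POI; names and boxes are invented for illustration.\n",
    "features = [\n",
    "    {'name': 'Regent Theatre', 'bbox': (-37.8160, 144.9670, -37.8150, 144.9680)},\n",
    "    {'name': 'Sofitel Hotel Carpark', 'bbox': (-37.8150, 144.9725, -37.8140, 144.9735)},\n",
    "    {'name': 'Regent Place', 'bbox': (-37.8160, 144.9670, -37.8150, 144.9680)},\n",
    "]\n",
    "poi = {'title': 'The Regent Theatre', 'lat': -37.8155, 'lng': 144.9676}\n",
    "ranked = match_poi(poi, features)\n",
    "print([f['name'] for f in ranked])\n",
    "```\n",
    "\n",
    "Filtering by containment first restricts the semantic ranking to features that are spatially plausible."
   ]
  },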
  {
   "cell_type": "markdown",
   "id": "13954847-672f-412f-a226-0aba7df4e008",
   "metadata": {},
   "source": [
    "### Semantic Matching\n",
    "\n",
    "Ranking the relevance of textual descriptions in OSM POIs and LW POIs\n",
    "\n",
    "Example to test how it works"
   ]
  },
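  {
   "cell_type": "markdown",
   "id": "sketch-dot-vs-cosine",
   "metadata": {},
   "source": [
    "The ranking in the next cells uses `util.dot_score`. As a side note, a dot product grows with vector length as well as with alignment, whereas cosine similarity depends only on direction. The sketch below illustrates this with invented 2-D vectors (they are not model outputs, and `dot` and `cosine` are hypothetical helpers); whether this affects the rankings in this notebook is not established here.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def dot(u, v):\n",
    "    # Plain dot product of two equal-length vectors.\n",
    "    return sum(a * b for a, b in zip(u, v))\n",
    "\n",
    "\n",
    "def cosine(u, v):\n",
    "    # Dot product divided by the product of the two vector norms.\n",
    "    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))\n",
    "\n",
    "\n",
    "query = [1.0, 0.0]\n",
    "aligned_short = [1.0, 0.0]  # same direction as the query, small norm\n",
    "off_axis_long = [3.0, 4.0]  # different direction, large norm\n",
    "\n",
    "# The dot product prefers the long vector; cosine prefers the aligned one.\n",
    "print(dot(query, aligned_short), dot(query, off_axis_long))\n",
    "print(cosine(query, aligned_short), cosine(query, off_axis_long))\n",
    "```"
   ]
  },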
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "ee1739a5-ed0a-4360-9c4f-795ab15c67b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "sbert_model = SentenceTransformer('bert-base-nli-mean-tokens') # symmetric semantic search\n",
    "msmarco_model = SentenceTransformer('sentence-transformers/msmarco-distilbert-dot-v5')  # asymmetric semantic search\n",
    "\n",
    "# embedding derived from BERT for the pois\n",
    "def embed_texts(sentences, model=sbert_model):\n",
    "    sentence_embeddings = model.encode(sentences)\n",
    "    return sentence_embeddings\n",
    "\n",
    "\n",
    "def compute_similarities(query, sentences, sentence_embeddings, model=sbert_model):\n",
    "    query_vec = embed_texts(query, model=model)  # embed the query with the same model as the sentences\n",
    "    scores = util.dot_score(query_vec, sentence_embeddings)[0].cpu().tolist()\n",
    "    doc_score_pairs = list(zip(sentences, scores))\n",
    "    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)\n",
    "    logger.debug(f'Query: {query}')\n",
    "    for doc, score in doc_score_pairs:\n",
    "        logger.info(f'\\t{score}\\t{doc}')\n",
    "    return doc_score_pairs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "a70d447b-bfc3-47bb-b801-1adea793bbae",
   "metadata": {},
   "outputs": [],
   "source": [
    "def only_noun_phrases(sentence):\n",
    "    doc = nlp(sentence)\n",
    "    phrases = set() \n",
    "    for nc in doc.noun_chunks:\n",
    "        phrases.add(nc.text)\n",
    "        phrases.add(doc[nc.root.left_edge.i:nc.root.right_edge.i+1].text)\n",
    "    return ' '.join(phrases)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "a64d4a70-0e28-401d-a85c-83c552f568ba",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:26:15.807\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t112.580322265625\tCompleted in 1870, the Melbourne Town Hall is at the heart of the city's cultural and civic activity\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:15.808\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t92.63180541992188\tGasworks Park: There are artists studios, a theatre and a cafe. Every 3rd Saturday there is a Farmers' Market.  In the park you'll also come across various wonderful sculptures and installations.\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:15.809\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t84.313232421875\tThe magnificent octagonal domed reading room is both a quiet space for study and an iconic Melbourne location to take an unforgettable selfie.?\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:15.809\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t62.57238006591797\tAustralia's Number One university and world leader in education, teaching and research excellence.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[(\"Completed in 1870, the Melbourne Town Hall is at the heart of the city's cultural and civic activity\",\n",
       "  112.580322265625),\n",
       " (\"Gasworks Park: There are artists studios, a theatre and a cafe. Every 3rd Saturday there is a Farmers' Market.  In the park you'll also come across various wonderful sculptures and installations.\",\n",
       "  92.63180541992188),\n",
       " ('The magnificent octagonal domed reading room is both a quiet space for study and an iconic Melbourne location to take an unforgettable selfie.?',\n",
       "  84.313232421875),\n",
       " (\"Australia's Number One university and world leader in education, teaching and research excellence.\",\n",
       "  62.57238006591797)]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "example_poi_osm = \"Gaswork park theatre\"\n",
    "example_sentences = [\"Gasworks Park: There are artists studios, a theatre and a cafe. Every 3rd Saturday there is a Farmers' Market.  In the park you'll also come across various wonderful sculptures and installations.\",\n",
    "                                                 \"Australia's Number One university and world leader in education, teaching and research excellence.\",\n",
    "                                                 \"Completed in 1870, the Melbourne Town Hall is at the heart of the city's cultural and civic activity\",\n",
    "                                                 \"The magnificent octagonal domed reading room is both a quiet space for study and an iconic Melbourne location to take an unforgettable selfie.?\"]\n",
    "example_sentence_embeddings = embed_texts(example_sentences)\n",
    "compute_similarities(example_poi_osm, example_sentences, example_sentence_embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "c01cf123-46b7-4e4a-9682-787ffccb508b",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:26:16.815\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t125.21781921386719\ta theatre and a cafe a cafe various wonderful sculptures and installations a Farmers' Market the park artists studios artists studios, a theatre and a cafe various wonderful sculptures you a theatre Gasworks Park: Gasworks Park installations\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:16.816\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t121.1483154296875\tthe heart of the city's cultural and civic activity the city's cultural and civic activity the heart the Melbourne Town Hall\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:16.816\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t72.54638671875\tan iconic Melbourne location The magnificent octagonal domed reading room a quiet space for study and an iconic Melbourne location to take an unforgettable selfie a quiet space study an unforgettable selfie an iconic Melbourne location to take an unforgettable selfie\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:16.816\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t60.15925598144531\teducation Australia's Number One university and world leader research Australia's Number One university and world leader in education, teaching and research excellence. teaching teaching and research excellence excellence education, teaching and research excellence\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[(\"a theatre and a cafe a cafe various wonderful sculptures and installations a Farmers' Market the park artists studios artists studios, a theatre and a cafe various wonderful sculptures you a theatre Gasworks Park: Gasworks Park installations\",\n",
       "  125.21781921386719),\n",
       " (\"the heart of the city's cultural and civic activity the city's cultural and civic activity the heart the Melbourne Town Hall\",\n",
       "  121.1483154296875),\n",
       " ('an iconic Melbourne location The magnificent octagonal domed reading room a quiet space for study and an iconic Melbourne location to take an unforgettable selfie a quiet space study an unforgettable selfie an iconic Melbourne location to take an unforgettable selfie',\n",
       "  72.54638671875),\n",
       " (\"education Australia's Number One university and world leader research Australia's Number One university and world leader in education, teaching and research excellence. teaching teaching and research excellence excellence education, teaching and research excellence\",\n",
       "  60.15925598144531)]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# with preprocessing\n",
    "only_noun_example_sentences = [only_noun_phrases(sentence) for sentence in example_sentences]\n",
    "example_sentence_embeddings = embed_texts(only_noun_example_sentences)\n",
    "compute_similarities(example_poi_osm, only_noun_example_sentences, example_sentence_embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "1ff27901-3652-41d2-b572-5827f01e510b",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:26:17.606\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t42.83140563964844\tGasworks Park: There are artists studios, a theatre and a cafe. Every 3rd Saturday there is a Farmers' Market.  In the park you'll also come across various wonderful sculptures and installations.\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:17.606\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t30.717838287353516\tCompleted in 1870, the Melbourne Town Hall is at the heart of the city's cultural and civic activity\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:17.607\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t25.081661224365234\tThe magnificent octagonal domed reading room is both a quiet space for study and an iconic Melbourne location to take an unforgettable selfie.?\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:17.607\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t18.959270477294922\tAustralia's Number One university and world leader in education, teaching and research excellence.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[(\"Gasworks Park: There are artists studios, a theatre and a cafe. Every 3rd Saturday there is a Farmers' Market.  In the park you'll also come across various wonderful sculptures and installations.\",\n",
       "  42.83140563964844),\n",
       " (\"Completed in 1870, the Melbourne Town Hall is at the heart of the city's cultural and civic activity\",\n",
       "  30.717838287353516),\n",
       " ('The magnificent octagonal domed reading room is both a quiet space for study and an iconic Melbourne location to take an unforgettable selfie.?',\n",
       "  25.081661224365234),\n",
       " (\"Australia's Number One university and world leader in education, teaching and research excellence.\",\n",
       "  18.959270477294922)]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "example_sentence_embeddings = embed_texts(example_sentences, model=msmarco_model)\n",
    "compute_similarities(example_poi_osm, example_sentences, example_sentence_embeddings, model=msmarco_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "11744fd2",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:26:19.009\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t43.246639251708984\ta theatre and a cafe a cafe various wonderful sculptures and installations a Farmers' Market the park artists studios artists studios, a theatre and a cafe various wonderful sculptures you a theatre Gasworks Park: Gasworks Park installations\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:19.010\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t30.413190841674805\tthe heart of the city's cultural and civic activity the city's cultural and civic activity the heart the Melbourne Town Hall\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:19.010\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t24.648412704467773\tan iconic Melbourne location The magnificent octagonal domed reading room a quiet space for study and an iconic Melbourne location to take an unforgettable selfie a quiet space study an unforgettable selfie an iconic Melbourne location to take an unforgettable selfie\u001b[0m\n",
      "\u001b[32m2025-01-25 20:26:19.011\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities\u001b[0m:\u001b[36m17\u001b[0m - \u001b[1m\t18.057191848754883\teducation Australia's Number One university and world leader research Australia's Number One university and world leader in education, teaching and research excellence. teaching teaching and research excellence excellence education, teaching and research excellence\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[(\"a theatre and a cafe a cafe various wonderful sculptures and installations a Farmers' Market the park artists studios artists studios, a theatre and a cafe various wonderful sculptures you a theatre Gasworks Park: Gasworks Park installations\",\n",
       "  43.246639251708984),\n",
       " (\"the heart of the city's cultural and civic activity the city's cultural and civic activity the heart the Melbourne Town Hall\",\n",
       "  30.413190841674805),\n",
       " ('an iconic Melbourne location The magnificent octagonal domed reading room a quiet space for study and an iconic Melbourne location to take an unforgettable selfie a quiet space study an unforgettable selfie an iconic Melbourne location to take an unforgettable selfie',\n",
       "  24.648412704467773),\n",
       " (\"education Australia's Number One university and world leader research Australia's Number One university and world leader in education, teaching and research excellence. teaching teaching and research excellence excellence education, teaching and research excellence\",\n",
       "  18.057191848754883)]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "example_sentence_embeddings = embed_texts(only_noun_example_sentences, model=msmarco_model)\n",
    "compute_similarities(example_poi_osm, only_noun_example_sentences, example_sentence_embeddings, model=msmarco_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21044e1b",
   "metadata": {},
   "source": [
    "#### Conclusions\n",
    "\n",
    "The matching process cannot be fully automated: the task is more complex than comparing BERT embeddings. Even embeddings trained on MS MARCO do not perform well and seem to confuse correct and incorrect matches."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02a1429b-66c8-46de-8e46-11bd18fa91c5",
   "metadata": {},
   "source": [
    "### Matching: Case Investigation\n",
    "\n",
    "Comparing the POIs in the walk descriptions against OSM POIs.\n",
    "\n",
    "**Aim**: Manually check a few examples in the dataset to see how the descriptions provided by people differ from the tags stored in OSM, in order to design a better approach for labelling the dataset.\n",
    "\n",
    "**Approach**: Given a case_id (walk), we pull all the POI information from the WalkingMap dataset together with the OSM POIs extracted in the previous step, and analyse where spatial and semantic criteria succeed or fail in the matching process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "4b42e4fb-b948-4b36-892b-caa24d53bede",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_case(idx):\n",
    "    \"\"\"Return (walk POIs as a GeoDataFrame, OSM POIs or None) for case `idx`.\"\"\"\n",
    "    record = dataset[idx]\n",
    "    # Collect the POI attributes of this walk column by column\n",
    "    pois = {'title': [], 'summary': [], 'lat': [], 'lng': []}\n",
    "    for poi in record['pois']:\n",
    "        pois['title'].append(poi['title'])\n",
    "        pois['summary'].append(poi['summary'])\n",
    "        pois['lat'].append(poi['lat'])\n",
    "        pois['lng'].append(poi['lng'])\n",
    "\n",
    "    df = pd.DataFrame(pois)\n",
    "    # Build point geometries (x=lng, y=lat), declared as WGS 84 (EPSG:4326)\n",
    "    gdf = gpd.GeoDataFrame(df[['title', 'summary', 'lat', 'lng']],\n",
    "                           geometry=gpd.points_from_xy(df.lng, df.lat), crs=\"EPSG:4326\")\n",
    "\n",
    "    # OSM features for this case are read from the GeoJSON file, if it exists\n",
    "    osm_path = 'dataset/features-osm-{}.geojson'.format(idx)\n",
    "    if os.path.isfile(osm_path):\n",
    "        osm_pois = gpd.read_file(osm_path)\n",
    "    else:\n",
    "        logger.warning('No OSM features file at {}; returning None for OSM POIs'.format(osm_path))\n",
    "        osm_pois = None\n",
    "    return gdf, osm_pois"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "c2385e2e-8082-40a0-9d97-11d63bc96c36",
   "metadata": {},
   "outputs": [],
   "source": [
    "lw_poi, osm_poi = get_case(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "106d4936-f154-4224-9d33-8342f3c4f554",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>summary</th>\n",
       "      <th>lat</th>\n",
       "      <th>lng</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Fairhaven Surf Life Saving Club</td>\n",
       "      <td>Fairhaven is a well known surf beach. The beac...</td>\n",
       "      <td>-38.468759</td>\n",
       "      <td>144.084459</td>\n",
       "      <td>POINT (144.08446 -38.46876)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beach walk</td>\n",
       "      <td>From Sprout Creek, Eastern View, Moggs Creek, ...</td>\n",
       "      <td>-38.468542</td>\n",
       "      <td>144.089693</td>\n",
       "      <td>POINT (144.08969 -38.46854)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Rock pools</td>\n",
       "      <td>See what sort of shells and stones you can col...</td>\n",
       "      <td>-38.468459</td>\n",
       "      <td>144.092420</td>\n",
       "      <td>POINT (144.09242 -38.46846)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Sand dunes</td>\n",
       "      <td>The beautiful rolling sand dunes shape the bea...</td>\n",
       "      <td>-38.468418</td>\n",
       "      <td>144.095318</td>\n",
       "      <td>POINT (144.09532 -38.46842)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Painkalac Creek</td>\n",
       "      <td>The creek separates Aireys Inlet from Fairhave...</td>\n",
       "      <td>-38.468390</td>\n",
       "      <td>144.097312</td>\n",
       "      <td>POINT (144.09731 -38.46839)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Rocks and caves under the light house</td>\n",
       "      <td>There are more rockpools and rocky outcrops to...</td>\n",
       "      <td>-38.468822</td>\n",
       "      <td>144.100861</td>\n",
       "      <td>POINT (144.10086 -38.46882)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Aireys Inlet playground and picnic ground</td>\n",
       "      <td>There is small skateboard ramp for children to...</td>\n",
       "      <td>-38.466199</td>\n",
       "      <td>144.098772</td>\n",
       "      <td>POINT (144.09877 -38.46620)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Aireys Inlet lower shops</td>\n",
       "      <td>Pick up a coffee, newspaper or Fish and Chips!...</td>\n",
       "      <td>-38.465536</td>\n",
       "      <td>144.098801</td>\n",
       "      <td>POINT (144.09880 -38.46554)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Loutit Bay lookout</td>\n",
       "      <td>Return to Painkalac Creek inlet and walk to th...</td>\n",
       "      <td>-38.467916</td>\n",
       "      <td>144.103435</td>\n",
       "      <td>POINT (144.10344 -38.46792)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Historical homestead and building</td>\n",
       "      <td>At the lighthouse is the original homestead fo...</td>\n",
       "      <td>-38.468048</td>\n",
       "      <td>144.103832</td>\n",
       "      <td>POINT (144.10383 -38.46805)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Split Point Lighthouse</td>\n",
       "      <td>Follow the signs and continue up the walking t...</td>\n",
       "      <td>-38.468114</td>\n",
       "      <td>144.104029</td>\n",
       "      <td>POINT (144.10403 -38.46811)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        title  \\\n",
       "0             Fairhaven Surf Life Saving Club   \n",
       "1                                  Beach walk   \n",
       "2                                  Rock pools   \n",
       "3                                  Sand dunes   \n",
       "4                             Painkalac Creek   \n",
       "5       Rocks and caves under the light house   \n",
       "6   Aireys Inlet playground and picnic ground   \n",
       "7                    Aireys Inlet lower shops   \n",
       "8                          Loutit Bay lookout   \n",
       "9           Historical homestead and building   \n",
       "10                     Split Point Lighthouse   \n",
       "\n",
       "                                              summary        lat         lng  \\\n",
       "0   Fairhaven is a well known surf beach. The beac... -38.468759  144.084459   \n",
       "1   From Sprout Creek, Eastern View, Moggs Creek, ... -38.468542  144.089693   \n",
       "2   See what sort of shells and stones you can col... -38.468459  144.092420   \n",
       "3   The beautiful rolling sand dunes shape the bea... -38.468418  144.095318   \n",
       "4   The creek separates Aireys Inlet from Fairhave... -38.468390  144.097312   \n",
       "5   There are more rockpools and rocky outcrops to... -38.468822  144.100861   \n",
       "6   There is small skateboard ramp for children to... -38.466199  144.098772   \n",
       "7   Pick up a coffee, newspaper or Fish and Chips!... -38.465536  144.098801   \n",
       "8   Return to Painkalac Creek inlet and walk to th... -38.467916  144.103435   \n",
       "9   At the lighthouse is the original homestead fo... -38.468048  144.103832   \n",
       "10  Follow the signs and continue up the walking t... -38.468114  144.104029   \n",
       "\n",
       "                       geometry  \n",
       "0   POINT (144.08446 -38.46876)  \n",
       "1   POINT (144.08969 -38.46854)  \n",
       "2   POINT (144.09242 -38.46846)  \n",
       "3   POINT (144.09532 -38.46842)  \n",
       "4   POINT (144.09731 -38.46839)  \n",
       "5   POINT (144.10086 -38.46882)  \n",
       "6   POINT (144.09877 -38.46620)  \n",
       "7   POINT (144.09880 -38.46554)  \n",
       "8   POINT (144.10344 -38.46792)  \n",
       "9   POINT (144.10383 -38.46805)  \n",
       "10  POINT (144.10403 -38.46811)  "
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lw_poi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "0cb89911-0709-4ba4-975e-fa5d3fe5b57e",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>name</th>\n",
       "      <th>amenity</th>\n",
       "      <th>natural</th>\n",
       "      <th>leisure</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>node</td>\n",
       "      <td>831201200</td>\n",
       "      <td>nan</td>\n",
       "      <td>toilets</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.09837 -38.46594)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>node</td>\n",
       "      <td>831201305</td>\n",
       "      <td>nan</td>\n",
       "      <td>toilets</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.10104 -38.46734)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>node</td>\n",
       "      <td>831201411</td>\n",
       "      <td>nan</td>\n",
       "      <td>bbq</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.09857 -38.46592)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>node</td>\n",
       "      <td>831201826</td>\n",
       "      <td>nan</td>\n",
       "      <td>shelter</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.09864 -38.46595)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>node</td>\n",
       "      <td>5315720235</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>picnic_table</td>\n",
       "      <td>POINT (144.10073 -38.46681)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>node</td>\n",
       "      <td>8568393481</td>\n",
       "      <td>nan</td>\n",
       "      <td>waste_basket</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (144.10029 -38.46666)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>way</td>\n",
       "      <td>30501938</td>\n",
       "      <td>Painkalac Creek Estuary</td>\n",
       "      <td>nan</td>\n",
       "      <td>water</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((144.09591 -38.46359, 144.09625 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>way</td>\n",
       "      <td>69366065</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((144.10000 -38.46659, 144.09998 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>way</td>\n",
       "      <td>69366070</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((144.09836 -38.46586, 144.09836 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>way</td>\n",
       "      <td>69366078</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>playground</td>\n",
       "      <td>POLYGON ((144.10198 -38.46581, 144.10217 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>way</td>\n",
       "      <td>69366081</td>\n",
       "      <td>Aireys Inlet Skate Park</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>pitch</td>\n",
       "      <td>POLYGON ((144.09837 -38.46610, 144.09862 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>way</td>\n",
       "      <td>69366092</td>\n",
       "      <td>Bark Hut Reserve</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>park</td>\n",
       "      <td>POLYGON ((144.10218 -38.46611, 144.10185 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>way</td>\n",
       "      <td>69366108</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>playground</td>\n",
       "      <td>POLYGON ((144.09846 -38.46595, 144.09868 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>way</td>\n",
       "      <td>69366116</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((144.09880 -38.46545, 144.09894 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>way</td>\n",
       "      <td>69560073</td>\n",
       "      <td>Aireys Inlet Reserve</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>park</td>\n",
       "      <td>POLYGON ((144.09882 -38.46563, 144.09879 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>way</td>\n",
       "      <td>95186468</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>coastline</td>\n",
       "      <td>nan</td>\n",
       "      <td>LINESTRING (143.86352 -38.66725, 143.86349 -38...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>way</td>\n",
       "      <td>161748270</td>\n",
       "      <td>Allen Noble Sanctuary</td>\n",
       "      <td>nan</td>\n",
       "      <td>wetland</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((144.10274 -38.46564, 144.10266 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>way</td>\n",
       "      <td>283542690</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((144.09879 -38.46604, 144.09885 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>way</td>\n",
       "      <td>865565686</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>swimming_pool</td>\n",
       "      <td>POLYGON ((144.08774 -38.46745, 144.08774 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>way</td>\n",
       "      <td>865569273</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>swimming_pool</td>\n",
       "      <td>POLYGON ((144.10330 -38.46597, 144.10337 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>way</td>\n",
       "      <td>1007494584</td>\n",
       "      <td>Fairhaven</td>\n",
       "      <td>nan</td>\n",
       "      <td>beach</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((144.06686 -38.46862, 144.06692 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>way</td>\n",
       "      <td>1009404376</td>\n",
       "      <td>Table Rock</td>\n",
       "      <td>nan</td>\n",
       "      <td>beach</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((144.10310 -38.46865, 144.10247 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>relation</td>\n",
       "      <td>9212148</td>\n",
       "      <td>Lorne - Queenscliff Coastal Reserve</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nature_reserve</td>\n",
       "      <td>MULTIPOLYGON (((143.94680 -38.57970, 143.94656...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>relation</td>\n",
       "      <td>9212157</td>\n",
       "      <td>Eagle Rock Marine Sanctuary</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nature_reserve</td>\n",
       "      <td>POLYGON ((144.10158 -38.46903, 144.10159 -38.4...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>relation</td>\n",
       "      <td>9457256</td>\n",
       "      <td>Bass Strait</td>\n",
       "      <td>nan</td>\n",
       "      <td>strait</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((143.51178 -38.85797, 143.51209 -38.8...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   element_type       osmid                                 name  \\\n",
       "0          node   831201200                                  nan   \n",
       "1          node   831201305                                  nan   \n",
       "2          node   831201411                                  nan   \n",
       "3          node   831201826                                  nan   \n",
       "4          node  5315720235                                  nan   \n",
       "5          node  8568393481                                  nan   \n",
       "6           way    30501938              Painkalac Creek Estuary   \n",
       "7           way    69366065                                  nan   \n",
       "8           way    69366070                                  nan   \n",
       "9           way    69366078                                  nan   \n",
       "10          way    69366081              Aireys Inlet Skate Park   \n",
       "11          way    69366092                     Bark Hut Reserve   \n",
       "12          way    69366108                                  nan   \n",
       "13          way    69366116                                  nan   \n",
       "14          way    69560073                 Aireys Inlet Reserve   \n",
       "15          way    95186468                                  nan   \n",
       "16          way   161748270                Allen Noble Sanctuary   \n",
       "17          way   283542690                                  nan   \n",
       "18          way   865565686                                  nan   \n",
       "19          way   865569273                                  nan   \n",
       "20          way  1007494584                            Fairhaven   \n",
       "21          way  1009404376                           Table Rock   \n",
       "22     relation     9212148  Lorne - Queenscliff Coastal Reserve   \n",
       "23     relation     9212157          Eagle Rock Marine Sanctuary   \n",
       "24     relation     9457256                          Bass Strait   \n",
       "\n",
       "         amenity    natural         leisure  \\\n",
       "0        toilets        nan             nan   \n",
       "1        toilets        nan             nan   \n",
       "2            bbq        nan             nan   \n",
       "3        shelter        nan             nan   \n",
       "4            nan        nan    picnic_table   \n",
       "5   waste_basket        nan             nan   \n",
       "6            nan      water             nan   \n",
       "7        parking        nan             nan   \n",
       "8        parking        nan             nan   \n",
       "9            nan        nan      playground   \n",
       "10           nan        nan           pitch   \n",
       "11           nan        nan            park   \n",
       "12           nan        nan      playground   \n",
       "13       parking        nan             nan   \n",
       "14           nan        nan            park   \n",
       "15           nan  coastline             nan   \n",
       "16           nan    wetland             nan   \n",
       "17       parking        nan             nan   \n",
       "18           nan        nan   swimming_pool   \n",
       "19           nan        nan   swimming_pool   \n",
       "20           nan      beach             nan   \n",
       "21           nan      beach             nan   \n",
       "22           nan        nan  nature_reserve   \n",
       "23           nan        nan  nature_reserve   \n",
       "24           nan     strait             nan   \n",
       "\n",
       "                                             geometry  \n",
       "0                         POINT (144.09837 -38.46594)  \n",
       "1                         POINT (144.10104 -38.46734)  \n",
       "2                         POINT (144.09857 -38.46592)  \n",
       "3                         POINT (144.09864 -38.46595)  \n",
       "4                         POINT (144.10073 -38.46681)  \n",
       "5                         POINT (144.10029 -38.46666)  \n",
       "6   POLYGON ((144.09591 -38.46359, 144.09625 -38.4...  \n",
       "7   POLYGON ((144.10000 -38.46659, 144.09998 -38.4...  \n",
       "8   POLYGON ((144.09836 -38.46586, 144.09836 -38.4...  \n",
       "9   POLYGON ((144.10198 -38.46581, 144.10217 -38.4...  \n",
       "10  POLYGON ((144.09837 -38.46610, 144.09862 -38.4...  \n",
       "11  POLYGON ((144.10218 -38.46611, 144.10185 -38.4...  \n",
       "12  POLYGON ((144.09846 -38.46595, 144.09868 -38.4...  \n",
       "13  POLYGON ((144.09880 -38.46545, 144.09894 -38.4...  \n",
       "14  POLYGON ((144.09882 -38.46563, 144.09879 -38.4...  \n",
       "15  LINESTRING (143.86352 -38.66725, 143.86349 -38...  \n",
       "16  POLYGON ((144.10274 -38.46564, 144.10266 -38.4...  \n",
       "17  POLYGON ((144.09879 -38.46604, 144.09885 -38.4...  \n",
       "18  POLYGON ((144.08774 -38.46745, 144.08774 -38.4...  \n",
       "19  POLYGON ((144.10330 -38.46597, 144.10337 -38.4...  \n",
       "20  POLYGON ((144.06686 -38.46862, 144.06692 -38.4...  \n",
       "21  POLYGON ((144.10310 -38.46865, 144.10247 -38.4...  \n",
       "22  MULTIPOLYGON (((143.94680 -38.57970, 143.94656...  \n",
       "23  POLYGON ((144.10158 -38.46903, 144.10159 -38.4...  \n",
       "24  POLYGON ((143.51178 -38.85797, 143.51209 -38.8...  "
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "osm_poi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "f3728f8f-f58d-4646-a60b-a38e51bb8a75",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Project both layers to WGS 84 / UTM zone 55S (EPSG:32755) so that distances are in metres\n",
    "\n",
    "lw_projected = lw_poi.to_crs(\"EPSG:32755\")\n",
    "osm_projected = osm_poi.to_crs(\"EPSG:32755\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "a9fd5474-0247-4791-9c32-c442df4cfb2f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "title               Aireys Inlet playground and picnic ground\n",
       "summary     There is small skateboard ramp for children to...\n",
       "lat                                                -38.466199\n",
       "lng                                                144.098772\n",
       "geometry                    POINT (144.09877169 -38.46619881)\n",
       "Name: 6, dtype: object"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "poi_case = 6  # analysing a specific POI in the fetched case \n",
    "\n",
    "lw_poi.iloc[poi_case]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "ea15e395-d0d8-489e-9ede-f633466b1c43",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>name</th>\n",
       "      <th>amenity</th>\n",
       "      <th>natural</th>\n",
       "      <th>leisure</th>\n",
       "      <th>geometry</th>\n",
       "      <th>distance_to_6</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>way</td>\n",
       "      <td>283542690</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((246877.221 5738486.891, 246882.476 5...</td>\n",
       "      <td>0.902516</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>way</td>\n",
       "      <td>69560073</td>\n",
       "      <td>Aireys Inlet Reserve</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>park</td>\n",
       "      <td>POLYGON ((246878.250 5738531.576, 246875.373 5...</td>\n",
       "      <td>8.479294</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>way</td>\n",
       "      <td>69366081</td>\n",
       "      <td>Aireys Inlet Skate Park</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>pitch</td>\n",
       "      <td>POLYGON ((246840.654 5738478.916, 246862.497 5...</td>\n",
       "      <td>13.688306</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>way</td>\n",
       "      <td>69366108</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>playground</td>\n",
       "      <td>POLYGON ((246848.251 5738495.055, 246867.426 5...</td>\n",
       "      <td>15.267822</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>node</td>\n",
       "      <td>831201826</td>\n",
       "      <td>nan</td>\n",
       "      <td>shelter</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (246863.920 5738496.515)</td>\n",
       "      <td>30.304774</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>node</td>\n",
       "      <td>831201411</td>\n",
       "      <td>nan</td>\n",
       "      <td>bbq</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (246857.373 5738499.131)</td>\n",
       "      <td>35.687710</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>way</td>\n",
       "      <td>69366070</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((246838.886 5738505.614, 246838.829 5...</td>\n",
       "      <td>37.170695</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>node</td>\n",
       "      <td>831201200</td>\n",
       "      <td>nan</td>\n",
       "      <td>toilets</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (246840.125 5738496.254)</td>\n",
       "      <td>45.318589</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>way</td>\n",
       "      <td>30501938</td>\n",
       "      <td>Painkalac Creek Estuary</td>\n",
       "      <td>nan</td>\n",
       "      <td>water</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((246617.006 5738750.566, 246645.487 5...</td>\n",
       "      <td>51.867211</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>way</td>\n",
       "      <td>69366116</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((246876.055 5738551.594, 246888.002 5...</td>\n",
       "      <td>78.705460</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>relation</td>\n",
       "      <td>9212148</td>\n",
       "      <td>Lorne - Queenscliff Coastal Reserve</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nature_reserve</td>\n",
       "      <td>MULTIPOLYGON (((234032.450 5725441.094, 234011...</td>\n",
       "      <td>79.652279</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>way</td>\n",
       "      <td>69366065</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((246984.925 5738428.679, 246983.636 5...</td>\n",
       "      <td>115.908134</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>node</td>\n",
       "      <td>8568393481</td>\n",
       "      <td>nan</td>\n",
       "      <td>waste_basket</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (247010.317 5738421.768)</td>\n",
       "      <td>142.141188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>way</td>\n",
       "      <td>1009404376</td>\n",
       "      <td>Table Rock</td>\n",
       "      <td>nan</td>\n",
       "      <td>beach</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((247262.849 5738208.293, 247207.715 5...</td>\n",
       "      <td>148.460931</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>node</td>\n",
       "      <td>5315720235</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>picnic_table</td>\n",
       "      <td>POINT (247049.329 5738406.199)</td>\n",
       "      <td>184.116072</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>node</td>\n",
       "      <td>831201305</td>\n",
       "      <td>nan</td>\n",
       "      <td>toilets</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (247078.137 5738347.966)</td>\n",
       "      <td>235.344292</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>way</td>\n",
       "      <td>1007494584</td>\n",
       "      <td>Fairhaven</td>\n",
       "      <td>nan</td>\n",
       "      <td>beach</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((244099.773 5738111.317, 244107.674 5...</td>\n",
       "      <td>242.844765</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>way</td>\n",
       "      <td>69366092</td>\n",
       "      <td>Bark Hut Reserve</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>park</td>\n",
       "      <td>POLYGON ((247173.589 5738487.885, 247143.808 5...</td>\n",
       "      <td>271.562240</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>relation</td>\n",
       "      <td>9457256</td>\n",
       "      <td>Bass Strait</td>\n",
       "      <td>nan</td>\n",
       "      <td>strait</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((197305.719 5693199.862, 197332.262 5...</td>\n",
       "      <td>277.569933</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>way</td>\n",
       "      <td>95186468</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>coastline</td>\n",
       "      <td>nan</td>\n",
       "      <td>LINESTRING (227108.604 5715478.392, 227106.040...</td>\n",
       "      <td>277.569933</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>way</td>\n",
       "      <td>161748270</td>\n",
       "      <td>Allen Noble Sanctuary</td>\n",
       "      <td>nan</td>\n",
       "      <td>wetland</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((247220.495 5738541.191, 247213.006 5...</td>\n",
       "      <td>279.422113</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>way</td>\n",
       "      <td>69366078</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>playground</td>\n",
       "      <td>POLYGON ((247155.210 5738521.192, 247170.344 5...</td>\n",
       "      <td>283.902017</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>relation</td>\n",
       "      <td>9212157</td>\n",
       "      <td>Eagle Rock Marine Sanctuary</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nature_reserve</td>\n",
       "      <td>POLYGON ((247131.297 5738162.720, 247131.845 5...</td>\n",
       "      <td>388.207314</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>way</td>\n",
       "      <td>865569273</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>swimming_pool</td>\n",
       "      <td>POLYGON ((247270.470 5738506.467, 247276.519 5...</td>\n",
       "      <td>394.194319</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>way</td>\n",
       "      <td>865565686</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>swimming_pool</td>\n",
       "      <td>POLYGON ((245917.657 5738299.103, 245917.799 5...</td>\n",
       "      <td>968.728085</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   element_type       osmid                                 name  \\\n",
       "17          way   283542690                                  nan   \n",
       "14          way    69560073                 Aireys Inlet Reserve   \n",
       "10          way    69366081              Aireys Inlet Skate Park   \n",
       "12          way    69366108                                  nan   \n",
       "3          node   831201826                                  nan   \n",
       "2          node   831201411                                  nan   \n",
       "8           way    69366070                                  nan   \n",
       "0          node   831201200                                  nan   \n",
       "6           way    30501938              Painkalac Creek Estuary   \n",
       "13          way    69366116                                  nan   \n",
       "22     relation     9212148  Lorne - Queenscliff Coastal Reserve   \n",
       "7           way    69366065                                  nan   \n",
       "5          node  8568393481                                  nan   \n",
       "21          way  1009404376                           Table Rock   \n",
       "4          node  5315720235                                  nan   \n",
       "1          node   831201305                                  nan   \n",
       "20          way  1007494584                            Fairhaven   \n",
       "11          way    69366092                     Bark Hut Reserve   \n",
       "24     relation     9457256                          Bass Strait   \n",
       "15          way    95186468                                  nan   \n",
       "16          way   161748270                Allen Noble Sanctuary   \n",
       "9           way    69366078                                  nan   \n",
       "23     relation     9212157          Eagle Rock Marine Sanctuary   \n",
       "19          way   865569273                                  nan   \n",
       "18          way   865565686                                  nan   \n",
       "\n",
       "         amenity    natural         leisure  \\\n",
       "17       parking        nan             nan   \n",
       "14           nan        nan            park   \n",
       "10           nan        nan           pitch   \n",
       "12           nan        nan      playground   \n",
       "3        shelter        nan             nan   \n",
       "2            bbq        nan             nan   \n",
       "8        parking        nan             nan   \n",
       "0        toilets        nan             nan   \n",
       "6            nan      water             nan   \n",
       "13       parking        nan             nan   \n",
       "22           nan        nan  nature_reserve   \n",
       "7        parking        nan             nan   \n",
       "5   waste_basket        nan             nan   \n",
       "21           nan      beach             nan   \n",
       "4            nan        nan    picnic_table   \n",
       "1        toilets        nan             nan   \n",
       "20           nan      beach             nan   \n",
       "11           nan        nan            park   \n",
       "24           nan     strait             nan   \n",
       "15           nan  coastline             nan   \n",
       "16           nan    wetland             nan   \n",
       "9            nan        nan      playground   \n",
       "23           nan        nan  nature_reserve   \n",
       "19           nan        nan   swimming_pool   \n",
       "18           nan        nan   swimming_pool   \n",
       "\n",
       "                                             geometry  distance_to_6  \n",
       "17  POLYGON ((246877.221 5738486.891, 246882.476 5...       0.902516  \n",
       "14  POLYGON ((246878.250 5738531.576, 246875.373 5...       8.479294  \n",
       "10  POLYGON ((246840.654 5738478.916, 246862.497 5...      13.688306  \n",
       "12  POLYGON ((246848.251 5738495.055, 246867.426 5...      15.267822  \n",
       "3                      POINT (246863.920 5738496.515)      30.304774  \n",
       "2                      POINT (246857.373 5738499.131)      35.687710  \n",
       "8   POLYGON ((246838.886 5738505.614, 246838.829 5...      37.170695  \n",
       "0                      POINT (246840.125 5738496.254)      45.318589  \n",
       "6   POLYGON ((246617.006 5738750.566, 246645.487 5...      51.867211  \n",
       "13  POLYGON ((246876.055 5738551.594, 246888.002 5...      78.705460  \n",
       "22  MULTIPOLYGON (((234032.450 5725441.094, 234011...      79.652279  \n",
       "7   POLYGON ((246984.925 5738428.679, 246983.636 5...     115.908134  \n",
       "5                      POINT (247010.317 5738421.768)     142.141188  \n",
       "21  POLYGON ((247262.849 5738208.293, 247207.715 5...     148.460931  \n",
       "4                      POINT (247049.329 5738406.199)     184.116072  \n",
       "1                      POINT (247078.137 5738347.966)     235.344292  \n",
       "20  POLYGON ((244099.773 5738111.317, 244107.674 5...     242.844765  \n",
       "11  POLYGON ((247173.589 5738487.885, 247143.808 5...     271.562240  \n",
       "24  POLYGON ((197305.719 5693199.862, 197332.262 5...     277.569933  \n",
       "15  LINESTRING (227108.604 5715478.392, 227106.040...     277.569933  \n",
       "16  POLYGON ((247220.495 5738541.191, 247213.006 5...     279.422113  \n",
       "9   POLYGON ((247155.210 5738521.192, 247170.344 5...     283.902017  \n",
       "23  POLYGON ((247131.297 5738162.720, 247131.845 5...     388.207314  \n",
       "19  POLYGON ((247270.470 5738506.467, 247276.519 5...     394.194319  \n",
       "18  POLYGON ((245917.657 5738299.103, 245917.799 5...     968.728085  "
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "osm_projected['distance_to_{}'.format(poi_case)] = osm_projected.distance(lw_projected.iloc[poi_case]['geometry'])\n",
    "osm_projected.sort_values(by='distance_to_{}'.format(poi_case))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b48d871",
   "metadata": {},
   "source": [
    "#### Conclusions:\n",
    "\n",
    "With manual investigation, we find out that spatial criterion can be a case to filter unwanted records but matching process needs more information than just matching by location because:\n",
    "\n",
    "1. POIs visible vs. POIs in nearby: Sometime people describe a place or object in nearby, sometimes the actual POI is far and the location in leisure walk is just a place to see that POI\n",
    "2. OSM and LW location errors\n",
    "3. No match: Not always we can match POIs in LW to OSM, as there are cases missing...\n",
    "4. Multiple matches: Not always a POI described in LW can be matched with only 1 OSM record - different conceptualization, possible ambiguity in description or OSM data -- e.g., a playground described in a park, but in OSM we have three different objects labelled as playground and all near to the location provided in LW."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "3cc8e38f-9bef-4ff6-b67e-c1021f7b0e28",
   "metadata": {},
   "outputs": [],
   "source": [
    "# todo - maybe creating a dataset as well! the task is actually difficult!\n",
    "def generate_req_id(row):\n",
    "    return row['element_type'][0].upper()+str(row['osmid'])\n",
    "\n",
    "osm_poi['req_id'] = osm_poi.apply(generate_req_id, axis=1)\n",
    "logger.debug(f'{osm_poi.req_id.tolist()}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "5cd6ef9d-4120-4775-a5f1-a145734596ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "req_ids = set()\n",
    "\n",
    "# read all osm ids and save in a file\n",
    "for idx, path in enumerate(paths):\n",
    "    if os.path.isfile('dataset/features-osm-{}.geojson'.format(idx)):\n",
    "        osm_poi = gpd.read_file('dataset/features-osm-{}.geojson'.format(idx))\n",
    "        osm_poi['req_id'] = osm_poi.apply(generate_req_id, axis=1)\n",
    "        req_ids.update(osm_poi.req_id.tolist())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "9adb93e5-e1cb-4947-97d5-f1158a0561ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "headers = {\"Content-Type\": \"application/json; charset=utf-8\"}\n",
    "address_endpoint_template = \"https://nominatim.openstreetmap.org/lookup?osm_ids={}&format=json&extratags=1\"\n",
    "\n",
    "def download_osm_details(rids):\n",
    "    resp = requests.get(address_endpoint_template.format(','.join(rids), headers=headers))\n",
    "    return resp.json()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "0beba476-e5f9-448d-a043-939d5788e9b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "req_ids = list(req_ids)\n",
    "\n",
    "all_osm_info = []\n",
    "\n",
    "if os.path.isfile('dataset/osm-detailed-pois.json'):\n",
    "    with open('dataset/osm-detailed-pois.json', 'r') as fp:\n",
    "        all_osm_info = json.load(fp)\n",
    "else:\n",
    "    bucket_size = 50  # maximum value for OSM lookup!\n",
    "    for i in range(0, len(req_ids), bucket_size):\n",
    "        try:\n",
    "            all_osm_info.append(download_osm_details(req_ids[i:i+bucket_size]))\n",
    "            logger.info('bucket done: {}'.format(i))\n",
    "            time.sleep(0.5)\n",
    "        except Exception as e:\n",
    "            logger.warning(e)\n",
    "            logger.warning('error in bucket: {}'.format(i))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "862fdc5b-cfa9-4405-b5d2-9c306c5df9ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "all_osm_list = []\n",
    "for bucket in all_osm_info:\n",
    "    all_osm_list.extend(bucket)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "805e3731-b62f-4056-8093-cf37ac1c3d40",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:27:16.738\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m7\u001b[0m - \u001b[1mDetailed information about OSM pois are stored in `dataset/osm-detailed-pois.json`\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "with open('dataset/osm-detailed-pois.json', 'w', encoding='utf-8') as fp:\n",
    "    json.dump(all_osm_info, fp)\n",
    "\n",
    "with open('dataset/processed-osm-detailed-pois.json', 'w', encoding='utf-8') as fp:\n",
    "    json.dump(all_osm_list, fp)\n",
    "    \n",
    "logger.info('Detailed information about OSM pois are stored in `dataset/osm-detailed-pois.json`')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "274dd2ec-9dd1-4f59-976a-d60c23e689f8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "84203\n",
      "1685\n",
      "27900\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>place_id</th>\n",
       "      <th>licence</th>\n",
       "      <th>osm_type</th>\n",
       "      <th>osm_id</th>\n",
       "      <th>lat</th>\n",
       "      <th>lon</th>\n",
       "      <th>class</th>\n",
       "      <th>type</th>\n",
       "      <th>place_rank</th>\n",
       "      <th>importance</th>\n",
       "      <th>addresstype</th>\n",
       "      <th>name</th>\n",
       "      <th>display_name</th>\n",
       "      <th>address</th>\n",
       "      <th>extratags</th>\n",
       "      <th>boundingbox</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>50105769</td>\n",
       "      <td>Data © OpenStreetMap contributors, ODbL 1.0. h...</td>\n",
       "      <td>way</td>\n",
       "      <td>210529635</td>\n",
       "      <td>-37.77131895</td>\n",
       "      <td>144.88922947664923</td>\n",
       "      <td>amenity</td>\n",
       "      <td>parking</td>\n",
       "      <td>30</td>\n",
       "      <td>0.00001</td>\n",
       "      <td>amenity</td>\n",
       "      <td>David Jones Carpark</td>\n",
       "      <td>David Jones Carpark, Primary Place, Maribyrnon...</td>\n",
       "      <td>{'amenity': 'David Jones Carpark', 'road': 'Pr...</td>\n",
       "      <td>{'parking': 'multi-storey', 'building': 'parki...</td>\n",
       "      <td>[-37.7716973, -37.7708987, 144.8879962, 144.89...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>50264145</td>\n",
       "      <td>Data © OpenStreetMap contributors, ODbL 1.0. h...</td>\n",
       "      <td>way</td>\n",
       "      <td>1005592702</td>\n",
       "      <td>-37.755842</td>\n",
       "      <td>144.79434671579116</td>\n",
       "      <td>amenity</td>\n",
       "      <td>parking</td>\n",
       "      <td>30</td>\n",
       "      <td>0.00001</td>\n",
       "      <td>amenity</td>\n",
       "      <td></td>\n",
       "      <td>Ken Jordan Road, Cairnlea, Melbourne, City of ...</td>\n",
       "      <td>{'road': 'Ken Jordan Road', 'suburb': 'Cairnle...</td>\n",
       "      <td>{'parking': 'street_side'}</td>\n",
       "      <td>[-37.7559597, -37.7557312, 144.7942967, 144.79...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>49748802</td>\n",
       "      <td>Data © OpenStreetMap contributors, ODbL 1.0. h...</td>\n",
       "      <td>way</td>\n",
       "      <td>948227337</td>\n",
       "      <td>-38.338870549999996</td>\n",
       "      <td>144.72523383795718</td>\n",
       "      <td>leisure</td>\n",
       "      <td>swimming_pool</td>\n",
       "      <td>30</td>\n",
       "      <td>0.00001</td>\n",
       "      <td>leisure</td>\n",
       "      <td></td>\n",
       "      <td>Stonecutters Road, Portsea, Melbourne, Shire o...</td>\n",
       "      <td>{'road': 'Stonecutters Road', 'suburb': 'Ports...</td>\n",
       "      <td>None</td>\n",
       "      <td>[-38.3389150, -38.3388240, 144.7251718, 144.72...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>50013438</td>\n",
       "      <td>Data © OpenStreetMap contributors, ODbL 1.0. h...</td>\n",
       "      <td>way</td>\n",
       "      <td>542417354</td>\n",
       "      <td>-37.985324399999996</td>\n",
       "      <td>145.2116291306154</td>\n",
       "      <td>amenity</td>\n",
       "      <td>parking</td>\n",
       "      <td>30</td>\n",
       "      <td>0.00001</td>\n",
       "      <td>amenity</td>\n",
       "      <td></td>\n",
       "      <td>Robinson Street, Dandenong, Melbourne, City of...</td>\n",
       "      <td>{'road': 'Robinson Street', 'suburb': 'Dandeno...</td>\n",
       "      <td>None</td>\n",
       "      <td>[-37.9856572, -37.9849960, 145.2114001, 145.21...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>50156137</td>\n",
       "      <td>Data © OpenStreetMap contributors, ODbL 1.0. h...</td>\n",
       "      <td>node</td>\n",
       "      <td>678349689</td>\n",
       "      <td>-37.800412</td>\n",
       "      <td>144.966749</td>\n",
       "      <td>amenity</td>\n",
       "      <td>restaurant</td>\n",
       "      <td>30</td>\n",
       "      <td>0.00001</td>\n",
       "      <td>amenity</td>\n",
       "      <td>Il Cantuccio</td>\n",
       "      <td>Il Cantuccio, 209, Lygon Street, Little Italy,...</td>\n",
       "      <td>{'amenity': 'Il Cantuccio', 'house_number': '2...</td>\n",
       "      <td>{'phone': '+61 3 9347 9959', 'cuisine': 'itali...</td>\n",
       "      <td>[-37.8004620, -37.8003620, 144.9666990, 144.96...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   place_id                                            licence osm_type  \\\n",
       "0  50105769  Data © OpenStreetMap contributors, ODbL 1.0. h...      way   \n",
       "1  50264145  Data © OpenStreetMap contributors, ODbL 1.0. h...      way   \n",
       "2  49748802  Data © OpenStreetMap contributors, ODbL 1.0. h...      way   \n",
       "3  50013438  Data © OpenStreetMap contributors, ODbL 1.0. h...      way   \n",
       "4  50156137  Data © OpenStreetMap contributors, ODbL 1.0. h...     node   \n",
       "\n",
       "       osm_id                  lat                 lon    class  \\\n",
       "0   210529635         -37.77131895  144.88922947664923  amenity   \n",
       "1  1005592702           -37.755842  144.79434671579116  amenity   \n",
       "2   948227337  -38.338870549999996  144.72523383795718  leisure   \n",
       "3   542417354  -37.985324399999996   145.2116291306154  amenity   \n",
       "4   678349689           -37.800412          144.966749  amenity   \n",
       "\n",
       "            type  place_rank  importance addresstype                 name  \\\n",
       "0        parking          30     0.00001     amenity  David Jones Carpark   \n",
       "1        parking          30     0.00001     amenity                        \n",
       "2  swimming_pool          30     0.00001     leisure                        \n",
       "3        parking          30     0.00001     amenity                        \n",
       "4     restaurant          30     0.00001     amenity         Il Cantuccio   \n",
       "\n",
       "                                        display_name  \\\n",
       "0  David Jones Carpark, Primary Place, Maribyrnon...   \n",
       "1  Ken Jordan Road, Cairnlea, Melbourne, City of ...   \n",
       "2  Stonecutters Road, Portsea, Melbourne, Shire o...   \n",
       "3  Robinson Street, Dandenong, Melbourne, City of...   \n",
       "4  Il Cantuccio, 209, Lygon Street, Little Italy,...   \n",
       "\n",
       "                                             address  \\\n",
       "0  {'amenity': 'David Jones Carpark', 'road': 'Pr...   \n",
       "1  {'road': 'Ken Jordan Road', 'suburb': 'Cairnle...   \n",
       "2  {'road': 'Stonecutters Road', 'suburb': 'Ports...   \n",
       "3  {'road': 'Robinson Street', 'suburb': 'Dandeno...   \n",
       "4  {'amenity': 'Il Cantuccio', 'house_number': '2...   \n",
       "\n",
       "                                           extratags  \\\n",
       "0  {'parking': 'multi-storey', 'building': 'parki...   \n",
       "1                         {'parking': 'street_side'}   \n",
       "2                                               None   \n",
       "3                                               None   \n",
       "4  {'phone': '+61 3 9347 9959', 'cuisine': 'itali...   \n",
       "\n",
       "                                         boundingbox  \n",
       "0  [-37.7716973, -37.7708987, 144.8879962, 144.89...  \n",
       "1  [-37.7559597, -37.7557312, 144.7942967, 144.79...  \n",
       "2  [-38.3389150, -38.3388240, 144.7251718, 144.72...  \n",
       "3  [-37.9856572, -37.9849960, 145.2114001, 145.21...  \n",
       "4  [-37.8004620, -37.8003620, 144.9666990, 144.96...  "
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(len(req_ids))\n",
    "print(len(all_osm_info))\n",
    "print(len(all_osm_list))\n",
    "osm_poi_details_df = pd.DataFrame(all_osm_list)\n",
    "osm_poi_details_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "7faf5def-ddfe-44ac-8ea2-0de476687d69",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Robinson Street, Dandenong, Melbourne, City of Greater Dandenong, Victoria, 3177, Australia'"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "osm_poi_details_df[(osm_poi_details_df['osm_type'] == 'way') & (osm_poi_details_df['osm_id'] == 542417354)]['display_name'].values[0]  # example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "66bd4a7b-0c6f-49cc-9897-ca7e59248c28",
   "metadata": {},
   "outputs": [],
   "source": [
    "def enrich(row):\n",
    "    info = osm_poi_details_df[(osm_poi_details_df['osm_type'] == row['element_type']) & \n",
    "    (osm_poi_details_df['osm_id'] == row['osmid'])]\n",
    "    if len(info) > 0:\n",
    "        t_name = ' '.join(info['display_name'].values[0].split(',')[:2])\n",
    "        if info['extratags'].values[0] is not None:\n",
    "            t_name += ' '.join([k+' '+v for (k,v) in info['extratags'].values[0].items()])\n",
    "    else:\n",
    "        t_name = ''\n",
    "    h_name = ''\n",
    "    parents = osm_poi.loc[(osm_poi.geometry.contains(row.geometry)) & (osm_poi.id != row.id)]['name'].values.tolist()\n",
    "    if len(parents) > 0:\n",
    "        h_name += ' in '+ ', '.join([p for p in parents if p != 'nan'])\n",
    "    p_name = ''\n",
    "    if row['amenity'] != 'nan':\n",
    "        p_name += 'amenity {} '.format(row['amenity']).replace('_', ' ')\n",
    "    if row['natural'] != 'nan':\n",
    "        p_name += 'natural {} '.format(row['natural']).replace('_', ' ')\n",
    "    if row['leisure'] != 'nan':\n",
    "        p_name += 'leisure {} '.format(row['leisure']).replace('_', ' ')\n",
    "    if row['name'] == 'nan':\n",
    "        return p_name+ t_name + h_name\n",
    "    return row['name'] + ' ' + p_name + t_name + h_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "dd3e3908-90b2-4d74-a799-12d0a9ab060f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>name</th>\n",
       "      <th>amenity</th>\n",
       "      <th>natural</th>\n",
       "      <th>leisure</th>\n",
       "      <th>geometry</th>\n",
       "      <th>req_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194475</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05762 -37.65706)</td>\n",
       "      <td>N10889194475</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194476</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05774 -37.65697)</td>\n",
       "      <td>N10889194476</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194477</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05782 -37.65706)</td>\n",
       "      <td>N10889194477</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194478</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05787 -37.65710)</td>\n",
       "      <td>N10889194478</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194481</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05794 -37.65696)</td>\n",
       "      <td>N10889194481</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id element_type        osmid name amenity natural leisure  \\\n",
       "0   0         node  10889194475  nan     nan    tree     nan   \n",
       "1   1         node  10889194476  nan     nan    tree     nan   \n",
       "2   2         node  10889194477  nan     nan    tree     nan   \n",
       "3   3         node  10889194478  nan     nan    tree     nan   \n",
       "4   4         node  10889194481  nan     nan    tree     nan   \n",
       "\n",
       "                      geometry        req_id  \n",
       "0  POINT (145.05762 -37.65706)  N10889194475  \n",
       "1  POINT (145.05774 -37.65697)  N10889194476  \n",
       "2  POINT (145.05782 -37.65706)  N10889194477  \n",
       "3  POINT (145.05787 -37.65710)  N10889194478  \n",
       "4  POINT (145.05794 -37.65696)  N10889194481  "
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "osm_poi.reset_index(inplace=True)\n",
    "osm_poi = osm_poi.rename(columns= {'index': 'id'})\n",
    "osm_poi.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "dae8579a-d858-430b-a7ae-b9afb1f279fb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>name</th>\n",
       "      <th>amenity</th>\n",
       "      <th>natural</th>\n",
       "      <th>leisure</th>\n",
       "      <th>geometry</th>\n",
       "      <th>req_id</th>\n",
       "      <th>full_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194475</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05762 -37.65706)</td>\n",
       "      <td>N10889194475</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194476</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05774 -37.65697)</td>\n",
       "      <td>N10889194476</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194477</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05782 -37.65706)</td>\n",
       "      <td>N10889194477</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194478</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05787 -37.65710)</td>\n",
       "      <td>N10889194478</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194481</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05794 -37.65696)</td>\n",
       "      <td>N10889194481</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id element_type        osmid name amenity natural leisure  \\\n",
       "0   0         node  10889194475  nan     nan    tree     nan   \n",
       "1   1         node  10889194476  nan     nan    tree     nan   \n",
       "2   2         node  10889194477  nan     nan    tree     nan   \n",
       "3   3         node  10889194478  nan     nan    tree     nan   \n",
       "4   4         node  10889194481  nan     nan    tree     nan   \n",
       "\n",
       "                      geometry        req_id      full_name  \n",
       "0  POINT (145.05762 -37.65706)  N10889194475  natural tree   \n",
       "1  POINT (145.05774 -37.65697)  N10889194476  natural tree   \n",
       "2  POINT (145.05782 -37.65706)  N10889194477  natural tree   \n",
       "3  POINT (145.05787 -37.65710)  N10889194478  natural tree   \n",
       "4  POINT (145.05794 -37.65696)  N10889194481  natural tree   "
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "osm_poi['full_name'] = osm_poi.apply(enrich, axis=1)\n",
    "osm_poi.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "6f2e64ad-9462-4560-a4c6-3f9af9d23c26",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Mill Park Recreation Reserve']"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "osm_poi.loc[(osm_poi.geometry.contains(osm_poi.loc[70].geometry)) & (osm_poi.index != 70)]['name'].values.tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "21d36f14-6328-4807-9bff-c51911681147",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'leisure pitch Lady Penrhyn Avenue  Mill Parksport softball in Mill Park Recreation Reserve'"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "osm_poi.iloc[70]['full_name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "af4f8829-ca69-46f2-9c16-a0d5c2efb391",
   "metadata": {},
   "outputs": [],
   "source": [
    "for idx, path in enumerate(paths):\n",
    "    if os.path.isfile('dataset/features-osm-{}.geojson'.format(idx)):\n",
    "        if os.path.isfile(\"dataset/features-osm-poi-{}.geojson\".format(idx)):\n",
    "            continue\n",
    "        osm_poi = gpd.read_file('dataset/features-osm-{}.geojson'.format(idx))\n",
    "        logger.info('analysing: {0} - number of features: {1}'.format(idx, len(osm_poi)))\n",
    "        osm_poi.reset_index(inplace=True)\n",
    "        osm_poi = osm_poi.rename(columns= {'index': 'id'})\n",
    "        osm_poi['full_name'] = osm_poi.apply(enrich, axis=1)\n",
    "        osm_poi.to_file(\"dataset/features-osm-poi-{}.geojson\".format(idx), driver='GeoJSON')\n",
    "        logger.info('enriched features for path {0} out of {1} is loaded from OSM and saved ...'.format(idx, len(paths)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "1442008e-7b70-450e-9d62-3f93c2b2443e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>name</th>\n",
       "      <th>amenity</th>\n",
       "      <th>natural</th>\n",
       "      <th>leisure</th>\n",
       "      <th>geometry</th>\n",
       "      <th>req_id</th>\n",
       "      <th>full_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194475</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05762 -37.65706)</td>\n",
       "      <td>N10889194475</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194476</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05774 -37.65697)</td>\n",
       "      <td>N10889194476</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194477</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05782 -37.65706)</td>\n",
       "      <td>N10889194477</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194478</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05787 -37.65710)</td>\n",
       "      <td>N10889194478</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>node</td>\n",
       "      <td>10889194481</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>tree</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (145.05794 -37.65696)</td>\n",
       "      <td>N10889194481</td>\n",
       "      <td>natural tree</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id element_type        osmid name amenity natural leisure  \\\n",
       "0   0         node  10889194475  nan     nan    tree     nan   \n",
       "1   1         node  10889194476  nan     nan    tree     nan   \n",
       "2   2         node  10889194477  nan     nan    tree     nan   \n",
       "3   3         node  10889194478  nan     nan    tree     nan   \n",
       "4   4         node  10889194481  nan     nan    tree     nan   \n",
       "\n",
       "                      geometry        req_id      full_name  \n",
       "0  POINT (145.05762 -37.65706)  N10889194475  natural tree   \n",
       "1  POINT (145.05774 -37.65697)  N10889194476  natural tree   \n",
       "2  POINT (145.05782 -37.65706)  N10889194477  natural tree   \n",
       "3  POINT (145.05787 -37.65710)  N10889194478  natural tree   \n",
       "4  POINT (145.05794 -37.65696)  N10889194481  natural tree   "
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "osm_poi.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ffc5081-5b25-4114-a858-35fc9445a816",
   "metadata": {},
   "source": [
    "### Matching LW POIs to OSM POIs: Experiment\n",
    "\n",
    "**Aim**: To investigate how to perform automatic matching of POIs in LW to OSM, or provide a set of candidate for a semi-automatic matching process (filtering: automatic, matching: manual)\n",
    "\n",
    "Using:\n",
    "\n",
    "- *Spatial criterion*: nearby or contained\n",
    "- *Thematic criterion*: topic representation of POI with types in OSM POIs\n",
    "- *Linguistic criterion*: description of the POI with detailed contextual information from OSM (name, type, hierarchy)\n",
    "\n",
    "**Note**: The POIs might be missing in OSM data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "729fe3a6-c306-4378-a9d8-ef3515161911",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_case_with_details(idx):\n",
    "    record = dataset[idx]  \n",
    "    pois = {'title': [], 'summary': [], 'lat': [], 'lng': []}\n",
    "    for poi in record['pois']:\n",
    "        pois['title'].append(poi['title'])\n",
    "        pois['summary'].append(poi['summary'])\n",
    "        pois['lat'].append(poi['lat'])\n",
    "        pois['lng'].append(poi['lng'])\n",
    "        \n",
    "    df = pd.DataFrame(pois)\n",
    "    gdf = gpd.GeoDataFrame(df[['title', 'summary', 'lat', 'lng']],\n",
    "                           geometry=gpd.points_from_xy(df.lng, df.lat), crs=\"EPSG:4326\")\n",
    "    \n",
    "    if os.path.isfile('dataset/features-osm-poi-{}.geojson'.format(idx)):\n",
    "        osm_pois = gpd.read_file('dataset/features-osm-poi-{}.geojson'.format(idx))\n",
    "    else:\n",
    "        logger.warning('OSM features are not loaded - potentially empty dataframe')\n",
    "        osm_pois = None\n",
    "    return gdf, osm_pois"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "ca6c40ab-499b-4217-9d53-d933467158a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_case_idx = 2\n",
    "test_case_gdf, test_case_pois = get_case_with_details(test_case_idx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "8146912d-7d42-4021-9f94-16423f2ee1de",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>summary</th>\n",
       "      <th>lat</th>\n",
       "      <th>lng</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1. Tramway signal box</td>\n",
       "      <td>Built in 1928 soon after the electrification o...</td>\n",
       "      <td>-37.806953</td>\n",
       "      <td>144.962813</td>\n",
       "      <td>POINT (320663.065 5813648.747)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2. City Baths</td>\n",
       "      <td>Built in 1903, the design reflected the social...</td>\n",
       "      <td>-37.807382</td>\n",
       "      <td>144.962990</td>\n",
       "      <td>POINT (320679.723 5813601.482)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3. Magistrates Court</td>\n",
       "      <td>Built on the site of the earlier Supreme Court...</td>\n",
       "      <td>-37.808828</td>\n",
       "      <td>144.966112</td>\n",
       "      <td>POINT (320958.059 5813447.052)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4. Old Melbourne Gaol</td>\n",
       "      <td>Built between 1851 - 1864. As the oldest survi...</td>\n",
       "      <td>-37.807569</td>\n",
       "      <td>144.965710</td>\n",
       "      <td>POINT (320919.660 5813585.973)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5. Eight Hour Day Monument</td>\n",
       "      <td>Built in 1923, the monument commemorates the E...</td>\n",
       "      <td>-37.807126</td>\n",
       "      <td>144.965808</td>\n",
       "      <td>POINT (320927.197 5813635.299)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6. Trades Hall</td>\n",
       "      <td>Built in stages from 1873 - 1926, Trades Hall ...</td>\n",
       "      <td>-37.806905</td>\n",
       "      <td>144.965989</td>\n",
       "      <td>POINT (320942.628 5813660.156)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7. Medley Hall</td>\n",
       "      <td>Built in 1893 as a private residence, the buil...</td>\n",
       "      <td>-37.805803</td>\n",
       "      <td>144.967618</td>\n",
       "      <td>POINT (321083.333 5813785.590)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8. Lygon shop corner</td>\n",
       "      <td>Lygon Buildings is architecturally significant...</td>\n",
       "      <td>-37.804863</td>\n",
       "      <td>144.966279</td>\n",
       "      <td>POINT (320963.170 5813887.275)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9. Matthais House</td>\n",
       "      <td>A two storeyed stucco faced bluestone house of...</td>\n",
       "      <td>-37.803827</td>\n",
       "      <td>144.967759</td>\n",
       "      <td>POINT (321091.019 5814005.140)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10. Sacred Heart Catholic Church</td>\n",
       "      <td>Built in 1855-56. In the 1930s and 1940s the C...</td>\n",
       "      <td>-37.803051</td>\n",
       "      <td>144.969378</td>\n",
       "      <td>POINT (321231.646 5814094.295)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>11. IMAX Theatre</td>\n",
       "      <td>IMAX is the world's largest cinema format, wit...</td>\n",
       "      <td>-37.803371</td>\n",
       "      <td>144.969616</td>\n",
       "      <td>POINT (321253.423 5814059.312)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>12. Melbourne Museum</td>\n",
       "      <td>The museum has eight galleries, including one ...</td>\n",
       "      <td>-37.803674</td>\n",
       "      <td>144.969564</td>\n",
       "      <td>POINT (321249.542 5814025.589)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>13. Royal Exhibition Building</td>\n",
       "      <td>The Royal Exhibition Building is the only surv...</td>\n",
       "      <td>-37.805135</td>\n",
       "      <td>144.971235</td>\n",
       "      <td>POINT (321400.230 5813866.625)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>14. Carlton Gardens</td>\n",
       "      <td>Developed in 19th-century Gardenesque style, t...</td>\n",
       "      <td>-37.805630</td>\n",
       "      <td>144.971385</td>\n",
       "      <td>POINT (321414.635 5813812.021)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>15. Playground</td>\n",
       "      <td>Wonderful playground with swings, slides and m...</td>\n",
       "      <td>-37.802505</td>\n",
       "      <td>144.970700</td>\n",
       "      <td>POINT (321346.754 5814157.403)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>16. Carlton Gardens Primary School</td>\n",
       "      <td>The school is an Italian Gothic style building...</td>\n",
       "      <td>-37.801938</td>\n",
       "      <td>144.969625</td>\n",
       "      <td>POINT (321250.714 5814218.329)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>17. Terrace Houses (201 - 205 Drummond Street)</td>\n",
       "      <td>Built in 1884, Holcombe Terrace is a fine exam...</td>\n",
       "      <td>-37.800896</td>\n",
       "      <td>144.968340</td>\n",
       "      <td>POINT (321135.071 5814331.466)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>18. Museo Italiano</td>\n",
       "      <td>Museo Italiano displays and interprets the exp...</td>\n",
       "      <td>-37.798813</td>\n",
       "      <td>144.968136</td>\n",
       "      <td>POINT (321112.101 5814562.205)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>19. La Mama Theatre</td>\n",
       "      <td>Built in 1883, the building was used for vario...</td>\n",
       "      <td>-37.798753</td>\n",
       "      <td>144.967652</td>\n",
       "      <td>POINT (321069.304 5814567.889)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>19. Underground public toilets</td>\n",
       "      <td>The underground public toilets, built in 1939,...</td>\n",
       "      <td>-37.798724</td>\n",
       "      <td>144.967416</td>\n",
       "      <td>POINT (321048.512 5814570.650)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>21. 313-315 Drummond Street</td>\n",
       "      <td>Built in 1889, the buliding is architecturally...</td>\n",
       "      <td>-37.798425</td>\n",
       "      <td>144.968750</td>\n",
       "      <td>POINT (321165.248 5814606.489)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>22. Carlton Court House</td>\n",
       "      <td>The Carlton Court House is historically signif...</td>\n",
       "      <td>-37.797703</td>\n",
       "      <td>144.968867</td>\n",
       "      <td>POINT (321173.746 5814686.735)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             title  \\\n",
       "0                            1. Tramway signal box   \n",
       "1                                    2. City Baths   \n",
       "2                             3. Magistrates Court   \n",
       "3                            4. Old Melbourne Gaol   \n",
       "4                       5. Eight Hour Day Monument   \n",
       "5                                   6. Trades Hall   \n",
       "6                                   7. Medley Hall   \n",
       "7                             8. Lygon shop corner   \n",
       "8                                9. Matthais House   \n",
       "9                 10. Sacred Heart Catholic Church   \n",
       "10                                11. IMAX Theatre   \n",
       "11                            12. Melbourne Museum   \n",
       "12                   13. Royal Exhibition Building   \n",
       "13                             14. Carlton Gardens   \n",
       "14                                  15. Playground   \n",
       "15              16. Carlton Gardens Primary School   \n",
       "16  17. Terrace Houses (201 - 205 Drummond Street)   \n",
       "17                              18. Museo Italiano   \n",
       "18                             19. La Mama Theatre   \n",
       "19                  19. Underground public toilets   \n",
       "20                     21. 313-315 Drummond Street   \n",
       "21                         22. Carlton Court House   \n",
       "\n",
       "                                              summary        lat         lng  \\\n",
       "0   Built in 1928 soon after the electrification o... -37.806953  144.962813   \n",
       "1   Built in 1903, the design reflected the social... -37.807382  144.962990   \n",
       "2   Built on the site of the earlier Supreme Court... -37.808828  144.966112   \n",
       "3   Built between 1851 - 1864. As the oldest survi... -37.807569  144.965710   \n",
       "4   Built in 1923, the monument commemorates the E... -37.807126  144.965808   \n",
       "5   Built in stages from 1873 - 1926, Trades Hall ... -37.806905  144.965989   \n",
       "6   Built in 1893 as a private residence, the buil... -37.805803  144.967618   \n",
       "7   Lygon Buildings is architecturally significant... -37.804863  144.966279   \n",
       "8   A two storeyed stucco faced bluestone house of... -37.803827  144.967759   \n",
       "9   Built in 1855-56. In the 1930s and 1940s the C... -37.803051  144.969378   \n",
       "10  IMAX is the world's largest cinema format, wit... -37.803371  144.969616   \n",
       "11  The museum has eight galleries, including one ... -37.803674  144.969564   \n",
       "12  The Royal Exhibition Building is the only surv... -37.805135  144.971235   \n",
       "13  Developed in 19th-century Gardenesque style, t... -37.805630  144.971385   \n",
       "14  Wonderful playground with swings, slides and m... -37.802505  144.970700   \n",
       "15  The school is an Italian Gothic style building... -37.801938  144.969625   \n",
       "16  Built in 1884, Holcombe Terrace is a fine exam... -37.800896  144.968340   \n",
       "17  Museo Italiano displays and interprets the exp... -37.798813  144.968136   \n",
       "18  Built in 1883, the building was used for vario... -37.798753  144.967652   \n",
       "19  The underground public toilets, built in 1939,... -37.798724  144.967416   \n",
       "20  Built in 1889, the buliding is architecturally... -37.798425  144.968750   \n",
       "21  The Carlton Court House is historically signif... -37.797703  144.968867   \n",
       "\n",
       "                          geometry  \n",
       "0   POINT (320663.065 5813648.747)  \n",
       "1   POINT (320679.723 5813601.482)  \n",
       "2   POINT (320958.059 5813447.052)  \n",
       "3   POINT (320919.660 5813585.973)  \n",
       "4   POINT (320927.197 5813635.299)  \n",
       "5   POINT (320942.628 5813660.156)  \n",
       "6   POINT (321083.333 5813785.590)  \n",
       "7   POINT (320963.170 5813887.275)  \n",
       "8   POINT (321091.019 5814005.140)  \n",
       "9   POINT (321231.646 5814094.295)  \n",
       "10  POINT (321253.423 5814059.312)  \n",
       "11  POINT (321249.542 5814025.589)  \n",
       "12  POINT (321400.230 5813866.625)  \n",
       "13  POINT (321414.635 5813812.021)  \n",
       "14  POINT (321346.754 5814157.403)  \n",
       "15  POINT (321250.714 5814218.329)  \n",
       "16  POINT (321135.071 5814331.466)  \n",
       "17  POINT (321112.101 5814562.205)  \n",
       "18  POINT (321069.304 5814567.889)  \n",
       "19  POINT (321048.512 5814570.650)  \n",
       "20  POINT (321165.248 5814606.489)  \n",
       "21  POINT (321173.746 5814686.735)  "
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_case_gdf = test_case_gdf.to_crs(\"EPSG:32755\")\n",
    "test_case_gdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "c0a0438d-0a63-4f3a-852a-8cf6903def45",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:27:34.150\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m2\u001b[0m - \u001b[1msize of the OSM POI dataframe: 2380\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>name</th>\n",
       "      <th>amenity</th>\n",
       "      <th>natural</th>\n",
       "      <th>leisure</th>\n",
       "      <th>full_name</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>node</td>\n",
       "      <td>242538793</td>\n",
       "      <td>nan</td>\n",
       "      <td>post_box</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity post box Queensberry Street  Carlton</td>\n",
       "      <td>POINT (320727.351 5813899.378)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>node</td>\n",
       "      <td>242540159</td>\n",
       "      <td>nan</td>\n",
       "      <td>telephone</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity telephone Swanston Street  East End Th...</td>\n",
       "      <td>POINT (320749.536 5813412.933)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>node</td>\n",
       "      <td>242823091</td>\n",
       "      <td>nan</td>\n",
       "      <td>telephone</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity telephone Swanston Street  East End Th...</td>\n",
       "      <td>POINT (320730.759 5813422.172)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>node</td>\n",
       "      <td>242823102</td>\n",
       "      <td>nan</td>\n",
       "      <td>post_box</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity post box Pelham Street  Carlton</td>\n",
       "      <td>POINT (321220.734 5814072.447)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>node</td>\n",
       "      <td>242823114</td>\n",
       "      <td>nan</td>\n",
       "      <td>toilets</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity toilets Rathdowne Street  Carltonfee n...</td>\n",
       "      <td>POINT (321272.049 5814137.476)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id element_type      osmid name    amenity natural leisure  \\\n",
       "0   0         node  242538793  nan   post_box     nan     nan   \n",
       "1   1         node  242540159  nan  telephone     nan     nan   \n",
       "2   2         node  242823091  nan  telephone     nan     nan   \n",
       "3   3         node  242823102  nan   post_box     nan     nan   \n",
       "4   4         node  242823114  nan    toilets     nan     nan   \n",
       "\n",
       "                                           full_name  \\\n",
       "0       amenity post box Queensberry Street  Carlton   \n",
       "1  amenity telephone Swanston Street  East End Th...   \n",
       "2  amenity telephone Swanston Street  East End Th...   \n",
       "3            amenity post box Pelham Street  Carlton   \n",
       "4  amenity toilets Rathdowne Street  Carltonfee n...   \n",
       "\n",
       "                         geometry  \n",
       "0  POINT (320727.351 5813899.378)  \n",
       "1  POINT (320749.536 5813412.933)  \n",
       "2  POINT (320730.759 5813422.172)  \n",
       "3  POINT (321220.734 5814072.447)  \n",
       "4  POINT (321272.049 5814137.476)  "
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_case_pois = test_case_pois.to_crs(\"EPSG:32755\")\n",
    "logger.info(f'size of the OSM POI dataframe: {len(test_case_pois.id)}')\n",
    "test_case_pois.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "113eb771-1a43-489e-b454-7812d994bc84",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "title                                          7. Medley Hall\n",
       "summary     Built in 1893 as a private residence, the buil...\n",
       "lat                                                -37.805803\n",
       "lng                                                144.967618\n",
       "geometry         POINT (321083.33274179127 5813785.590421503)\n",
       "Name: 6, dtype: object"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_row_id = 6\n",
    "test_row = test_case_gdf.loc[test_row_id]\n",
    "test_row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "28c1fbc9-b32d-4d99-973e-e18f546ddccd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>name</th>\n",
       "      <th>amenity</th>\n",
       "      <th>natural</th>\n",
       "      <th>leisure</th>\n",
       "      <th>full_name</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2153</th>\n",
       "      <td>2153</td>\n",
       "      <td>way</td>\n",
       "      <td>265141186</td>\n",
       "      <td>Lygon Street Christian Chapel</td>\n",
       "      <td>place_of_worship</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>Lygon Street Christian Chapel amenity place of...</td>\n",
       "      <td>POLYGON ((321008.928 5813761.985, 320994.084 5...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2231</th>\n",
       "      <td>2231</td>\n",
       "      <td>way</td>\n",
       "      <td>710777495</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity parking McDonald Lane  Carltonaccess p...</td>\n",
       "      <td>POLYGON ((321000.885 5813812.657, 320997.612 5...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2232</th>\n",
       "      <td>2232</td>\n",
       "      <td>way</td>\n",
       "      <td>710777496</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity parking Elm Tree Place  Carltonaccess ...</td>\n",
       "      <td>POLYGON ((321135.119 5813776.077, 321133.814 5...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2254</th>\n",
       "      <td>2254</td>\n",
       "      <td>way</td>\n",
       "      <td>743141724</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity parking McDonald Lane  Carltonaccess c...</td>\n",
       "      <td>POLYGON ((320978.251 5813859.025, 321001.671 5...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2283</th>\n",
       "      <td>2283</td>\n",
       "      <td>way</td>\n",
       "      <td>831017470</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity parking Hudson Place  Carltonaccess pr...</td>\n",
       "      <td>POLYGON ((321146.543 5813801.593, 321159.873 5...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2284</th>\n",
       "      <td>2284</td>\n",
       "      <td>way</td>\n",
       "      <td>831017484</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity parking Trades Hall Place  Carltonacce...</td>\n",
       "      <td>POLYGON ((321003.601 5813711.301, 321007.125 5...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2285</th>\n",
       "      <td>2285</td>\n",
       "      <td>way</td>\n",
       "      <td>831017485</td>\n",
       "      <td>nan</td>\n",
       "      <td>parking</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>amenity parking Trades Hall Place  Carltonacce...</td>\n",
       "      <td>POLYGON ((321009.308 5813760.340, 321010.362 5...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        id element_type      osmid                           name  \\\n",
       "2153  2153          way  265141186  Lygon Street Christian Chapel   \n",
       "2231  2231          way  710777495                            nan   \n",
       "2232  2232          way  710777496                            nan   \n",
       "2254  2254          way  743141724                            nan   \n",
       "2283  2283          way  831017470                            nan   \n",
       "2284  2284          way  831017484                            nan   \n",
       "2285  2285          way  831017485                            nan   \n",
       "\n",
       "               amenity natural leisure  \\\n",
       "2153  place_of_worship     nan     nan   \n",
       "2231           parking     nan     nan   \n",
       "2232           parking     nan     nan   \n",
       "2254           parking     nan     nan   \n",
       "2283           parking     nan     nan   \n",
       "2284           parking     nan     nan   \n",
       "2285           parking     nan     nan   \n",
       "\n",
       "                                              full_name  \\\n",
       "2153  Lygon Street Christian Chapel amenity place of...   \n",
       "2231  amenity parking McDonald Lane  Carltonaccess p...   \n",
       "2232  amenity parking Elm Tree Place  Carltonaccess ...   \n",
       "2254  amenity parking McDonald Lane  Carltonaccess c...   \n",
       "2283  amenity parking Hudson Place  Carltonaccess pr...   \n",
       "2284  amenity parking Trades Hall Place  Carltonacce...   \n",
       "2285  amenity parking Trades Hall Place  Carltonacce...   \n",
       "\n",
       "                                               geometry  \n",
       "2153  POLYGON ((321008.928 5813761.985, 320994.084 5...  \n",
       "2231  POLYGON ((321000.885 5813812.657, 320997.612 5...  \n",
       "2232  POLYGON ((321135.119 5813776.077, 321133.814 5...  \n",
       "2254  POLYGON ((320978.251 5813859.025, 321001.671 5...  \n",
       "2283  POLYGON ((321146.543 5813801.593, 321159.873 5...  \n",
       "2284  POLYGON ((321003.601 5813711.301, 321007.125 5...  \n",
       "2285  POLYGON ((321009.308 5813760.340, 321010.362 5...  "
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
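   ],
   "source": [
    "# OSM POIs within 100 m of the test POI; distances are in metres in the projected CRS (EPSG:32755)\n",
    "test_case_pois.loc[test_case_pois.geometry.distance(test_row.geometry) < 100]"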
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "de2ae00e-01ac-4817-b0fa-d444c7e0d07c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# titles of all POIs to be geocoded\n",
    "titles = list(data_structure['poi_title'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "17692d3f-9996-45b7-9939-be76e36ea511",
   "metadata": {},
   "outputs": [],
   "source": [
    "geocoder = Nominatim(user_agent='research_app')  # can geocoding be of help?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "142ea0cf-11fb-4be6-8964-95ecde2491ef",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'min_lat': -37.80922028,\n",
       " 'max_lat': -37.79740113,\n",
       " 'min_lng': 144.96275961,\n",
       " 'max_lng': 144.97345358}"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "paths[test_case_idx]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "c92d7a23-4e20-4690-95c0-742b23d1f851",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = geocoder.geocode(\"Medley Hall\", viewbox=[(paths[test_case_idx]['max_lat'], paths[test_case_idx]['max_lng']), \n",
    "                                                         (paths[test_case_idx]['min_lat'], paths[test_case_idx]['min_lng'])],\n",
    "                          bounded=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "072a2feb-4f68-47f0-a47f-0195a0ab81de",
   "metadata": {},
   "outputs": [],
   "source": [
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "36421ae7-d9fe-472a-836a-211242ca336c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def geocode_by_name(name, path):\n",
    "    name = name.lstrip('0123456789.- ')\n",
    "    return geocoder.geocode(name, viewbox=[(path['max_lat'], path['max_lng']), \n",
    "                                                         (path['min_lat'], path['min_lng'])],\n",
    "                          bounded=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "f276fb96-102a-46c7-8e35-ce414ddb2cf6",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:27:42.353\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m4\u001b[0m - \u001b[1mnominatim dump file is already loaded\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "if os.path.isfile('dataset/nominatim-geocoding.json'):\n",
    "    with open('dataset/nominatim-geocoding.json') as fp:\n",
    "        nominatim_output = json.load(fp)\n",
    "    logger.info('nominatim dump file is already loaded')\n",
    "else:\n",
    "    geocoding_results = []\n",
    "    for idx, path in enumerate(paths):\n",
    "        record = dataset[idx]\n",
    "        for poi in record['pois']:\n",
    "            name = poi['title']\n",
    "            result = geocode_by_name(name, path)\n",
    "            geocoding_results.append(result)\n",
    "            if result is not None:\n",
    "                logger.info(name, path)\n",
    "        if idx%10 == 0:\n",
    "            time.sleep(1)\n",
    "            logger.info('idx: {}'.format(idx))\n",
    "\n",
    "    nominatim_output = {}\n",
    "    counter = 0\n",
    "    for idx, path in enumerate(paths):\n",
    "        record = dataset[idx]\n",
    "        for poi in record['pois']:\n",
    "            geocoding_result = geocoding_results[counter]\n",
    "        \n",
    "            name = poi['title']\n",
    "            description = poi['summary']\n",
    "            lat = poi['lat']\n",
    "            lng = poi['lng']\n",
    "\n",
    "            nominatim_output[counter] = {'walk_id': idx, 'title': name, 'summary': description, 'lat': lat, 'lng': lng}\n",
    "            if geocoding_result is not None:\n",
    "                nominatim_output[counter]['osm'] = geocoding_result.raw\n",
    "            else:\n",
    "                nominatim_output[counter]['osm'] = None\n",
    "            counter += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "3c15f200-88ab-4f35-8a88-baad572d71d3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'walk_id': 2,\n",
       " 'title': '13. Royal Exhibition Building',\n",
       " 'summary': 'The Royal Exhibition Building is the only surviving Great Hall that once housed a 19th-century international exhibition and is still used for exhibitions. ',\n",
       " 'lat': -37.80513488,\n",
       " 'lng': 144.97123539,\n",
       " 'osm': {'place_id': 17546919,\n",
       "  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',\n",
       "  'osm_type': 'way',\n",
       "  'osm_id': 4817059,\n",
       "  'lat': '-37.804666850000004',\n",
       "  'lon': '144.9714669305319',\n",
       "  'class': 'historic',\n",
       "  'type': 'building',\n",
       "  'place_rank': 30,\n",
       "  'importance': 0.39044459367468287,\n",
       "  'addresstype': 'historic',\n",
       "  'name': 'Royal Exhibition Building',\n",
       "  'display_name': 'Royal Exhibition Building, 9, Nicholson Street, Carlton, Melbourne, City of Melbourne, Victoria, 3053, Australia',\n",
       "  'boundingbox': ['-37.8051500', '-37.8041865', '144.9705305', '144.9724671']}}"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nominatim_output['23']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "67af0065-557d-42cb-b2ca-b0cb8320dd5b",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:27:52.327\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m2\u001b[0m - \u001b[1msize of nominatim outputs: 4392\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "logger.debug(f'all nominatim outputs: {nominatim_output}')\n",
    "logger.info(f'size of nominatim outputs: {len(nominatim_output.keys())}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa078402-dca7-474c-a9f2-b428f67c0acd",
   "metadata": {},
   "source": [
    "## Labelling Process:\n",
    "\n",
    "- If geocoding succeeded, label whether the geocoded result is correct.\n",
    "- Find the 10 most likely candidates using textual and spatial criteria, and record the matching element."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "56a91445-049d-4e92-bd35-23eb19f191c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_feature_from_osm(lat, lng, dist=200, tags=tags):\n",
    "    \"\"\"Fetch OSM features matching `tags` in a bounding box extending `dist` metres from (lat, lng).\"\"\"\n",
    "    return ox.features_from_point((lat, lng), tags=tags, dist=dist)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "466b1bd0-5033-4cb3-bbe7-6c2e4d829ade",
   "metadata": {},
   "outputs": [],
   "source": [
    "case_id = 23\n",
    "test_case = nominatim_output[str(case_id)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "29a2e528-c047-40bb-8fc2-10d167cc0e5e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'walk_id': 2,\n",
       " 'title': '13. Royal Exhibition Building',\n",
       " 'summary': 'The Royal Exhibition Building is the only surviving Great Hall that once housed a 19th-century international exhibition and is still used for exhibitions. ',\n",
       " 'lat': -37.80513488,\n",
       " 'lng': 144.97123539,\n",
       " 'osm': {'place_id': 17546919,\n",
       "  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',\n",
       "  'osm_type': 'way',\n",
       "  'osm_id': 4817059,\n",
       "  'lat': '-37.804666850000004',\n",
       "  'lon': '144.9714669305319',\n",
       "  'class': 'historic',\n",
       "  'type': 'building',\n",
       "  'place_rank': 30,\n",
       "  'importance': 0.39044459367468287,\n",
       "  'addresstype': 'historic',\n",
       "  'name': 'Royal Exhibition Building',\n",
       "  'display_name': 'Royal Exhibition Building, 9, Nicholson Street, Carlton, Melbourne, City of Melbourne, Victoria, 3053, Australia',\n",
       "  'boundingbox': ['-37.8051500', '-37.8041865', '144.9705305', '144.9724671']}}"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_case"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "bcc455f1-6590-4fbb-bb59-9b8ecbe78331",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>access</th>\n",
       "      <th>amenity</th>\n",
       "      <th>changing_table</th>\n",
       "      <th>fee</th>\n",
       "      <th>operator</th>\n",
       "      <th>operator:wikidata</th>\n",
       "      <th>toilets:disposal</th>\n",
       "      <th>unisex</th>\n",
       "      <th>wheelchair</th>\n",
       "      <th>geometry</th>\n",
       "      <th>...</th>\n",
       "      <th>nodes</th>\n",
       "      <th>layer</th>\n",
       "      <th>leaf_type</th>\n",
       "      <th>ways</th>\n",
       "      <th>addr:suburb</th>\n",
       "      <th>type</th>\n",
       "      <th>wikipedia</th>\n",
       "      <th>intermittent</th>\n",
       "      <th>salt</th>\n",
       "      <th>water</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">node</th>\n",
       "      <th>319157917</th>\n",
       "      <td>yes</td>\n",
       "      <td>toilets</td>\n",
       "      <td>no</td>\n",
       "      <td>no</td>\n",
       "      <td>Melbourne City Council</td>\n",
       "      <td>Q56477763</td>\n",
       "      <td>flush</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>POINT (144.96920 -37.80626)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>368393200</th>\n",
       "      <td>NaN</td>\n",
       "      <td>cinema</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Melbourne museum</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POINT (144.97064 -37.80351)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>371974432</th>\n",
       "      <td>NaN</td>\n",
       "      <td>fountain</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POINT (144.97138 -37.80545)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>767585574</th>\n",
       "      <td>NaN</td>\n",
       "      <td>bench</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POINT (144.97296 -37.80630)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>767689503</th>\n",
       "      <td>NaN</td>\n",
       "      <td>parking_entrance</td>\n",
       "      <td>NaN</td>\n",
       "      <td>yes</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POINT (144.96975 -37.80406)</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">way</th>\n",
       "      <th>1249828891</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POLYGON ((144.97256 -37.80592, 144.97252 -37.8...</td>\n",
       "      <td>...</td>\n",
       "      <td>[11618112476, 11618112477, 11618112478, 116181...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1249828892</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POLYGON ((144.96983 -37.80600, 144.96980 -37.8...</td>\n",
       "      <td>...</td>\n",
       "      <td>[11618112487, 11618112488, 11618112489, 116181...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">relation</th>\n",
       "      <th>6614802</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Melbourne City Council</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>MULTIPOLYGON (((144.96913 -37.80719, 144.96903...</td>\n",
       "      <td>...</td>\n",
       "      <td>[[[4421481857, 4421481858, 4421481859, 4421481...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[444681638, 444681637]</td>\n",
       "      <td>Carlton</td>\n",
       "      <td>multipolygon</td>\n",
       "      <td>en:Carlton Gardens</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17205856</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POLYGON ((144.97226 -37.80608, 144.97222 -37.8...</td>\n",
       "      <td>...</td>\n",
       "      <td>[[[371970971, 371970972, 11618112514, 37197097...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[33005170, 1249828890, 1249828891]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>multipolygon</td>\n",
       "      <td>NaN</td>\n",
       "      <td>no</td>\n",
       "      <td>no</td>\n",
       "      <td>pond</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17205857</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>POLYGON ((144.97004 -37.80614, 144.97007 -37.8...</td>\n",
       "      <td>...</td>\n",
       "      <td>[[[371971531, 371971532, 5864830649, 116181125...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[33005173, 1249828892]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>multipolygon</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>pond</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>334 rows × 53 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                        access           amenity changing_table  fee  \\\n",
       "element_type osmid                                                     \n",
       "node         319157917     yes           toilets             no   no   \n",
       "             368393200     NaN            cinema            NaN  NaN   \n",
       "             371974432     NaN          fountain            NaN  NaN   \n",
       "             767585574     NaN             bench            NaN  NaN   \n",
       "             767689503     NaN  parking_entrance            NaN  yes   \n",
       "...                        ...               ...            ...  ...   \n",
       "way          1249828891    NaN               NaN            NaN  NaN   \n",
       "             1249828892    NaN               NaN            NaN  NaN   \n",
       "relation     6614802       NaN               NaN            NaN  NaN   \n",
       "             17205856      NaN               NaN            NaN  NaN   \n",
       "             17205857      NaN               NaN            NaN  NaN   \n",
       "\n",
       "                                       operator operator:wikidata  \\\n",
       "element_type osmid                                                  \n",
       "node         319157917   Melbourne City Council         Q56477763   \n",
       "             368393200         Melbourne museum               NaN   \n",
       "             371974432                      NaN               NaN   \n",
       "             767585574                      NaN               NaN   \n",
       "             767689503                      NaN               NaN   \n",
       "...                                         ...               ...   \n",
       "way          1249828891                     NaN               NaN   \n",
       "             1249828892                     NaN               NaN   \n",
       "relation     6614802     Melbourne City Council               NaN   \n",
       "             17205856                       NaN               NaN   \n",
       "             17205857                       NaN               NaN   \n",
       "\n",
       "                        toilets:disposal unisex wheelchair  \\\n",
       "element_type osmid                                           \n",
       "node         319157917             flush    yes        yes   \n",
       "             368393200               NaN    NaN        NaN   \n",
       "             371974432               NaN    NaN        NaN   \n",
       "             767585574               NaN    NaN        NaN   \n",
       "             767689503               NaN    NaN        NaN   \n",
       "...                                  ...    ...        ...   \n",
       "way          1249828891              NaN    NaN        NaN   \n",
       "             1249828892              NaN    NaN        NaN   \n",
       "relation     6614802                 NaN    NaN        NaN   \n",
       "             17205856                NaN    NaN        NaN   \n",
       "             17205857                NaN    NaN        NaN   \n",
       "\n",
       "                                                                  geometry  \\\n",
       "element_type osmid                                                           \n",
       "node         319157917                         POINT (144.96920 -37.80626)   \n",
       "             368393200                         POINT (144.97064 -37.80351)   \n",
       "             371974432                         POINT (144.97138 -37.80545)   \n",
       "             767585574                         POINT (144.97296 -37.80630)   \n",
       "             767689503                         POINT (144.96975 -37.80406)   \n",
       "...                                                                    ...   \n",
       "way          1249828891  POLYGON ((144.97256 -37.80592, 144.97252 -37.8...   \n",
       "             1249828892  POLYGON ((144.96983 -37.80600, 144.96980 -37.8...   \n",
       "relation     6614802     MULTIPOLYGON (((144.96913 -37.80719, 144.96903...   \n",
       "             17205856    POLYGON ((144.97226 -37.80608, 144.97222 -37.8...   \n",
       "             17205857    POLYGON ((144.97004 -37.80614, 144.97007 -37.8...   \n",
       "\n",
       "                         ...  \\\n",
       "element_type osmid       ...   \n",
       "node         319157917   ...   \n",
       "             368393200   ...   \n",
       "             371974432   ...   \n",
       "             767585574   ...   \n",
       "             767689503   ...   \n",
       "...                      ...   \n",
       "way          1249828891  ...   \n",
       "             1249828892  ...   \n",
       "relation     6614802     ...   \n",
       "             17205856    ...   \n",
       "             17205857    ...   \n",
       "\n",
       "                                                                     nodes  \\\n",
       "element_type osmid                                                           \n",
       "node         319157917                                                 NaN   \n",
       "             368393200                                                 NaN   \n",
       "             371974432                                                 NaN   \n",
       "             767585574                                                 NaN   \n",
       "             767689503                                                 NaN   \n",
       "...                                                                    ...   \n",
       "way          1249828891  [11618112476, 11618112477, 11618112478, 116181...   \n",
       "             1249828892  [11618112487, 11618112488, 11618112489, 116181...   \n",
       "relation     6614802     [[[4421481857, 4421481858, 4421481859, 4421481...   \n",
       "             17205856    [[[371970971, 371970972, 11618112514, 37197097...   \n",
       "             17205857    [[[371971531, 371971532, 5864830649, 116181125...   \n",
       "\n",
       "                        layer leaf_type                                ways  \\\n",
       "element_type osmid                                                            \n",
       "node         319157917    NaN       NaN                                 NaN   \n",
       "             368393200    NaN       NaN                                 NaN   \n",
       "             371974432    NaN       NaN                                 NaN   \n",
       "             767585574    NaN       NaN                                 NaN   \n",
       "             767689503    NaN       NaN                                 NaN   \n",
       "...                       ...       ...                                 ...   \n",
       "way          1249828891   NaN       NaN                                 NaN   \n",
       "             1249828892   NaN       NaN                                 NaN   \n",
       "relation     6614802      NaN       NaN              [444681638, 444681637]   \n",
       "             17205856     NaN       NaN  [33005170, 1249828890, 1249828891]   \n",
       "             17205857     NaN       NaN              [33005173, 1249828892]   \n",
       "\n",
       "                        addr:suburb          type           wikipedia  \\\n",
       "element_type osmid                                                      \n",
       "node         319157917          NaN           NaN                 NaN   \n",
       "             368393200          NaN           NaN                 NaN   \n",
       "             371974432          NaN           NaN                 NaN   \n",
       "             767585574          NaN           NaN                 NaN   \n",
       "             767689503          NaN           NaN                 NaN   \n",
       "...                             ...           ...                 ...   \n",
       "way          1249828891         NaN           NaN                 NaN   \n",
       "             1249828892         NaN           NaN                 NaN   \n",
       "relation     6614802        Carlton  multipolygon  en:Carlton Gardens   \n",
       "             17205856           NaN  multipolygon                 NaN   \n",
       "             17205857           NaN  multipolygon                 NaN   \n",
       "\n",
       "                        intermittent salt water  \n",
       "element_type osmid                               \n",
       "node         319157917           NaN  NaN   NaN  \n",
       "             368393200           NaN  NaN   NaN  \n",
       "             371974432           NaN  NaN   NaN  \n",
       "             767585574           NaN  NaN   NaN  \n",
       "             767689503           NaN  NaN   NaN  \n",
       "...                              ...  ...   ...  \n",
       "way          1249828891          NaN  NaN   NaN  \n",
       "             1249828892          NaN  NaN   NaN  \n",
       "relation     6614802             NaN  NaN   NaN  \n",
       "             17205856             no   no  pond  \n",
       "             17205857            NaN  NaN  pond  \n",
       "\n",
       "[334 rows x 53 columns]"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dist_threshold = 200\n",
    "features = get_feature_from_osm(test_case['lat'], test_case['lng'], dist=dist_threshold)\n",
    "features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "c7cc636a-be08-469c-b25c-a9a8abdb0a22",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:29:38.982\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m1\u001b[0m - \u001b[1mlist of features from OSM results: ['access', 'amenity', 'changing_table', 'fee', 'operator', 'operator:wikidata', 'toilets:disposal', 'unisex', 'wheelchair', 'geometry', 'check_date', 'name', 'payment:mastercard', 'payment:visa', 'phone', 'screen', 'website', 'wikidata', 'artist', 'covered', 'drinking_water', 'indoor', 'backrest', 'source', 'parking', 'bicycle_parking', 'fountain', 'leisure', 'brand', 'brand:wikidata', 'brand:wikipedia', 'operator:wikipedia', 'payment:cash', 'payment:credit_cards', 'toilets:wheelchair', 'natural', 'capacity', 'location', 'material', 'recycling:cans', 'recycling:glass_bottles', 'recycling:paper', 'recycling_type', 'nodes', 'layer', 'leaf_type', 'ways', 'addr:suburb', 'type', 'wikipedia', 'intermittent', 'salt', 'water']\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "logger.info(f'list of features from OSM results: {list(features.columns)}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "cd321c0b-80e7-4f15-809e-66c80301f1c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "for key, value in nominatim_output.items():\n",
    "    logger.debug('key: {}'.format(key))\n",
    "    if os.path.isfile('dataset/osm-poi-{0}-dist-{1}-features.geojson'.format(key, dist_threshold)):\n",
    "        logger.debug('already investigated...')\n",
    "        continue\n",
    "    try:\n",
    "        # Second round only: the first run used dist=200 to collect initial results; cases that returned empty frames are completed here with a 1000-metre threshold.\n",
    "        features = get_feature_from_osm(value['lat'], value['lng'], dist=1000)\n",
    "        cols = list(features.columns)\n",
    "        cols.remove('geometry')\n",
    "        features[cols] = features[cols].astype(str)\n",
    "        features.to_file('dataset/osm-poi-{0}-dist-{1}-features.geojson'.format(key, dist_threshold), driver=\"GeoJSON\")\n",
    "        logger.info('done')\n",
    "    except Exception as e:\n",
    "        logger.warning('key {}: {}'.format(key, e))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "id": "9760da21-4708-4d5c-9773-d76987cf3cdc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pyproj\n",
    "from shapely.geometry import Point\n",
    "from shapely.ops import transform\n",
    "\n",
    "wgs84_pt = Point(test_case['lng'], test_case['lat'])\n",
    "wgs84 = pyproj.CRS('EPSG:4326')\n",
    "utm = pyproj.CRS('EPSG:32755')\n",
    "\n",
    "project = pyproj.Transformer.from_crs(wgs84, utm, always_xy=True).transform\n",
    "\n",
    "utm_point = transform(project, wgs84_pt)\n",
    "\n",
    "features = gpd.read_file('dataset/osm-poi-{0}-dist-{1}-features.geojson'.format(case_id, dist_threshold))\n",
    "features = features.to_crs('EPSG:32755')\n",
    "\n",
    "features['distance'] = features.geometry.distance(utm_point)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "id": "dd4b1f94-5039-4935-84e2-271defe4cdac",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>highway</th>\n",
       "      <th>traffic_signals:direction</th>\n",
       "      <th>access</th>\n",
       "      <th>amenity</th>\n",
       "      <th>fee</th>\n",
       "      <th>operator</th>\n",
       "      <th>toilets:disposal</th>\n",
       "      <th>unisex</th>\n",
       "      <th>...</th>\n",
       "      <th>source:population</th>\n",
       "      <th>name:mk</th>\n",
       "      <th>short_name</th>\n",
       "      <th>political_division</th>\n",
       "      <th>heritage</th>\n",
       "      <th>heritage:operator</th>\n",
       "      <th>heritage:website</th>\n",
       "      <th>area</th>\n",
       "      <th>geometry</th>\n",
       "      <th>distance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>node</td>\n",
       "      <td>319157917</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>yes</td>\n",
       "      <td>toilets</td>\n",
       "      <td>no</td>\n",
       "      <td>Melbourne City Council</td>\n",
       "      <td>flush</td>\n",
       "      <td>yes</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321223.384 5813738.129)</td>\n",
       "      <td>218.599034</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>node</td>\n",
       "      <td>368393200</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>cinema</td>\n",
       "      <td>nan</td>\n",
       "      <td>Melbourne museum</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321343.567 5814046.274)</td>\n",
       "      <td>188.373535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>node</td>\n",
       "      <td>371974432</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>fountain</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321413.415 5813831.827)</td>\n",
       "      <td>37.211904</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>node</td>\n",
       "      <td>493873180</td>\n",
       "      <td>crossing</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321256.686 5813883.765)</td>\n",
       "      <td>144.563983</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>node</td>\n",
       "      <td>501027469</td>\n",
       "      <td>crossing</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321600.961 5813827.093)</td>\n",
       "      <td>204.586235</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>435</th>\n",
       "      <td>relation</td>\n",
       "      <td>6614802</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>Melbourne City Council</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>MULTIPOLYGON (((321219.805 5813634.926, 321210...</td>\n",
       "      <td>3.048837</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>436</th>\n",
       "      <td>relation</td>\n",
       "      <td>6623509</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>1</td>\n",
       "      <td>whc</td>\n",
       "      <td>http://whc.unesco.org/en/list/1131</td>\n",
       "      <td>nan</td>\n",
       "      <td>MULTIPOLYGON (((321415.382 5813963.959, 321416...</td>\n",
       "      <td>3.048837</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>437</th>\n",
       "      <td>relation</td>\n",
       "      <td>13238592</td>\n",
       "      <td>pedestrian</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>yes</td>\n",
       "      <td>POLYGON ((321571.451 5813919.342, 321569.416 5...</td>\n",
       "      <td>96.331620</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>438</th>\n",
       "      <td>relation</td>\n",
       "      <td>16464561</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((321163.337 5813990.989, 321180.796 5...</td>\n",
       "      <td>209.794588</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>439</th>\n",
       "      <td>relation</td>\n",
       "      <td>16505481</td>\n",
       "      <td>pedestrian</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((321495.098 5813936.836, 321494.024 5...</td>\n",
       "      <td>71.028757</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>440 rows × 141 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    element_type      osmid     highway traffic_signals:direction access  \\\n",
       "0           node  319157917         nan                       nan    yes   \n",
       "1           node  368393200         nan                       nan    nan   \n",
       "2           node  371974432         nan                       nan    nan   \n",
       "3           node  493873180    crossing                       nan    nan   \n",
       "4           node  501027469    crossing                       nan    nan   \n",
       "..           ...        ...         ...                       ...    ...   \n",
       "435     relation    6614802         nan                       nan    nan   \n",
       "436     relation    6623509         nan                       nan    nan   \n",
       "437     relation   13238592  pedestrian                       nan    nan   \n",
       "438     relation   16464561         nan                       nan    nan   \n",
       "439     relation   16505481  pedestrian                       nan    nan   \n",
       "\n",
       "      amenity  fee                operator toilets:disposal unisex  ...  \\\n",
       "0     toilets   no  Melbourne City Council            flush    yes  ...   \n",
       "1      cinema  nan        Melbourne museum              nan    nan  ...   \n",
       "2    fountain  nan                     nan              nan    nan  ...   \n",
       "3         nan  nan                     nan              nan    nan  ...   \n",
       "4         nan  nan                     nan              nan    nan  ...   \n",
       "..        ...  ...                     ...              ...    ...  ...   \n",
       "435       nan  nan  Melbourne City Council              nan    nan  ...   \n",
       "436       nan  nan                     nan              nan    nan  ...   \n",
       "437       nan  nan                     nan              nan    nan  ...   \n",
       "438       nan  nan                     nan              nan    nan  ...   \n",
       "439       nan  nan                     nan              nan    nan  ...   \n",
       "\n",
       "    source:population name:mk short_name political_division heritage  \\\n",
       "0                 nan     nan        nan                nan      nan   \n",
       "1                 nan     nan        nan                nan      nan   \n",
       "2                 nan     nan        nan                nan      nan   \n",
       "3                 nan     nan        nan                nan      nan   \n",
       "4                 nan     nan        nan                nan      nan   \n",
       "..                ...     ...        ...                ...      ...   \n",
       "435               nan     nan        nan                nan      nan   \n",
       "436               nan     nan        nan                nan        1   \n",
       "437               nan     nan        nan                nan      nan   \n",
       "438               nan     nan        nan                nan      nan   \n",
       "439               nan     nan        nan                nan      nan   \n",
       "\n",
       "    heritage:operator                    heritage:website area  \\\n",
       "0                 nan                                 nan  nan   \n",
       "1                 nan                                 nan  nan   \n",
       "2                 nan                                 nan  nan   \n",
       "3                 nan                                 nan  nan   \n",
       "4                 nan                                 nan  nan   \n",
       "..                ...                                 ...  ...   \n",
       "435               nan                                 nan  nan   \n",
       "436               whc  http://whc.unesco.org/en/list/1131  nan   \n",
       "437               nan                                 nan  yes   \n",
       "438               nan                                 nan  nan   \n",
       "439               nan                                 nan  nan   \n",
       "\n",
       "                                              geometry    distance  \n",
       "0                       POINT (321223.384 5813738.129)  218.599034  \n",
       "1                       POINT (321343.567 5814046.274)  188.373535  \n",
       "2                       POINT (321413.415 5813831.827)   37.211904  \n",
       "3                       POINT (321256.686 5813883.765)  144.563983  \n",
       "4                       POINT (321600.961 5813827.093)  204.586235  \n",
       "..                                                 ...         ...  \n",
       "435  MULTIPOLYGON (((321219.805 5813634.926, 321210...    3.048837  \n",
       "436  MULTIPOLYGON (((321415.382 5813963.959, 321416...    3.048837  \n",
       "437  POLYGON ((321571.451 5813919.342, 321569.416 5...   96.331620  \n",
       "438  POLYGON ((321163.337 5813990.989, 321180.796 5...  209.794588  \n",
       "439  POLYGON ((321495.098 5813936.836, 321494.024 5...   71.028757  \n",
       "\n",
       "[440 rows x 141 columns]"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "id": "e08fa7d3-ca15-4fbe-895d-b5871512078c",
   "metadata": {},
   "outputs": [],
   "source": [
    "not_consider = ['geometry', 'distance', 'element_type', 'osmid']\n",
    "consider_first = ['name', 'short_name']\n",
    "cols = list(features.columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "id": "c51cb704-da33-4d82-801c-de9383e25654",
   "metadata": {},
   "outputs": [],
   "source": [
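    "def generate_textual_descriptions(row, cols=cols):\n",
    "    # Build a plain-text description of one OSM feature (row) from its tag columns.\n",
    "    # Name columns come first so the description starts with the feature's name.\n",
    "    parts = [row[c] for c in consider_first if c in cols and row[c] != 'nan']\n",
    "    for c in cols:\n",
    "        # Skip name and bookkeeping columns, namespaced keys (containing ':') and wiki-related keys.\n",
    "        if c in consider_first or c in not_consider or ':' in c or 'wiki' in c:\n",
    "            continue\n",
    "        value = row[c]\n",
    "        # Skip missing values, explicit 'no' values, URLs and list-like values.\n",
    "        if value in ('nan', 'no') or 'http' in value or '[' in value:\n",
    "            continue\n",
    "        # A 'yes' value is described by the key alone; anything else as 'key value'.\n",
    "        parts.append(c if value == 'yes' else '{0} {1}'.format(c, value))\n",
    "    return ' '.join(parts).strip()"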
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "id": "35aec619-d054-4130-ae60-4d2ae0a70694",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>highway</th>\n",
       "      <th>traffic_signals:direction</th>\n",
       "      <th>access</th>\n",
       "      <th>amenity</th>\n",
       "      <th>fee</th>\n",
       "      <th>operator</th>\n",
       "      <th>toilets:disposal</th>\n",
       "      <th>unisex</th>\n",
       "      <th>...</th>\n",
       "      <th>name:mk</th>\n",
       "      <th>short_name</th>\n",
       "      <th>political_division</th>\n",
       "      <th>heritage</th>\n",
       "      <th>heritage:operator</th>\n",
       "      <th>heritage:website</th>\n",
       "      <th>area</th>\n",
       "      <th>geometry</th>\n",
       "      <th>distance</th>\n",
       "      <th>full_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>node</td>\n",
       "      <td>319157917</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>yes</td>\n",
       "      <td>toilets</td>\n",
       "      <td>no</td>\n",
       "      <td>Melbourne City Council</td>\n",
       "      <td>flush</td>\n",
       "      <td>yes</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321223.384 5813738.129)</td>\n",
       "      <td>218.599034</td>\n",
       "      <td>access amenity toilets operator Melbourne City...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>node</td>\n",
       "      <td>368393200</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>cinema</td>\n",
       "      <td>nan</td>\n",
       "      <td>Melbourne museum</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321343.567 5814046.274)</td>\n",
       "      <td>188.373535</td>\n",
       "      <td>IMAX Melbourne amenity cinema operator Melbour...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>node</td>\n",
       "      <td>371974432</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>fountain</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321413.415 5813831.827)</td>\n",
       "      <td>37.211904</td>\n",
       "      <td>amenity fountain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>node</td>\n",
       "      <td>493873180</td>\n",
       "      <td>crossing</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321256.686 5813883.765)</td>\n",
       "      <td>144.563983</td>\n",
       "      <td>highway crossing crossing zebra</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>node</td>\n",
       "      <td>501027469</td>\n",
       "      <td>crossing</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321600.961 5813827.093)</td>\n",
       "      <td>204.586235</td>\n",
       "      <td>highway crossing crossing traffic_signals tact...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>435</th>\n",
       "      <td>relation</td>\n",
       "      <td>6614802</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>Melbourne City Council</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>MULTIPOLYGON (((321219.805 5813634.926, 321210...</td>\n",
       "      <td>3.048837</td>\n",
       "      <td>Carlton Gardens operator Melbourne City Counci...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>436</th>\n",
       "      <td>relation</td>\n",
       "      <td>6623509</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>1</td>\n",
       "      <td>whc</td>\n",
       "      <td>http://whc.unesco.org/en/list/1131</td>\n",
       "      <td>nan</td>\n",
       "      <td>MULTIPOLYGON (((321415.382 5813963.959, 321416...</td>\n",
       "      <td>3.048837</td>\n",
       "      <td>Royal Exhibition Building and Carlton Gardens ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>437</th>\n",
       "      <td>relation</td>\n",
       "      <td>13238592</td>\n",
       "      <td>pedestrian</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>yes</td>\n",
       "      <td>POLYGON ((321571.451 5813919.342, 321569.416 5...</td>\n",
       "      <td>96.331620</td>\n",
       "      <td>highway pedestrian type multipolygon area</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>438</th>\n",
       "      <td>relation</td>\n",
       "      <td>16464561</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((321163.337 5813990.989, 321180.796 5...</td>\n",
       "      <td>209.794588</td>\n",
       "      <td>building type multipolygon</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>439</th>\n",
       "      <td>relation</td>\n",
       "      <td>16505481</td>\n",
       "      <td>pedestrian</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((321495.098 5813936.836, 321494.024 5...</td>\n",
       "      <td>71.028757</td>\n",
       "      <td>highway pedestrian type multipolygon</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>440 rows × 142 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    element_type      osmid     highway traffic_signals:direction access  \\\n",
       "0           node  319157917         nan                       nan    yes   \n",
       "1           node  368393200         nan                       nan    nan   \n",
       "2           node  371974432         nan                       nan    nan   \n",
       "3           node  493873180    crossing                       nan    nan   \n",
       "4           node  501027469    crossing                       nan    nan   \n",
       "..           ...        ...         ...                       ...    ...   \n",
       "435     relation    6614802         nan                       nan    nan   \n",
       "436     relation    6623509         nan                       nan    nan   \n",
       "437     relation   13238592  pedestrian                       nan    nan   \n",
       "438     relation   16464561         nan                       nan    nan   \n",
       "439     relation   16505481  pedestrian                       nan    nan   \n",
       "\n",
       "      amenity  fee                operator toilets:disposal unisex  ...  \\\n",
       "0     toilets   no  Melbourne City Council            flush    yes  ...   \n",
       "1      cinema  nan        Melbourne museum              nan    nan  ...   \n",
       "2    fountain  nan                     nan              nan    nan  ...   \n",
       "3         nan  nan                     nan              nan    nan  ...   \n",
       "4         nan  nan                     nan              nan    nan  ...   \n",
       "..        ...  ...                     ...              ...    ...  ...   \n",
       "435       nan  nan  Melbourne City Council              nan    nan  ...   \n",
       "436       nan  nan                     nan              nan    nan  ...   \n",
       "437       nan  nan                     nan              nan    nan  ...   \n",
       "438       nan  nan                     nan              nan    nan  ...   \n",
       "439       nan  nan                     nan              nan    nan  ...   \n",
       "\n",
       "    name:mk short_name political_division heritage heritage:operator  \\\n",
       "0       nan        nan                nan      nan               nan   \n",
       "1       nan        nan                nan      nan               nan   \n",
       "2       nan        nan                nan      nan               nan   \n",
       "3       nan        nan                nan      nan               nan   \n",
       "4       nan        nan                nan      nan               nan   \n",
       "..      ...        ...                ...      ...               ...   \n",
       "435     nan        nan                nan      nan               nan   \n",
       "436     nan        nan                nan        1               whc   \n",
       "437     nan        nan                nan      nan               nan   \n",
       "438     nan        nan                nan      nan               nan   \n",
       "439     nan        nan                nan      nan               nan   \n",
       "\n",
       "                       heritage:website area  \\\n",
       "0                                   nan  nan   \n",
       "1                                   nan  nan   \n",
       "2                                   nan  nan   \n",
       "3                                   nan  nan   \n",
       "4                                   nan  nan   \n",
       "..                                  ...  ...   \n",
       "435                                 nan  nan   \n",
       "436  http://whc.unesco.org/en/list/1131  nan   \n",
       "437                                 nan  yes   \n",
       "438                                 nan  nan   \n",
       "439                                 nan  nan   \n",
       "\n",
       "                                              geometry    distance  \\\n",
       "0                       POINT (321223.384 5813738.129)  218.599034   \n",
       "1                       POINT (321343.567 5814046.274)  188.373535   \n",
       "2                       POINT (321413.415 5813831.827)   37.211904   \n",
       "3                       POINT (321256.686 5813883.765)  144.563983   \n",
       "4                       POINT (321600.961 5813827.093)  204.586235   \n",
       "..                                                 ...         ...   \n",
       "435  MULTIPOLYGON (((321219.805 5813634.926, 321210...    3.048837   \n",
       "436  MULTIPOLYGON (((321415.382 5813963.959, 321416...    3.048837   \n",
       "437  POLYGON ((321571.451 5813919.342, 321569.416 5...   96.331620   \n",
       "438  POLYGON ((321163.337 5813990.989, 321180.796 5...  209.794588   \n",
       "439  POLYGON ((321495.098 5813936.836, 321494.024 5...   71.028757   \n",
       "\n",
       "                                             full_name  \n",
       "0    access amenity toilets operator Melbourne City...  \n",
       "1    IMAX Melbourne amenity cinema operator Melbour...  \n",
       "2                                     amenity fountain  \n",
       "3                      highway crossing crossing zebra  \n",
       "4    highway crossing crossing traffic_signals tact...  \n",
       "..                                                 ...  \n",
       "435  Carlton Gardens operator Melbourne City Counci...  \n",
       "436  Royal Exhibition Building and Carlton Gardens ...  \n",
       "437          highway pedestrian type multipolygon area  \n",
       "438                         building type multipolygon  \n",
       "439               highway pedestrian type multipolygon  \n",
       "\n",
       "[440 rows x 142 columns]"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "features['full_name'] = features.apply(generate_textual_descriptions, axis=1)\n",
    "features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "id": "c0d093a3-6c12-4ec1-9d7c-106b1e2cc365",
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_similarities_topk(query, sentences, sentence_embeddings, model=sbert_model, k=10, verbose=False):\n",
    "    query_vec = embed_texts(query)\n",
    "    scores = util.dot_score(query_vec, sentence_embeddings)[0].cpu().tolist()\n",
    "    doc_score_pairs = list(zip(sentences, scores))\n",
    "    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)\n",
    "    if verbose:\n",
    "        logger.info(f\"Query: {query}\")\n",
    "        for doc, score in doc_score_pairs:\n",
    "            logger.info(f'\\t{score}\\t{doc}')\n",
    "    # Note: verbose logging above lists every sentence, not only the top k,\n",
    "    # and the `model` argument is not used in this function body.\n",
    "    # Return the positional indices of the k highest-scoring sentences,\n",
    "    # in ascending score order (the best match is the last element).\n",
    "    return np.argsort(scores)[-k:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "id": "929f4992-c1e4-437c-b352-f06f1efadcf1",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-01-25 20:30:03.818\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m7\u001b[0m - \u001b[1mQuery: 13. Royal Exhibition Building The Royal Exhibition Building is the only surviving Great Hall that once housed a 19th-century international exhibition and is still used for exhibitions. \u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.819\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t44.04185104370117\tRoyal Exhibition Building and Carlton Gardens tourism attraction type multipolygon heritage 1\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.819\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t40.16722869873047\tRoyal Exhibition Building source Vicmap Address historic building tourism attraction building height 20 layer 1\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.820\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t39.156288146972656\tRoyal Exhibition Building Opening historic memorial memorial plaque\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.820\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t36.94139099121094\tMelbourne Museum fee operator Museum Victoria wheelchair phone +61 3 8341 7777 source Vicmap Address tourism museum building layer 1 atm internet_access wlan opening_hours Mo-Su 10:00-17:00,09:00-17:00\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.820\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t36.464195251464844\taccess historic monument inscription To Victoria from one of her earliest colonists in pleasant remeberance 1840 - 88\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.821\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t35.37290954589844\tExhibition Building/Rathdowne Street highway bus_stop tactile_paving network PTV - Metropolitan Buses public_transport platform ref ID.345 bus route_ref 250;251;402\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.821\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t35.2883415222168\tamenity charging_station operator Museums Victoria capacity 2 location underground\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.821\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t34.927093505859375\tExhibition Building/Rathdowne Street highway bus_stop tactile_paving network PTV - Metropolitan Buses public_transport platform ref ID.655 bus route_ref 250;251;402\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.822\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t34.915157318115234\tIMAX Melbourne amenity cinema operator Melbourne museum check_date 2022-10-02 phone +61 3 9663 5454 screen 1\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.822\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t33.64553451538086\tartwork_type sculpture tourism artwork\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.822\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t32.76984786987305\tnatural tree_row\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.822\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t32.76984786987305\tnatural tree_row\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.823\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t32.76984786987305\tnatural tree_row\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.823\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t32.76984786987305\tnatural tree_row\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.823\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t32.76984786987305\tnatural tree_row\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.824\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t32.053016662597656\tnatural wood leaf_type broadleaved\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.824\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t32.053016662597656\tnatural wood leaf_type broadleaved\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.824\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t31.955171585083008\thistoric monument\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.824\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t31.633041381835938\taccess customers amenity charging_station fee operator Melbourne Museum capacity 2\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.825\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t31.5223388671875\tbuilding office\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.825\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t31.441062927246094\tMuseum Cafe amenity cafe wheelchair\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.825\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t31.18695640563965\tCarlton Gardens operator Melbourne City Council leisure park type multipolygon\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.826\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t31.17928695678711\tSouth Carlton Gardens natural park\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.826\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.938030242919922\tRathdowne Street highway tertiary source nearmap network AU:VIC:S ref 46 lit surface asphalt oneway lanes 2 maxspeed 40\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.826\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.938030242919922\tRathdowne Street highway tertiary source nearmap network AU:VIC:S ref 46 lit surface asphalt oneway lanes 2 maxspeed 40\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.827\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.938030242919922\tRathdowne Street highway tertiary source nearmap network AU:VIC:S ref 46 lit surface asphalt oneway lanes 2 maxspeed 40\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.827\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.827\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.828\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.828\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.829\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.829\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.829\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.830\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.830\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.830\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.830\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.831\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.831\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.832\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.832\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.832\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.833\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.833\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.833\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.834\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.834\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.835\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.835\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.836\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.836\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.836\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.837\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.837\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.838\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.838\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.838\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.839\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.839\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.840\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.840\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.841\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.841\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.841\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.842\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.842\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.842\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.843\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.843\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.843\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.844\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.844\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.844\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.845\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.845\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.845\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.846\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.846\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.847\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.847\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.848\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.848\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.848\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.849\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.849\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.849\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.850\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.850\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.851\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.851\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.851\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.851\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.852\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.852\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.853\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.853\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.854\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.854\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.854\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.855\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.855\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.855\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.856\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.856\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.856\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.857\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.858\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.858\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.859\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.859\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.860\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.860\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.861\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.862\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.864\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.865\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.865\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.866\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.866\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.867\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.867\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.868\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.868\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.869\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.870\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.870\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.870\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.871\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.871\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.872\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.872\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.872\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.873\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.873\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.873\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.874\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.874\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.874\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.875\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.875\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.876\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.876\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.877\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.877\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.877\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.878\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.878\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.878\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.879\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.879\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.880\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.880\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.880\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.881\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.881\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.881\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.882\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.882\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.883\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.884\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.885\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.885\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.885\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.886\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.886\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.886\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.887\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.887\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.888\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.888\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.888\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.889\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.889\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.889\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.890\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.890\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.891\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.891\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.891\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.892\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.892\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.893\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.894\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.894\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.895\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.895\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.896\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.897\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.897\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.898\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.898\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.899\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.899\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.900\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.900\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.901\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.901\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.902\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.902\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.902\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.903\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.903\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.904\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.904\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.904\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.904\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.905\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.905\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.906\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.906\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.907\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.907\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.907\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.908\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.908\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.908\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.909\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.909\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.909\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.910\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.910\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.910\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.911\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.911\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.911\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.912\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.912\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.913\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.913\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.913\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.914\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.914\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.914\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.914\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.915\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.915\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.915\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.916\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.916\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.917\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.917\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.917\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.918\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.918\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.918\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.918\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.919\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.919\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.919\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.920\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.920\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.921\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.921\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.921\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.921\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.922\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.922\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.922\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.923\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.923\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.924\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.924\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.924\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.925\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.926\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.926\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.927\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.927\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.92023468017578\tnatural tree\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.928\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.833450317382812\tamenity bench material stone\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.928\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.833450317382812\tamenity bench material stone\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.929\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.833450317382812\tamenity bench material stone\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.930\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.833450317382812\tamenity bench material stone\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.931\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.833450317382812\tamenity bench material stone\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.931\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.74757957458496\tnatural tree_row leaf_type broadleaved\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.932\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.013858795166016\tsource Vicmap Address building\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.932\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t30.013858795166016\tsource Vicmap Address building\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.933\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t29.334415435791016\tMelbourne District Melbourne boundary political type boundary political_division au_vic_la\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.934\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t29.300464630126953\tamenity fountain\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.934\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t29.300464630126953\tamenity fountain\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.934\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.909582138061523\tStop 11: Melbourne Museum operator Yarra Trams wheelchair tactile_paving railway platform network PTV - Metropolitan Trams public_transport platform ref 11 tram bench bin lit route_ref 86;96 shelter passenger_information_display\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.935\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.820085525512695\tamenity bicycle_parking\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.935\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.53158950805664\tamenity waste_basket check_date 2022-10-04\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.936\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.53158950805664\tamenity waste_basket check_date 2022-10-04\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.936\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.1014347076416\tamenity drinking_water\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.937\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.067947387695312\tamenity parking_entrance fee parking underground\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.937\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.067947387695312\tamenity parking_entrance fee parking underground\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.937\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.067947387695312\tamenity parking_entrance fee parking underground\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.938\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.06528091430664\tamenity bench backrest\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.938\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.06528091430664\tamenity bench backrest\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.938\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.06528091430664\tamenity bench backrest\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.939\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.06528091430664\tamenity bench backrest\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.939\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.06528091430664\tamenity bench backrest\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.940\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.06528091430664\tamenity bench backrest\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.940\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.065279006958008\tamenity bench backrest\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.940\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.065279006958008\tamenity bench backrest\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.941\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t28.065279006958008\tamenity bench backrest\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.941\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.69495391845703\tCity of Melbourne Melbourne phone +61 3 9658 9658 place municipality admin_level 6 boundary administrative population 149615 type boundary\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.941\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.68996810913086\taccess amenity toilets operator Melbourne City Council unisex wheelchair\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.942\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.616586685180664\tnatural water\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.942\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.616586685180664\tnatural water\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.942\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.52220344543457\tbuilding roof layer 1\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.943\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.52220344543457\tbuilding roof layer 1\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.943\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.52220344543457\tbuilding roof layer 1\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.943\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.431129455566406\tleisure park\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.944\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.381160736083984\tbuilding\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.944\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t27.381160736083984\tbuilding\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.944\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.945\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.945\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.945\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.946\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.946\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.946\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.947\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.947\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.947\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.948\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.948\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.948\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.949\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.949\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.949\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.950\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.950\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.950\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.951\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.951\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.951\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.952\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.952\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.952\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.953\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.968120574951172\tleisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.953\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.76032257080078\thighway unclassified noname surface asphalt service parking_aisle\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.954\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.21531105041504\tamenity bicycle_parking check_date 2023-06-29\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.954\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.037982940673828\tMuseum Playground access customers leisure playground\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.954\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.015113830566406\thighway unclassified bicycle designated noname surface asphalt oneway service parking_aisle\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.954\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t26.00775909423828\tCity of Yarra place municipality admin_level 6 boundary administrative population 98521 type boundary\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.955\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t25.917381286621094\tamenity drinking_water check_date 2023-06-29 fountain bubbler\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.955\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t25.8033447265625\tamenity drinking_water check_date 2022-10-04\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.956\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t25.745967864990234\tPrinces Street highway residential source Yahoo surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.956\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t25.710418701171875\tbuilding type multipolygon\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.956\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t25.593326568603516\tamenity bicycle_parking check_date 2023-04-24\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.957\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t25.582136154174805\tPrinces Street highway residential source Yahoo lit surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.957\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t25.249755859375\tsource Vicmap Address landuse residential\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.957\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t25.24359893798828\tForest Gallery covered source nearmap leisure garden\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.958\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t25.004913330078125\tsource Vicmap Address building apartments\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.958\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t24.986482620239258\thighway unclassified bicycle designated lit noname surface asphalt service parking_aisle\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.958\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t24.982078552246094\tRichmond District Richmond boundary political type boundary political_division au_vic_la\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.959\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t24.869037628173828\tFitzroy place suburb admin_level 9 boundary administrative population 10445 postal_code 3065 type boundary\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.959\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t24.468185424804688\tCarlton place suburb admin_level 9 boundary administrative population 19001 postal_code 3053 type boundary\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.960\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t24.38406753540039\tNicholson Street highway primary lit surface asphalt oneway lanes 2 maxspeed 60\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.960\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t24.38406753540039\tNicholson Street highway primary lit surface asphalt oneway lanes 2 maxspeed 60\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.961\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t24.239730834960938\tamenity bicycle_parking check_date 2023-04-24 covered bicycle_parking stands\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.961\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.834636688232422\tleisure garden layer 1\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.962\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.714813232421875\tCommonwealth Bank amenity atm operator Commonwealth Bank brand Commonwealth Bank\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.962\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.67729949951172\tNicholson Street highway primary lit surface asphalt oneway lanes 2 maxspeed 40\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.963\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.67729949951172\tNicholson Street highway primary lit surface asphalt oneway lanes 2 maxspeed 40\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.963\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.658435821533203\tamenity bench backrest source nearmap\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.964\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.658435821533203\tamenity bench backrest source nearmap\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.964\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.658432006835938\tamenity bench backrest source nearmap\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.965\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.658432006835938\tamenity bench backrest source nearmap\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.966\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.19542694091797\thighway tertiary_link surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.966\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.19542694091797\thighway tertiary_link surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.967\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t23.14141273498535\tTelstra amenity telephone operator Telstra\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.967\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t22.126949310302734\thighway service service driveway\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.968\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t22.01078224182129\thighway pedestrian noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.968\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t22.01078224182129\thighway pedestrian noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.969\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t22.01078224182129\thighway pedestrian noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.969\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t21.906124114990234\thighway cycleway bicycle designated foot designated source nearmap surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.970\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t21.757476806640625\thighway pedestrian noname\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.970\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t21.757476806640625\thighway pedestrian noname\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.971\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t21.512310028076172\tnatural water layer 2\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.971\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t21.101741790771484\tQueensberry Street highway tertiary source nearmap lit surface asphalt oneway lanes 2 maxspeed 50\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.972\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.849063873291016\thighway service surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.972\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.634830474853516\tbuilding apartments\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.973\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.617916107177734\thighway footway crossing marked bicycle designated surface asphalt footway crossing\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.973\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.452051162719727\tsource nearmap natural water layer 1\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.973\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.400806427001953\thighway footway surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.974\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.364856719970703\tQueensberry Street highway tertiary source nearmap;Bing;nearmap lit surface asphalt oneway lanes 2 maxspeed 50\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.974\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.3331298828125\thighway pedestrian surface asphalt service laneway\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.975\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.310163497924805\thighway crossing crossing unmarked kerb flush\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.975\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.268640518188477\thighway service\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.975\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.246990203857422\thighway footway crossing marked surface concrete footway crossing\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.976\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.071638107299805\tlanduse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.976\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.071638107299805\tlanduse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.977\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.071638107299805\tlanduse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.977\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.071638107299805\tlanduse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.978\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.071638107299805\tlanduse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.978\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.071638107299805\tlanduse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.979\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t20.071638107299805\tlanduse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.979\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.879281997680664\thighway crossing crossing traffic_signals tactile_paving button_operated\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.979\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.875017166137695\thighway service source nearmap surface asphalt oneway\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.979\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.8541202545166\thighway footway bicycle designated source nearmap surface asphalt footway sidewalk\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.980\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.672927856445312\thighway crossing crossing traffic_signals tactile_paving button_operated foot designated\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.980\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.6680965423584\thighway footway surface asphalt footway crossing\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.981\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.6680965423584\thighway footway surface asphalt footway crossing\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.981\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.5889892578125\tlayer 1 landuse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.981\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.5889892578125\tlayer 1 landuse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.982\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.5889892578125\tlayer 1 landuse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.982\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.5889892578125\tlayer 1 landuse grass\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.982\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.564184188842773\thighway footway source nearmap surface asphalt footway sidewalk\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.982\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.556673049926758\thighway traffic_signals\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.983\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.556673049926758\thighway traffic_signals\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.983\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.556673049926758\thighway traffic_signals\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.984\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.183940887451172\trailway tram service crossover electrified contact_line frequency 0 gauge 1435 voltage 600\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.984\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.03142547607422\tNicholson Street source nearmap railway tram oneway electrified contact_line frequency 0 gauge 1435 voltage 600\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.984\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.03142547607422\tNicholson Street source nearmap railway tram oneway electrified contact_line frequency 0 gauge 1435 voltage 600\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.985\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t19.02886199951172\thighway pedestrian lit\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.985\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.564170837402344\thighway pedestrian source nearmap noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.985\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.564170837402344\thighway pedestrian source nearmap noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.985\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.564170837402344\thighway pedestrian source nearmap noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.986\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.564170837402344\thighway pedestrian source nearmap noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.986\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.564170837402344\thighway pedestrian source nearmap noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.986\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.564170837402344\thighway pedestrian source nearmap noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.987\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.564170837402344\thighway pedestrian source nearmap noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.987\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.564170837402344\thighway pedestrian source nearmap noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.987\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.564170837402344\thighway pedestrian source nearmap noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.988\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.452163696289062\thighway pedestrian source nearmap lit noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.988\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.452163696289062\thighway pedestrian source nearmap lit noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.988\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.452163696289062\thighway pedestrian source nearmap lit noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.989\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.452163696289062\thighway pedestrian source nearmap lit noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.989\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.452163696289062\thighway pedestrian source nearmap lit noname surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.990\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t18.273330688476562\thighway traffic_signals traffic_signals pedestrian_crossing\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.990\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.96446418762207\thighway service source nearmap oneway\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.990\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.96446418762207\thighway service source nearmap oneway\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.991\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.960041046142578\thighway pedestrian type multipolygon area\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.991\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.83045196533203\thighway pedestrian\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.991\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.83045196533203\thighway pedestrian\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.992\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.83045196533203\thighway pedestrian\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.992\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.83045196533203\thighway pedestrian\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.993\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.82858657836914\thighway pedestrian source nearmap noname\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.993\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.82858657836914\thighway pedestrian source nearmap noname\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.994\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.81389808654785\thighway footway source nearmap surface asphalt\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.994\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.35993003845215\thighway pedestrian type multipolygon\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.995\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t17.193988800048828\thighway crossing crossing zebra\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.995\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t16.92513656616211\thighway pedestrian source nearmap,bing\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.996\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t14.394341468811035\thighway service bicycle source nearmap\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.996\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t14.092538833618164\thighway pedestrian source nearmap\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.997\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t14.092538833618164\thighway pedestrian source nearmap\u001b[0m\n",
      "\u001b[32m2025-01-25 20:30:03.997\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36mcompute_similarities_topk\u001b[0m:\u001b[36m9\u001b[0m - \u001b[1m\t14.092538833618164\thighway pedestrian source nearmap\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([ 46,   1,  26,  27,  44,  49, 298,  43, 297, 436])"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
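    "# Embed the text description of every OSM feature.\n",
    "feature_descriptions = list(features['full_name'])\n",
    "feature_embeddings = embed_texts(feature_descriptions, model=msmarco_model)\n",
    "# Build the query text from the case's Nominatim title and summary.\n",
    "case_description = nominatim_output[str(case_id)]['title'] + ' ' + nominatim_output[str(case_id)]['summary']\n",
    "# Retrieve the features most similar to the query; verbose=True logs each feature's score.\n",
    "k_similar = compute_similarities_topk(case_description, feature_descriptions, feature_embeddings, model=msmarco_model, verbose=True)\n",
    "k_similar"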
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "id": "3bba735b-ba81-4e34-bcaf-7e13d82072c7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>highway</th>\n",
       "      <th>traffic_signals:direction</th>\n",
       "      <th>access</th>\n",
       "      <th>amenity</th>\n",
       "      <th>fee</th>\n",
       "      <th>operator</th>\n",
       "      <th>toilets:disposal</th>\n",
       "      <th>unisex</th>\n",
       "      <th>...</th>\n",
       "      <th>name:mk</th>\n",
       "      <th>short_name</th>\n",
       "      <th>political_division</th>\n",
       "      <th>heritage</th>\n",
       "      <th>heritage:operator</th>\n",
       "      <th>heritage:website</th>\n",
       "      <th>area</th>\n",
       "      <th>geometry</th>\n",
       "      <th>distance</th>\n",
       "      <th>full_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>node</td>\n",
       "      <td>9106562132</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321559.646 5813927.545)</td>\n",
       "      <td>170.659614</td>\n",
       "      <td>artwork_type sculpture tourism artwork</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>node</td>\n",
       "      <td>368393200</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>cinema</td>\n",
       "      <td>nan</td>\n",
       "      <td>Melbourne museum</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321343.567 5814046.274)</td>\n",
       "      <td>188.373535</td>\n",
       "      <td>IMAX Melbourne amenity cinema operator Melbour...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>node</td>\n",
       "      <td>4061250667</td>\n",
       "      <td>bus_stop</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321243.477 5813954.530)</td>\n",
       "      <td>179.719018</td>\n",
       "      <td>Exhibition Building/Rathdowne Street highway b...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>node</td>\n",
       "      <td>4332324003</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>charging_station</td>\n",
       "      <td>nan</td>\n",
       "      <td>Museums Victoria</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321463.713 5813962.011)</td>\n",
       "      <td>114.580080</td>\n",
       "      <td>amenity charging_station operator Museums Vict...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>node</td>\n",
       "      <td>7248901076</td>\n",
       "      <td>bus_stop</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321211.629 5813902.092)</td>\n",
       "      <td>191.907516</td>\n",
       "      <td>Exhibition Building/Rathdowne Street highway b...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>node</td>\n",
       "      <td>9307551791</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>yes</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321592.650 5813874.406)</td>\n",
       "      <td>192.577146</td>\n",
       "      <td>access historic monument inscription To Victor...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>298</th>\n",
       "      <td>way</td>\n",
       "      <td>4817074</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>yes</td>\n",
       "      <td>Museum Victoria</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((321505.691 5814023.082, 321459.066 5...</td>\n",
       "      <td>159.043248</td>\n",
       "      <td>Melbourne Museum fee operator Museum Victoria ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>node</td>\n",
       "      <td>6810298878</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POINT (321503.096 5813888.150)</td>\n",
       "      <td>105.093724</td>\n",
       "      <td>Royal Exhibition Building Opening historic mem...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>297</th>\n",
       "      <td>way</td>\n",
       "      <td>4817059</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((321415.382 5813963.959, 321416.338 5...</td>\n",
       "      <td>11.700249</td>\n",
       "      <td>Royal Exhibition Building source Vicmap Addres...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>436</th>\n",
       "      <td>relation</td>\n",
       "      <td>6623509</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>1</td>\n",
       "      <td>whc</td>\n",
       "      <td>http://whc.unesco.org/en/list/1131</td>\n",
       "      <td>nan</td>\n",
       "      <td>MULTIPOLYGON (((321415.382 5813963.959, 321416...</td>\n",
       "      <td>3.048837</td>\n",
       "      <td>Royal Exhibition Building and Carlton Gardens ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10 rows × 142 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    element_type       osmid   highway traffic_signals:direction access  \\\n",
       "46          node  9106562132       nan                       nan    nan   \n",
       "1           node   368393200       nan                       nan    nan   \n",
       "26          node  4061250667  bus_stop                       nan    nan   \n",
       "27          node  4332324003       nan                       nan    nan   \n",
       "44          node  7248901076  bus_stop                       nan    nan   \n",
       "49          node  9307551791       nan                       nan    yes   \n",
       "298          way     4817074       nan                       nan    nan   \n",
       "43          node  6810298878       nan                       nan    nan   \n",
       "297          way     4817059       nan                       nan    nan   \n",
       "436     relation     6623509       nan                       nan    nan   \n",
       "\n",
       "              amenity  fee          operator toilets:disposal unisex  ...  \\\n",
       "46                nan  nan               nan              nan    nan  ...   \n",
       "1              cinema  nan  Melbourne museum              nan    nan  ...   \n",
       "26                nan  nan               nan              nan    nan  ...   \n",
       "27   charging_station  nan  Museums Victoria              nan    nan  ...   \n",
       "44                nan  nan               nan              nan    nan  ...   \n",
       "49                nan  nan               nan              nan    nan  ...   \n",
       "298               nan  yes   Museum Victoria              nan    nan  ...   \n",
       "43                nan  nan               nan              nan    nan  ...   \n",
       "297               nan  nan               nan              nan    nan  ...   \n",
       "436               nan  nan               nan              nan    nan  ...   \n",
       "\n",
       "    name:mk short_name political_division heritage heritage:operator  \\\n",
       "46      nan        nan                nan      nan               nan   \n",
       "1       nan        nan                nan      nan               nan   \n",
       "26      nan        nan                nan      nan               nan   \n",
       "27      nan        nan                nan      nan               nan   \n",
       "44      nan        nan                nan      nan               nan   \n",
       "49      nan        nan                nan      nan               nan   \n",
       "298     nan        nan                nan      nan               nan   \n",
       "43      nan        nan                nan      nan               nan   \n",
       "297     nan        nan                nan      nan               nan   \n",
       "436     nan        nan                nan        1               whc   \n",
       "\n",
       "                       heritage:website area  \\\n",
       "46                                  nan  nan   \n",
       "1                                   nan  nan   \n",
       "26                                  nan  nan   \n",
       "27                                  nan  nan   \n",
       "44                                  nan  nan   \n",
       "49                                  nan  nan   \n",
       "298                                 nan  nan   \n",
       "43                                  nan  nan   \n",
       "297                                 nan  nan   \n",
       "436  http://whc.unesco.org/en/list/1131  nan   \n",
       "\n",
       "                                              geometry    distance  \\\n",
       "46                      POINT (321559.646 5813927.545)  170.659614   \n",
       "1                       POINT (321343.567 5814046.274)  188.373535   \n",
       "26                      POINT (321243.477 5813954.530)  179.719018   \n",
       "27                      POINT (321463.713 5813962.011)  114.580080   \n",
       "44                      POINT (321211.629 5813902.092)  191.907516   \n",
       "49                      POINT (321592.650 5813874.406)  192.577146   \n",
       "298  POLYGON ((321505.691 5814023.082, 321459.066 5...  159.043248   \n",
       "43                      POINT (321503.096 5813888.150)  105.093724   \n",
       "297  POLYGON ((321415.382 5813963.959, 321416.338 5...   11.700249   \n",
       "436  MULTIPOLYGON (((321415.382 5813963.959, 321416...    3.048837   \n",
       "\n",
       "                                             full_name  \n",
       "46              artwork_type sculpture tourism artwork  \n",
       "1    IMAX Melbourne amenity cinema operator Melbour...  \n",
       "26   Exhibition Building/Rathdowne Street highway b...  \n",
       "27   amenity charging_station operator Museums Vict...  \n",
       "44   Exhibition Building/Rathdowne Street highway b...  \n",
       "49   access historic monument inscription To Victor...  \n",
       "298  Melbourne Museum fee operator Museum Victoria ...  \n",
       "43   Royal Exhibition Building Opening historic mem...  \n",
       "297  Royal Exhibition Building source Vicmap Addres...  \n",
       "436  Royal Exhibition Building and Carlton Gardens ...  \n",
       "\n",
       "[10 rows x 142 columns]"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "features.iloc[k_similar]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "id": "6ffab872-6744-4d9f-b8f9-3bb5b485b524",
   "metadata": {},
   "outputs": [],
   "source": [
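    "def get_top_k(case_id, k=10, model=msmarco_model, verbose=False):\n",
    "    \"\"\"Return the OSM features most similar to the description of a case.\n",
    "\n",
    "    Note: `k` is not forwarded to compute_similarities_topk, so the number of\n",
    "    rows returned is determined by that helper.\n",
    "    \"\"\"\n",
    "    test_case = nominatim_output[str(case_id)]\n",
    "\n",
    "    # Load the OSM features stored for this case and distance threshold.\n",
    "    features = gpd.read_file('dataset/osm-poi-{0}-dist-{1}-features.geojson'.format(case_id, dist_threshold))\n",
    "    cols = list(features.columns)\n",
    "    if verbose:\n",
    "        print(features.head())\n",
    "        logger.info(cols)\n",
    "    # Build one textual description per feature from its columns.\n",
    "    features['full_name'] = features.apply(lambda row: generate_textual_descriptions(row, cols), axis=1)\n",
    "\n",
    "    # Embed every feature description with the selected model.\n",
    "    feature_descriptions = list(features['full_name'])\n",
    "    feature_embeddings = embed_texts(feature_descriptions, model=model)\n",
    "\n",
    "    # The query text is the case title followed by its summary.\n",
    "    case_description = test_case['title'] + ' ' + test_case['summary']\n",
    "    k_similar = compute_similarities_topk(case_description, feature_descriptions, feature_embeddings, model=model, verbose=verbose)\n",
    "    # Reverse the index order returned by the helper before selecting rows.\n",
    "    return features.iloc[np.flip(k_similar)]"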
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 200,
   "id": "ff08c65b-6124-44bf-aff4-8595c3245369",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4392"
      ]
     },
     "execution_count": 200,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(top_k_dfs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 201,
   "id": "d3125700-cd71-41c6-8163-384f479b56e4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4391"
      ]
     },
     "execution_count": 201,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "case_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 202,
   "id": "058cf571-6e5c-4b02-beb4-25a3ec0377d3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>element_type</th>\n",
       "      <th>osmid</th>\n",
       "      <th>highway</th>\n",
       "      <th>source</th>\n",
       "      <th>surface</th>\n",
       "      <th>leisure</th>\n",
       "      <th>sport</th>\n",
       "      <th>nodes</th>\n",
       "      <th>name</th>\n",
       "      <th>bicycle</th>\n",
       "      <th>...</th>\n",
       "      <th>service</th>\n",
       "      <th>fee</th>\n",
       "      <th>footway</th>\n",
       "      <th>shelter_type</th>\n",
       "      <th>geometry</th>\n",
       "      <th>full_name</th>\n",
       "      <th>osm_id</th>\n",
       "      <th>osm_type</th>\n",
       "      <th>lat</th>\n",
       "      <th>lng</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>way</td>\n",
       "      <td>1105105836</td>\n",
       "      <td>footway</td>\n",
       "      <td>nan</td>\n",
       "      <td>paved</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>[ 668646878, 10112835040, 10112835039, 1011283...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>LINESTRING (145.05682 -37.65699, 145.05665 -37...</td>\n",
       "      <td>highway footway surface paved</td>\n",
       "      <td>1105105836</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.657301</td>\n",
       "      <td>145.056369</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>way</td>\n",
       "      <td>1105106176</td>\n",
       "      <td>footway</td>\n",
       "      <td>nan</td>\n",
       "      <td>concrete</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>[ 10112832233, 11098668126, 10112832241, 10112...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>LINESTRING (145.05658 -37.65664, 145.05668 -37...</td>\n",
       "      <td>highway footway surface concrete</td>\n",
       "      <td>1105106176</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.656814</td>\n",
       "      <td>145.056755</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50</th>\n",
       "      <td>way</td>\n",
       "      <td>1105105835</td>\n",
       "      <td>service</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>[ 10112835025, 10112835039, 10112835035 ]</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>parking_aisle</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>LINESTRING (145.05615 -37.65695, 145.05647 -37...</td>\n",
       "      <td>highway service service parking_aisle</td>\n",
       "      <td>1105105835</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.657204</td>\n",
       "      <td>145.056434</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>way</td>\n",
       "      <td>1105105834</td>\n",
       "      <td>service</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>[ 10112835033, 10112835038, 10112835034 ]</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>parking_aisle</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>LINESTRING (145.05599 -37.65704, 145.05631 -37...</td>\n",
       "      <td>highway service service parking_aisle</td>\n",
       "      <td>1105105834</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.657308</td>\n",
       "      <td>145.056267</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>way</td>\n",
       "      <td>1105105833</td>\n",
       "      <td>service</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>[ 10112835024, 10112835028, 10112835047, 10112...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>parking_aisle</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>LINESTRING (145.05583 -37.65713, 145.05583 -37...</td>\n",
       "      <td>highway service service parking_aisle</td>\n",
       "      <td>1105105833</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.657328</td>\n",
       "      <td>145.056424</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>way</td>\n",
       "      <td>1105105832</td>\n",
       "      <td>service</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>[ 10112835024, 978035608, 10112835033, 1011283...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>parking_aisle</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>LINESTRING (145.05583 -37.65713, 145.05595 -37...</td>\n",
       "      <td>highway service service parking_aisle</td>\n",
       "      <td>1105105832</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.656905</td>\n",
       "      <td>145.056340</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>way</td>\n",
       "      <td>277564665</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>[ 2820691440, 10155904575, 10794945280, 107949...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((145.05939 -37.65564, 145.05933 -37.6...</td>\n",
       "      <td>natural wood</td>\n",
       "      <td>277564665</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.655313</td>\n",
       "      <td>145.058038</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>way</td>\n",
       "      <td>1171940509</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>pitch</td>\n",
       "      <td>softball</td>\n",
       "      <td>[ 10889194401, 10889194402, 10889194403, 10889...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((145.05677 -37.65641, 145.05675 -37.6...</td>\n",
       "      <td>leisure pitch sport softball</td>\n",
       "      <td>1171940509</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.656364</td>\n",
       "      <td>145.056266</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>way</td>\n",
       "      <td>1171940508</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>pitch</td>\n",
       "      <td>softball</td>\n",
       "      <td>[ 10889194386, 10889194387, 10889194388, 10889...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((145.05563 -37.65674, 145.05557 -37.6...</td>\n",
       "      <td>leisure pitch sport softball</td>\n",
       "      <td>1171940508</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.656532</td>\n",
       "      <td>145.055199</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>way</td>\n",
       "      <td>1171940507</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>pitch</td>\n",
       "      <td>softball</td>\n",
       "      <td>[ 10889194371, 10889194372, 10889194373, 10889...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>...</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "      <td>POLYGON ((145.05555 -37.65548, 145.05560 -37.6...</td>\n",
       "      <td>leisure pitch sport softball</td>\n",
       "      <td>1171940507</td>\n",
       "      <td>way</td>\n",
       "      <td>-37.655700</td>\n",
       "      <td>145.055967</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10 rows × 34 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   element_type       osmid  highway source   surface leisure     sport  \\\n",
       "51          way  1105105836  footway    nan     paved     nan       nan   \n",
       "53          way  1105106176  footway    nan  concrete     nan       nan   \n",
       "50          way  1105105835  service    nan       nan     nan       nan   \n",
       "49          way  1105105834  service    nan       nan     nan       nan   \n",
       "48          way  1105105833  service    nan       nan     nan       nan   \n",
       "47          way  1105105832  service    nan       nan     nan       nan   \n",
       "34          way   277564665      nan    nan       nan     nan       nan   \n",
       "85          way  1171940509      nan    nan       nan   pitch  softball   \n",
       "84          way  1171940508      nan    nan       nan   pitch  softball   \n",
       "83          way  1171940507      nan    nan       nan   pitch  softball   \n",
       "\n",
       "                                                nodes name bicycle  ...  \\\n",
       "51  [ 668646878, 10112835040, 10112835039, 1011283...  nan     nan  ...   \n",
       "53  [ 10112832233, 11098668126, 10112832241, 10112...  nan     nan  ...   \n",
       "50          [ 10112835025, 10112835039, 10112835035 ]  nan     nan  ...   \n",
       "49          [ 10112835033, 10112835038, 10112835034 ]  nan     nan  ...   \n",
       "48  [ 10112835024, 10112835028, 10112835047, 10112...  nan     nan  ...   \n",
       "47  [ 10112835024, 978035608, 10112835033, 1011283...  nan     nan  ...   \n",
       "34  [ 2820691440, 10155904575, 10794945280, 107949...  nan     nan  ...   \n",
       "85  [ 10889194401, 10889194402, 10889194403, 10889...  nan     nan  ...   \n",
       "84  [ 10889194386, 10889194387, 10889194388, 10889...  nan     nan  ...   \n",
       "83  [ 10889194371, 10889194372, 10889194373, 10889...  nan     nan  ...   \n",
       "\n",
       "          service  fee footway shelter_type  \\\n",
       "51            nan  nan     nan          nan   \n",
       "53            nan  nan     nan          nan   \n",
       "50  parking_aisle  nan     nan          nan   \n",
       "49  parking_aisle  nan     nan          nan   \n",
       "48  parking_aisle  nan     nan          nan   \n",
       "47  parking_aisle  nan     nan          nan   \n",
       "34            nan  nan     nan          nan   \n",
       "85            nan  nan     nan          nan   \n",
       "84            nan  nan     nan          nan   \n",
       "83            nan  nan     nan          nan   \n",
       "\n",
       "                                             geometry  \\\n",
       "51  LINESTRING (145.05682 -37.65699, 145.05665 -37...   \n",
       "53  LINESTRING (145.05658 -37.65664, 145.05668 -37...   \n",
       "50  LINESTRING (145.05615 -37.65695, 145.05647 -37...   \n",
       "49  LINESTRING (145.05599 -37.65704, 145.05631 -37...   \n",
       "48  LINESTRING (145.05583 -37.65713, 145.05583 -37...   \n",
       "47  LINESTRING (145.05583 -37.65713, 145.05595 -37...   \n",
       "34  POLYGON ((145.05939 -37.65564, 145.05933 -37.6...   \n",
       "85  POLYGON ((145.05677 -37.65641, 145.05675 -37.6...   \n",
       "84  POLYGON ((145.05563 -37.65674, 145.05557 -37.6...   \n",
       "83  POLYGON ((145.05555 -37.65548, 145.05560 -37.6...   \n",
       "\n",
       "                                full_name      osm_id osm_type        lat  \\\n",
       "51          highway footway surface paved  1105105836      way -37.657301   \n",
       "53       highway footway surface concrete  1105106176      way -37.656814   \n",
       "50  highway service service parking_aisle  1105105835      way -37.657204   \n",
       "49  highway service service parking_aisle  1105105834      way -37.657308   \n",
       "48  highway service service parking_aisle  1105105833      way -37.657328   \n",
       "47  highway service service parking_aisle  1105105832      way -37.656905   \n",
       "34                           natural wood   277564665      way -37.655313   \n",
       "85           leisure pitch sport softball  1171940509      way -37.656364   \n",
       "84           leisure pitch sport softball  1171940508      way -37.656532   \n",
       "83           leisure pitch sport softball  1171940507      way -37.655700   \n",
       "\n",
       "           lng  \n",
       "51  145.056369  \n",
       "53  145.056755  \n",
       "50  145.056434  \n",
       "49  145.056267  \n",
       "48  145.056424  \n",
       "47  145.056340  \n",
       "34  145.058038  \n",
       "85  145.056266  \n",
       "84  145.055199  \n",
       "83  145.055967  \n",
       "\n",
       "[10 rows x 34 columns]"
      ]
     },
     "execution_count": 202,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top_k_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 198,
   "id": "90aa610d-8e5a-49a3-b742-c6428a221739",
   "metadata": {},
   "outputs": [],
   "source": [
    "top_k_df_all = pd.concat(top_k_dfs)\n",
    "top_k_df_all.to_csv(os.path.join(WRITE_DIR, 'poi_top10.csv'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6dc14a1d-939e-4037-95f1-093ff57819f4",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
