Software Heritage identifiers (SWHIDs) for the archived objects:

  • content: swh:1:cnt:0fe22b5b0769fdef81feaf03fca8495cdc73c05a
  • directory: swh:1:dir:9287bc84340ee2f1ec0e4d777e59e51908046a3d

mapReadings.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Mapping Old Babylonian readings to Unicode\n",
    "\n",
    "## Task\n",
    "\n",
    "We want to map *readings* and *graphemes* in cuneiform corpora to cuneiform unicode characters,\n",
    "based on extant mapping tables.\n",
    "\n",
    "We generate a plain mapping that can be used readily by programs that convert from ATF to TF or something else.\n",
    "\n",
    "## Problem\n",
    "\n",
    "There are multiple mapping tables, there are several ways to transliterate readings.\n",
    "\n",
    "## Sources\n",
    "\n",
    "We take the ATF transliterations from CDLI, for tablets found by a search on AbB and Old Babylonian.\n",
    "\n",
    "We take the file\n",
    "[GeneratedSignList.json](https://github.com/Nino-cunei/oldbabylonian/blob/master/sources/writing/GeneratedSignList.json)\n",
    "with mappings like\n",
    "\n",
    "```json\n",
    "        \"BANIA\": {\n",
    "            \"signName\": \"BANIA\",\n",
    "            \"signNumber\": 551,\n",
    "            \"signCunei\": \"๐’‘”\",\n",
    "            \"codePoint\": \"\",\n",
    "            \"values\":\n",
    "\t\t\t[\n",
    "                \"BANIA\", \"Aล 2.UoverU\", \"5SลชTU\"\n",
    "            ]\n",
    "        },\n",
    "        \"MA\": {\n",
    "            \"signName\": \"MA\",\n",
    "            \"signNumber\": 552,\n",
    "            \"signCunei\": \"๐’ˆ \",\n",
    "            \"codePoint\": \"\",\n",
    "            \"values\":\n",
    "\t\t\t[\n",
    "                \"MA\", \"PEล 3\", \"PEล ล E\", \"WA6\"\n",
    "            ]\n",
    "        },\n",
    "```\n",
    "\n",
    "See [transcription](https://github.com/Nino-cunei/oldbabylonian/blob/master/docs/transcription.md)\n",
    "about the provenance of this file.\n",
    "\n",
    "# Status\n",
    "\n",
    "This is work in progress. \n",
    "The mapping is needed in the conversion from ATF to TF in the program\n",
    "[tfFromATF.py](tfFromATF.py).\n",
    "\n",
    "# Authors\n",
    "\n",
    "Cale Johnson, Martijn Kokken, Dirk Roorda\n",
    "\n",
    "# Acknowledgements\n",
    "\n",
    "We are indebted to **Auday Hussein** for helpfully sending *GeneratedSignList.json* file to us;\n",
    "to **Alba de Ridder** for hints and comments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import collections\n",
    "import re\n",
    "import json\n",
    "from unicodedata import name as uname\n",
    "\n",
    "from tf.app import use"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using TF app oldbabylonian in /Users/dirk/github/annotation/app-oldbabylonian/code\n",
      "Using Nino-cunei/oldbabylonian/tf - 1.0.4 in /Users/dirk/github\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<b>Documentation:</b> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs/\" title=\"provenance of Old Babylonian Letters 1900-1600: Cuneiform tablets \">OLDBABYLONIAN</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs/transcription.md\" title=\"How TF features represent ATF\">Character table</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"OLDBABYLONIAN feature documentation\">Feature docs</a> <a target=\"_blank\" href=\"https://github.com/annotation/app-oldbabylonian\" title=\"oldbabylonian API documentation\">oldbabylonian API</a> <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/\" title=\"text-fabric-api\">Text-Fabric API 7.5.1</a> <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Use/Search/\" title=\"Search Templates Introduction and Reference\">Search Reference</a><details open><summary><b>Loaded features</b>:</summary>\n",
       "<p><b>Old Babylonian Letters 1900-1600: Cuneiform tablets </b>: <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/ARK.tf\">ARK</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/after.tf\">after</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/afterr.tf\">afterr</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/afteru.tf\">afteru</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/atf.tf\">atf</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/atfpost.tf\">atfpost</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/atfpre.tf\">atfpre</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/author.tf\">author</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/col.tf\">col</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/collated.tf\">collated</a>  
<a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/collection.tf\">collection</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/comment.tf\">comment</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/damage.tf\">damage</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/det.tf\">det</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/docnote.tf\">docnote</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/docnumber.tf\">docnumber</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/excavation.tf\">excavation</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/excised.tf\">excised</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/face.tf\">face</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/flags.tf\">flags</a>  <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/fraction.tf\">fraction</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/genre.tf\">genre</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/grapheme.tf\">grapheme</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/graphemer.tf\">graphemer</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/graphemeu.tf\">graphemeu</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/lang.tf\">lang</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/langalt.tf\">langalt</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/ln.tf\">ln</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/lnc.tf\">lnc</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/lnno.tf\">lnno</a>  <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/material.tf\">material</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/missing.tf\">missing</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/museumcode.tf\">museumcode</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/museumname.tf\">museumname</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/object.tf\">object</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/operator.tf\">operator</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/operatorr.tf\">operatorr</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/operatoru.tf\">operatoru</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/otype.tf\">otype</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/period.tf\">period</a>  <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/pnumber.tf\">pnumber</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/primecol.tf\">primecol</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/primeln.tf\">primeln</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/pubdate.tf\">pubdate</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/question.tf\">question</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/reading.tf\">reading</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/readingr.tf\">readingr</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/readingu.tf\">readingu</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/remarkable.tf\">remarkable</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/remarks.tf\">remarks</a>  <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/repeat.tf\">repeat</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/srcLn.tf\">srcLn</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/srcLnNum.tf\">srcLnNum</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/srcfile.tf\">srcfile</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/subgenre.tf\">subgenre</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/supplied.tf\">supplied</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/sym.tf\">sym</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/symr.tf\">symr</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/symu.tf\">symu</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/trans.tf\">trans</a>  <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/transcriber.tf\">transcriber</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/translation@en.tf\">translation@ll</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/type.tf\">type</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/uncertain.tf\">uncertain</a>  <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/volume.tf\">volume</a>  <b><i><a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/oslots.tf\">oslots</a></i></b> </p></details>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<style>\n",
       "@font-face {\n",
       "  font-family: \"Santakku\";\n",
       "  src:\n",
       "    local(\"Santakku.ttf\"),\n",
       "    url(\"https://github.com/annotation/text-fabric/blob/master/tf/server/static/fonts/Santakku.woff?raw=true\");\n",
       "}\n",
       ".txtn,.txtn a:visited,.txtn a:link {\n",
       "    font-family: sans-serif;\n",
       "    font-size: normal;\n",
       "    text-decoration: none;\n",
       "}\n",
       ".txtp,.txtp a:visited,.txtp a:link {\n",
       "    font-family: monospace;\n",
       "    font-size: normal;\n",
       "    text-decoration: none;\n",
       "}\n",
       ".txtr,.txtr a:visited,.txtr a:link {\n",
       "    font-family: serif;\n",
       "    font-size: large;\n",
       "    text-decoration: none;\n",
       "}\n",
       ".txtu,.txtu a:visited,.txtu a:link {\n",
       "    font-family: Santakku;\n",
       "    font-size: x-large;\n",
       "    text-decoration: none;\n",
       "}\n",
       ".features {\n",
       "    font-family: monospace;\n",
       "    font-size: medium;\n",
       "    font-weight: bold;\n",
       "    color: #0a6611;\n",
       "    display: flex;\n",
       "    flex-flow: column nowrap;\n",
       "    padding: 0.1em;\n",
       "    margin: 0.1em;\n",
       "    direction: ltr;\n",
       "}\n",
       ".features div,.features span {\n",
       "    padding: 0;\n",
       "    margin: -0.1rem 0;\n",
       "}\n",
       ".features .f {\n",
       "    font-family: sans-serif;\n",
       "    font-size: x-small;\n",
       "    font-weight: normal;\n",
       "    color: #5555bb;\n",
       "}\n",
       ".features .xft {\n",
       "  color: #000000;\n",
       "  background-color: #eeeeee;\n",
       "  font-size: medium;\n",
       "  margin: 0.1em 0em;\n",
       "}\n",
       ".features .xft .f {\n",
       "  color: #000000;\n",
       "  background-color: #eeeeee;\n",
       "  font-style: italic;\n",
       "  font-size: small;\n",
       "  font-weight: normal;\n",
       "}\n",
       ".pnum {\n",
       "    font-family: sans-serif;\n",
       "    font-size: small;\n",
       "    font-weight: bold;\n",
       "    color: #444444;\n",
       "}\n",
       ".nd {\n",
       "    font-family: monospace;\n",
       "    font-size: x-small;\n",
       "    color: #999999;\n",
       "}\n",
       ".meta {\n",
       "    display: flex;\n",
       "    justify-content: flex-start;\n",
       "    align-items: flex-start;\n",
       "    align-content: flex-start;\n",
       "    flex-flow: row nowrap;\n",
       "}\n",
       ".features,.comments {\n",
       "    display: flex;\n",
       "    justify-content: flex-start;\n",
       "    align-items: flex-start;\n",
       "    align-content: flex-start;\n",
       "    flex-flow: column nowrap;\n",
       "}\n",
       ".children {\n",
       "    display: flex;\n",
       "    justify-content: flex-start;\n",
       "    align-items: flex-start;\n",
       "    align-content: flex-start;\n",
       "    border: 0;\n",
       "    background-color: #ffffff;\n",
       "}\n",
       ".children.document {\n",
       "    flex-flow: column nowrap;\n",
       "}\n",
       ".children.face {\n",
       "    flex-flow: column nowrap;\n",
       "}\n",
       ".children.line {\n",
       "    align-items: stretch;\n",
       "    flex-flow: row nowrap;\n",
       "}\n",
       ".children.cluster {\n",
       "    flex-flow: row wrap;\n",
       "}\n",
       ".children.line {\n",
       "    align-items: stretch;\n",
       "    flex-flow: row nowrap;\n",
       "}\n",
       ".children.sign {\n",
       "    flex-flow: column nowrap;\n",
       "}\n",
       ".contnr {\n",
       "    width: fit-content;\n",
       "}\n",
       ".contnr.document,.contnr.face,\n",
       ".contnr.line,\n",
       ".contnr.cluster,\n",
       ".contnr.word,\n",
       ".contnr.sign {\n",
       "    display: flex;\n",
       "    justify-content: flex-start;\n",
       "    align-items: flex-start;\n",
       "    align-content: flex-start;\n",
       "    flex-flow: column nowrap;\n",
       "    background: #ffffff none repeat scroll 0 0;\n",
       "    padding:  0.5em 0.1em 0.1em 0.1em;\n",
       "    margin: 0.8em 0.1em 0.1em 0.1em;\n",
       "    border-radius: 0.2em;\n",
       "    border-style: solid;\n",
       "    border-width: 0.2em;\n",
       "    font-size: small;\n",
       "}\n",
       ".contnr.document,.contnr.face {\n",
       "    border-color: #bb8800;\n",
       "}\n",
       ".contnr.line {\n",
       "    border-color: #0088bb;\n",
       "}\n",
       ".contnr.cluster {\n",
       "    flex-flow: row wrap;\n",
       "    border: 0;\n",
       "}\n",
       ".contnr.word {\n",
       "    border-color: #44bbff;\n",
       "}\n",
       ".contnr.sign {\n",
       "    border-color: #bbbbbb;\n",
       "}\n",
       ".contnr.hl {\n",
       "    background-color: #ffee66;\n",
       "}\n",
       ".lbl.document,.lbl.face,\n",
       ".lbl.line,\n",
       ".lbl.cluster,\n",
       ".lbl.sign,.lbl.word {\n",
       "    margin-top: -1.2em;\n",
       "    margin-left: 1em;\n",
       "    background: #ffffff none repeat scroll 0 0;\n",
       "    padding: 0 0.3em;\n",
       "    border-style: solid;\n",
       "    font-size: small;\n",
       "    display: block;\n",
       "}\n",
       ".lbl.document,.lbl.face {\n",
       "    border-color: #bb8800;\n",
       "    border-width: 0.3em;\n",
       "    border-radius: 0.3em;\n",
       "    color: #bb8800;\n",
       "}\n",
       ".lbl.line {\n",
       "    border-color: #0088bb;\n",
       "    border-width: 0.3em;\n",
       "    border-radius: 0.3em;\n",
       "    color: #0088bb;\n",
       "}\n",
       ".lbl.cluster {\n",
       "    border-color: #dddddd;\n",
       "    border-width: 0.2em;\n",
       "    border-radius: 0.2em;\n",
       "    color: #0000cc;\n",
       "}\n",
       ".lbl.word {\n",
       "    border-color: #44bbff;\n",
       "    border-width: 0.2em;\n",
       "    border-radius: 0.2em;\n",
       "    font-size: medium;\n",
       "    color: #000000;\n",
       "}\n",
       ".lbl.sign {\n",
       "    border-color: #bbbbbb;\n",
       "    border-width: 0.1em;\n",
       "    border-radius: 0.1em;\n",
       "    font-size: small;\n",
       "    color: #000000;\n",
       "}\n",
       ".op {\n",
       "    padding:  0.5em 0.1em 0.1em 0.1em;\n",
       "    margin: 0.8em 0.1em 0.1em 0.1em;\n",
       "    font-family: monospace;\n",
       "    font-size: x-large;\n",
       "    font-weight: bold;\n",
       "}\n",
       ".name {\n",
       "    font-family: monospace;\n",
       "    font-size: medium;\n",
       "    color: #0000bb;\n",
       "}\n",
       ".period {\n",
       "    font-family: monospace;\n",
       "    font-size: medium;\n",
       "    font-weight: bold;\n",
       "    color: #0000bb;\n",
       "}\n",
       ".text {\n",
       "    font-family: sans-serif;\n",
       "    font-size: x-small;\n",
       "    color: #000000;\n",
       "}\n",
       ".srcln {\n",
       "    font-family: monospace;\n",
       "    font-size: medium;\n",
       "    color: #000000;\n",
       "}\n",
       ".srclnnum {\n",
       "    font-family: monospace;\n",
       "    font-size: x-small;\n",
       "    color: #0000bb;\n",
       "}\n",
       ".comment {\n",
       "    color: #7777dd;\n",
       "    font-family: monospace;\n",
       "    font-size: small;\n",
       "}\n",
       ".operator {\n",
       "    color: #ff77ff;\n",
       "    font-size: large;\n",
       "}\n",
       "/* LANGUAGE: superscript and subscript */\n",
       "\n",
       "/* cluster */\n",
       ".det {\n",
       "    vertical-align: super;\n",
       "}\n",
       "/* cluster */\n",
       ".langalt {\n",
       "    vertical-align: sub;\n",
       "}\n",
       "/* REDACTIONAL: line over or under  */\n",
       "\n",
       "/* flag */\n",
       ".collated {\n",
       "    font-weight: bold;\n",
       "    text-decoration: underline;\n",
       "}\n",
       "/* cluster */\n",
       ".excised {\n",
       "    color: #dd0000;\n",
       "    text-decoration: line-through;\n",
       "}\n",
       "/* cluster */\n",
       ".supplied {\n",
       "    color: #0000ff;\n",
       "    text-decoration: overline;\n",
       "}\n",
       "/* flag */\n",
       ".remarkable {\n",
       "    font-weight: bold;\n",
       "    text-decoration: overline;\n",
       "}\n",
       "\n",
       "/* UNSURE: italic*/\n",
       "\n",
       "/* cluster */\n",
       ".uncertain {\n",
       "    font-style: italic\n",
       "}\n",
       "/* flag */\n",
       ".question {\n",
       "    font-weight: bold;\n",
       "    font-style: italic\n",
       "}\n",
       "\n",
       "/* BROKEN: text-shadow */\n",
       "\n",
       "/* cluster */\n",
       ".missing {\n",
       "    color: #999999;\n",
       "    text-shadow: #bbbbbb 1px 1px;\n",
       "}\n",
       "/* flag */\n",
       ".damage {\n",
       "    font-weight: bold;\n",
       "    color: #999999;\n",
       "    text-shadow: #bbbbbb 1px 1px;\n",
       "}\n",
       ".empty {\n",
       "  color: #ff0000;\n",
       "}\n",
       "\n",
       "\n",
       "tr.tf, td.tf, th.tf {\n",
       "  text-align: left;\n",
       "}\n",
       "\n",
       "span.hldot {\n",
       "\tbackground-color: var(--hl-strong);\n",
       "\tborder: 0.2rem solid var(--hl-rim);\n",
       "\tborder-radius: 0.4rem;\n",
       "\t/*\n",
       "\tdisplay: inline-block;\n",
       "\twidth: 0.8rem;\n",
       "\theight: 0.8rem;\n",
       "\t*/\n",
       "}\n",
       "span.hl {\n",
       "\tbackground-color: var(--hl-strong);\n",
       "\tborder-width: 0;\n",
       "\tborder-radius: 0.1rem;\n",
       "\tborder-style: solid;\n",
       "}\n",
       "\n",
       "span.hlup {\n",
       "\tborder-color: var(--hl-dark);\n",
       "\tborder-width: 0.1rem;\n",
       "\tborder-style: solid;\n",
       "\tborder-radius: 0.2rem;\n",
       "  padding: 0.2rem;\n",
       "}\n",
       "\n",
       ":root {\n",
       "\t--hl-strong:        hsla( 60, 100%,  70%, 0.9  );\n",
       "\t--hl-rim:           hsla( 55, 100%,  60%, 0.9  );\n",
       "\t--hl-dark:          hsla( 55, 100%,  40%, 0.9  );\n",
       "}\n",
       "</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<details open><summary><b>API members</b>:</summary>\n",
       "<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Computed/#computed-data\" title=\"doc\">C Computed</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Computed/#computed-data\" title=\"doc\">Call AllComputeds</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Computed/#computed-data\" title=\"doc\">Cs ComputedString</a><br/>\n",
       "<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#edge-features\" title=\"doc\">E Edge</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#edge-features\" title=\"doc\">Eall AllEdges</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#edge-features\" title=\"doc\">Es EdgeString</a><br/>\n",
       "<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">ensureLoaded</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">TF</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">ignored</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">loadLog</a><br/>\n",
       "<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Locality/#locality\" title=\"doc\">L Locality</a><br/>\n",
       "<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">cache</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">error</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">indent</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">info</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">reset</a><br/>\n",
       "<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">N Nodes</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">sortKey</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">sortKeyTuple</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">otypeRank</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">sortNodes</a><br/>\n",
       "<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#node-features\" title=\"doc\">F Feature</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#node-features\" title=\"doc\">Fall AllFeatures</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#node-features\" title=\"doc\">Fs FeatureString</a><br/>\n",
       "<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Search/#search\" title=\"doc\">S Search</a><br/>\n",
       "<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Text/#text\" title=\"doc\">T Text</a></details>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "A = use('oldbabylonian', hoist=globals(), lgc=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Local topography"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "BASE = os.path.expanduser('~/github')\n",
    "ORG = 'Nino-cunei'\n",
    "REPO = 'oldbabylonian'\n",
    "\n",
    "REPO_DIR = f'{BASE}/{ORG}/{REPO}'\n",
    "\n",
    "WRITING_DIR = f'{REPO_DIR}/sources/writing'\n",
    "\n",
    "SIGN_FILE = 'GeneratedSignList.json'\n",
    "SIGN_PATH = f'{WRITING_DIR}/{SIGN_FILE}'\n",
    "\n",
    "MAPPING_FILE = f'{os.path.abspath(\"..\")}/characters/mapping.tsv'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading collection\n",
    "\n",
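    "We use Text-Fabric (TF) to collect all readings and graphemes from the corpus in a set.\n",
    "Numerals are stored as `(repeat, reading)` or `(fraction, reading)` tuples."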
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "969 different tokens in corpus\n"
     ]
    }
   ],
   "source": [
    "READABLE_TYPES = {'reading', 'grapheme', 'numeral', 'complex'}\n",
    "\n",
    "tokens = set()\n",
    "\n",
    "for s in F.otype.s('sign'):\n",
    "  typ = F.type.v(s)\n",
    "  if typ not in READABLE_TYPES:\n",
    "    continue\n",
    "  reading = F.reading.v(s)\n",
    "  if typ == 'numeral':\n",
    "    # numerals are stored as (repeat, reading) or (fraction, reading) tuples\n",
    "    repeat = F.repeat.v(s)\n",
    "    fraction = F.fraction.v(s)\n",
    "    if repeat:\n",
    "      if repeat > 0:\n",
    "        tokens.add((repeat, reading))\n",
    "      else:\n",
    "        # non-positive repeat: keep the bare reading\n",
    "        tokens.add(reading)\n",
    "    else:\n",
    "      tokens.add((fraction, reading))\n",
    "    continue\n",
    "  for token in (F.reading.v(s), F.grapheme.v(s)):\n",
    "    if token:\n",
    "      tokens.add(token)\n",
    "\n",
    "print(f'{len(tokens)} different tokens in corpus')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Unicode style versus ATF style\n",
    "\n",
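    "The sign list is in Unicode-style transliteration, while the corpus is in ATF.\n",
    "We map the special characters š, ṣ, ś, ṭ, ḫ (and their uppercase forms) to the ASCII sequences `sz`, `s,`, `s'`, `t,`, `h,`."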
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "transAscii = {\n",
    "    'š': 'sz',\n",
    "    'ṣ': 's,',\n",
    "    'ś': \"s'\",\n",
    "    'ṭ': 't,',\n",
    "    'ḫ': 'h,',\n",
    "}\n",
    "\n",
    "transAscii.update({k.upper(): v.upper() for (k, v) in transAscii.items()})\n",
    "\n",
    "def makeAscii(r):\n",
    "  for (rin, rout) in transAscii.items():\n",
    "    r = r.replace(rin, rout)\n",
    "  return r"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'š': 'sz',\n",
       " 'ṣ': 's,',\n",
       " 'ś': \"s'\",\n",
       " 'ṭ': 't,',\n",
       " 'ḫ': 'h,',\n",
       " 'Š': 'SZ',\n",
       " 'Ṣ': 'S,',\n",
       " 'Ś': \"S'\",\n",
       " 'Ṭ': 'T,',\n",
       " 'Ḫ': 'H,'}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transAscii"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "REPEAT_INV = dict(\n",
    "  one=1,\n",
    "  two=2,\n",
    "  three=3,\n",
    "  four=4,\n",
    "  five=5,\n",
    "  six=6,\n",
    "  seven=7,\n",
    "  eight=8,\n",
    "  nine=9,\n",
    ")\n",
    "\n",
    "REPEAT = {v: k for (k, v) in REPEAT_INV.items()}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "FRACTION = {\n",
    "  '1/2': 'one half',\n",
    "  '1/3': 'one third',\n",
    "  '2/3': 'two thirds',\n",
    "  '1/4': 'one quarter',\n",
    "  '1/6': 'one sixth',\n",
    "  '5/6': 'five sixths',\n",
    "  '1/8': 'one eighth',\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Read the sign list\n",
    "\n",
    "We read the JSON file with generated signs.\n",
    "\n",
    "For each sign, we find a list of *values*.\n",
    "\n",
    "These values correspond to possible readings or graphemes, in short, *tokens*.\n",
    "They are in Unicode transliteration style.\n",
    "\n",
    "In the mapping we create, we convert them to plain ATF,\n",
    "which makes it easier to look them up from our Old Babylonian corpus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1768 signs in the json file\n",
      "8765 distinct values in table\n"
     ]
    }
   ],
   "source": [
    "with open(SIGN_PATH) as fh:\n",
    "  signs = json.load(fh)['signs']\n",
    "\n",
    "print(f'{len(signs)} signs in the json file')\n",
    "\n",
    "mapping = collections.defaultdict(set)\n",
    "\n",
    "for (sign, signData) in signs.items():\n",
    "  uniStr = signData['signCunei']\n",
    "  values = signData['values']\n",
    "  for value in values:\n",
    "    valueAscii = makeAscii(value)\n",
    "    mapping[valueAscii].add(uniStr)\n",
    "\n",
    "print(f'{len(mapping)} distinct values in table')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Token lookup\n",
    "\n",
    "We look up each Old Babylonian token in the mapping just constructed.\n",
    "\n",
    "Depending on whether a token has zero, one, or several target signs, we store it in\n",
    "the set `unmapped` or in the dictionaries `unique` or `multiple`.\n",
    "Numerals (tuples) are set aside as unmapped at this stage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "151 unmapped tokens\n",
      " 50 ambiguously mapped tokens\n",
      "768 uniquely mapped tokens\n"
     ]
    }
   ],
   "source": [
    "MAPPING_FIXES = {\n",
    "    'd': 'dingir',\n",
    "}\n",
    "\n",
    "unmapped = set()\n",
    "unique = {}\n",
    "multiple = {}\n",
    "\n",
    "for t in tokens:\n",
    "  if type(t) is tuple:\n",
    "    unmapped.add(t)\n",
    "    continue\n",
    "  tLookup = MAPPING_FIXES.get(t, t)\n",
    "  tU = tLookup.upper()\n",
    "  if tU not in mapping:\n",
    "    unmapped.add(t)\n",
    "    continue\n",
    "  targets = mapping[tU]\n",
    "  if len(targets) == 1:\n",
    "    unique[t] = list(targets)[0]\n",
    "  else:\n",
    "    multiple[t] = targets\n",
    "    \n",
    "print(f'{len(unmapped):>3} unmapped tokens')\n",
    "print(f'{len(multiple):>3} ambiguously mapped tokens')\n",
    "print(f'{len(unique):>3} uniquely mapped tokens')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Unmapped tokens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "151 unmapped tokens\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[\"'i\",\n",
       " 'ah',\n",
       " 'AH',\n",
       " 'alamusz',\n",
       " 'asal2',\n",
       " (1, 'asz'),\n",
       " (2, 'asz'),\n",
       " (3, 'asz'),\n",
       " (4, 'asz'),\n",
       " (5, 'asz'),\n",
       " (6, 'asz'),\n",
       " (7, 'asz'),\n",
       " (8, 'asz'),\n",
       " (9, 'asz'),\n",
       " 'babila2',\n",
       " (1, 'ban2'),\n",
       " (2, 'ban2'),\n",
       " (3, 'ban2'),\n",
       " (4, 'ban2'),\n",
       " (5, 'ban2'),\n",
       " 'barig',\n",
       " (1, 'barig'),\n",
       " (2, 'barig'),\n",
       " (3, 'barig'),\n",
       " (4, 'barig'),\n",
       " (5, 'barig'),\n",
       " (1, \"bur'u\"),\n",
       " (2, \"bur'u\"),\n",
       " (3, \"bur'u\"),\n",
       " (4, \"bur'u\"),\n",
       " (5, \"bur'u\"),\n",
       " (1, 'bur3'),\n",
       " (2, 'bur3'),\n",
       " (3, 'bur3'),\n",
       " (4, 'bur3'),\n",
       " (5, 'bur3'),\n",
       " (6, 'bur3'),\n",
       " (8, 'bur3'),\n",
       " (9, 'bur3'),\n",
       " 'dah',\n",
       " (1, 'disz'),\n",
       " ('1/2', 'disz'),\n",
       " ('1/3', 'disz'),\n",
       " (2, 'disz'),\n",
       " ('2/3', 'disz'),\n",
       " (3, 'disz'),\n",
       " (4, 'disz'),\n",
       " (5, 'disz'),\n",
       " ('5/6', 'disz'),\n",
       " (6, 'disz'),\n",
       " (7, 'disz'),\n",
       " (8, 'disz'),\n",
       " (9, 'disz'),\n",
       " 'duh',\n",
       " 'EH',\n",
       " 'eh',\n",
       " 'eri11',\n",
       " (1, 'esze3'),\n",
       " (2, 'esze3'),\n",
       " (3, 'esze3'),\n",
       " (1, 'gesz'),\n",
       " (9, 'gesz'),\n",
       " (1, \"gesz'u\"),\n",
       " (2, \"gesz'u\"),\n",
       " (3, \"gesz'u\"),\n",
       " (4, \"gesz'u\"),\n",
       " (7, \"gesz'u\"),\n",
       " (1, 'gesz2'),\n",
       " (2, 'gesz2'),\n",
       " (3, 'gesz2'),\n",
       " (4, 'gesz2'),\n",
       " (5, 'gesz2'),\n",
       " (6, 'gesz2'),\n",
       " (7, 'gesz2'),\n",
       " (8, 'gesz2'),\n",
       " (9, 'gesz2'),\n",
       " 'geszimmar',\n",
       " (2, 'gisz'),\n",
       " 'gudu4',\n",
       " 'HA',\n",
       " 'ha',\n",
       " 'had2',\n",
       " 'hal',\n",
       " 'har',\n",
       " 'HAR',\n",
       " 'he',\n",
       " 'he2',\n",
       " 'HI',\n",
       " 'hi',\n",
       " 'hu',\n",
       " 'HU',\n",
       " 'hub2',\n",
       " 'hun',\n",
       " 'hur',\n",
       " 'huz',\n",
       " 'ih',\n",
       " 'IH',\n",
       " (1, 'iku'),\n",
       " ('1/2', 'iku'),\n",
       " (2, 'iku'),\n",
       " (3, 'iku'),\n",
       " (4, 'iku'),\n",
       " 'itu',\n",
       " 'kislah',\n",
       " 'lah',\n",
       " 'lah4',\n",
       " 'lah5',\n",
       " 'lah6',\n",
       " 'lal3',\n",
       " 'm',\n",
       " 'mah',\n",
       " 'muhaldim',\n",
       " 'nigar',\n",
       " 'nirah',\n",
       " 'p',\n",
       " 'pesz2',\n",
       " 'sa10',\n",
       " 'sahar',\n",
       " 'siskur2',\n",
       " 'sz',\n",
       " 'szagina',\n",
       " 'szah',\n",
       " 'szah2',\n",
       " 'szandana',\n",
       " (1, 'szar2'),\n",
       " (2, 'szar2'),\n",
       " 'sze9',\n",
       " 'szii',\n",
       " 'szunigin',\n",
       " 'tah',\n",
       " 'tap',\n",
       " (1, 'u'),\n",
       " (2, 'u'),\n",
       " (3, 'u'),\n",
       " (4, 'u'),\n",
       " (5, 'u'),\n",
       " 'udru',\n",
       " 'uh',\n",
       " 'UH',\n",
       " 'UH2',\n",
       " 'uh2',\n",
       " 'UH3',\n",
       " 'uh3',\n",
       " 'ukken',\n",
       " 'umi',\n",
       " 'unu',\n",
       " 'ura',\n",
       " '|A.GAB.LISZ|',\n",
       " '|KA.TA|',\n",
       " '|UD.KIB.NU|',\n",
       " '|UD.KIB|']"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unkey = lambda x: (x[1].lower(), str(x[0])) if type(x) is tuple else (x.lower(), '')\n",
    "\n",
    "print(f'{len(unmapped):>3} unmapped tokens')\n",
    "sorted(unmapped, key=unkey)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fix the unmapped tokens\n",
    "\n",
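    "We look up the unmapped tokens in the Unicode character names of the cuneiform blocks:\n",
    "numerals are tried as `CUNEIFORM NUMERIC SIGN ...` first, other tokens as `CUNEIFORM SIGN ...`."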
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "cuneiBlocks = {\n",
    "  'Cuneiform': ('12000', '123FF'),\n",
    "  'Cuneiform Numbers and Punctuation': ('12400', '1247F'),\n",
    "  'Early Dynastic Cuneiform': ('12480', '1254F'),\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "cunicode = {}\n",
    "\n",
    "for (block, (start, end)) in cuneiBlocks.items():\n",
    "  for u in range(int(start, 16), int(end, 16) + 1):\n",
    "    c = chr(u)\n",
    "    name = uname(c, None)\n",
    "    if name is None:\n",
    "      continue\n",
    "    cunicode[name] = c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fixed 67 out of 151\n",
      "FIXED\n",
      "\tasal2           => ๐’€ท\n",
      "\t(1, 'asz')      => ๐’€ธ\n",
      "\t(1, 'disz')     => ๐’น\n",
      "\t(2, 'disz')     => ๐’น\n",
      "\tduh             => ๐’‚ƒ\n",
      "\t(2, 'gisz')     => ๐’„‘\n",
      "\tHA              => ๐’„ฉ\n",
      "\tha              => ๐’„ฉ\n",
      "\thal             => ๐’„ฌ\n",
      "\tHI              => ๐’„ญ\n",
      "\thi              => ๐’„ญ\n",
      "\tHU              => ๐’„ท\n",
      "\thu              => ๐’„ท\n",
      "\thub2            => ๐’„ธ\n",
      "\t'i              => ๐’„ฟ\n",
      "\tmah             => ๐’ˆค\n",
      "\tpesz2           => ๐’‰พ\n",
      "\t(1, 'szar2')    => ๐’Šน\n",
      "\t(1, 'u')        => ๐’Œ‹\n",
      "\t(2, 'u')        => ๐’Œ‹\n",
      "\t(3, 'u')        => ๐’Œ‹\n",
      "\t(2, 'asz')      => ๐’€\n",
      "\t(3, 'asz')      => ๐’\n",
      "\t(4, 'asz')      => ๐’‚\n",
      "\t(5, 'asz')      => ๐’ƒ\n",
      "\t(6, 'asz')      => ๐’„\n",
      "\t(7, 'asz')      => ๐’…\n",
      "\t(8, 'asz')      => ๐’†\n",
      "\t(9, 'asz')      => ๐’‡\n",
      "\t(3, 'disz')     => ๐’ˆ\n",
      "\t(4, 'disz')     => ๐’‰\n",
      "\t(5, 'disz')     => ๐’Š\n",
      "\t(6, 'disz')     => ๐’‹\n",
      "\t(7, 'disz')     => ๐’Œ\n",
      "\t(8, 'disz')     => ๐’\n",
      "\t(9, 'disz')     => ๐’Ž\n",
      "\t(4, 'u')        => ๐’\n",
      "\t(5, 'u')        => ๐’\n",
      "\t(1, 'gesz2')    => ๐’•\n",
      "\t(2, 'gesz2')    => ๐’–\n",
      "\t(3, 'gesz2')    => ๐’—\n",
      "\t(4, 'gesz2')    => ๐’˜\n",
      "\t(5, 'gesz2')    => ๐’™\n",
      "\t(6, 'gesz2')    => ๐’š\n",
      "\t(7, 'gesz2')    => ๐’›\n",
      "\t(8, 'gesz2')    => ๐’œ\n",
      "\t(9, 'gesz2')    => ๐’\n",
      "\t(1, \"gesz'u\")   => ๐’ž\n",
      "\t(2, \"gesz'u\")   => ๐’Ÿ\n",
      "\t(3, \"gesz'u\")   => ๐’ \n",
      "\t(4, \"gesz'u\")   => ๐’ก\n",
      "\t(2, 'szar2')    => ๐’ฃ\n",
      "\t(1, \"bur'u\")    => ๐’ด\n",
      "\t(2, \"bur'u\")    => ๐’ต\n",
      "\t(3, \"bur'u\")    => ๐’ถ\n",
      "\t(4, \"bur'u\")    => ๐’ธ\n",
      "\t(5, \"bur'u\")    => ๐’น\n",
      "\t(1, 'ban2')     => ๐’‘\n",
      "\t(2, 'ban2')     => ๐’‘\n",
      "\t(3, 'ban2')     => ๐’‘‘\n",
      "\t(4, 'ban2')     => ๐’‘’\n",
      "\t(5, 'ban2')     => ๐’‘”\n",
      "\t(1, 'esze3')    => ๐’‘˜\n",
      "\t(2, 'esze3')    => ๐’‘™\n",
      "\t('1/3', 'disz') => ๐’‘š\n",
      "\t('2/3', 'disz') => ๐’‘›\n",
      "\t('5/6', 'disz') => ๐’‘œ\n",
      "UNFIXED\n",
      "\tah              => ?\n",
      "\tAH              => ?\n",
      "\talamusz         => ?\n",
      "\tbabila2         => ?\n",
      "\tbarig           => ?\n",
      "\t(1, 'barig')    => ?\n",
      "\t(2, 'barig')    => ?\n",
      "\t(3, 'barig')    => ?\n",
      "\t(4, 'barig')    => ?\n",
      "\t(5, 'barig')    => ?\n",
      "\t(1, 'bur3')     => ?\n",
      "\t(2, 'bur3')     => ?\n",
      "\t(3, 'bur3')     => ?\n",
      "\t(4, 'bur3')     => ?\n",
      "\t(5, 'bur3')     => ?\n",
      "\t(6, 'bur3')     => ?\n",
      "\t(8, 'bur3')     => ?\n",
      "\t(9, 'bur3')     => ?\n",
      "\tdah             => ?\n",
      "\t('1/2', 'disz') => ?\n",
      "\tEH              => ?\n",
      "\teh              => ?\n",
      "\teri11           => ?\n",
      "\t(3, 'esze3')    => ?\n",
      "\t(1, 'gesz')     => ?\n",
      "\t(9, 'gesz')     => ?\n",
      "\t(7, \"gesz'u\")   => ?\n",
      "\tgeszimmar       => ?\n",
      "\tgudu4           => ?\n",
      "\thad2            => ?\n",
      "\thar             => ?\n",
      "\tHAR             => ?\n",
      "\the              => ?\n",
      "\the2             => ?\n",
      "\thun             => ?\n",
      "\thur             => ?\n",
      "\thuz             => ?\n",
      "\tih              => ?\n",
      "\tIH              => ?\n",
      "\t(1, 'iku')      => ?\n",
      "\t('1/2', 'iku')  => ?\n",
      "\t(2, 'iku')      => ?\n",
      "\t(3, 'iku')      => ?\n",
      "\t(4, 'iku')      => ?\n",
      "\titu             => ?\n",
      "\tkislah          => ?\n",
      "\tlah             => ?\n",
      "\tlah4            => ?\n",
      "\tlah5            => ?\n",
      "\tlah6            => ?\n",
      "\tlal3            => ?\n",
      "\tm               => ?\n",
      "\tmuhaldim        => ?\n",
      "\tnigar           => ?\n",
      "\tnirah           => ?\n",
      "\tp               => ?\n",
      "\tsa10            => ?\n",
      "\tsahar           => ?\n",
      "\tsiskur2         => ?\n",
      "\tsz              => ?\n",
      "\tszagina         => ?\n",
      "\tszah            => ?\n",
      "\tszah2           => ?\n",
      "\tszandana        => ?\n",
      "\tsze9            => ?\n",
      "\tszii            => ?\n",
      "\tszunigin        => ?\n",
      "\ttah             => ?\n",
      "\ttap             => ?\n",
      "\tudru            => ?\n",
      "\tuh              => ?\n",
      "\tUH              => ?\n",
      "\tUH2             => ?\n",
      "\tuh2             => ?\n",
      "\tUH3             => ?\n",
      "\tuh3             => ?\n",
      "\tukken           => ?\n",
      "\tumi             => ?\n",
      "\tunu             => ?\n",
      "\tura             => ?\n",
      "\t|A.GAB.LISZ|    => ?\n",
      "\t|KA.TA|         => ?\n",
      "\t|UD.KIB.NU|     => ?\n",
      "\t|UD.KIB|        => ?\n"
     ]
    }
   ],
   "source": [
    "mapAddition = {}\n",
    "notFixed = set()\n",
    "\n",
    "def getLookup(r):\n",
    "  # turn an ATF token into the form used in Unicode character names\n",
    "  return (\n",
    "    r.\n",
    "    replace(\"'\", '').\n",
    "    upper().\n",
    "    replace(\"SZ\", 'SH').\n",
    "    replace('.', ' TIMES ')\n",
    "  )\n",
    "  \n",
    "  \n",
    "for t in sorted(unmapped, key=unkey):\n",
    "  if type(t) is tuple:\n",
    "    if type(t[0]) is int:\n",
    "      (repeat, r) = t\n",
    "      tRepeat = REPEAT.get(repeat, None)\n",
    "      if tRepeat is None:\n",
    "        notFixed.add(t)\n",
    "        continue\n",
    "      tLookup =  getLookup(r)\n",
    "      name = f'CUNEIFORM NUMERIC SIGN {tRepeat.upper()} {tLookup}'\n",
    "      c = cunicode.get(name, None)\n",
    "      if c is not None:\n",
    "        mapAddition[t] = c\n",
    "        continue\n",
    "      # fall back to the plain sign name; the repeat count is then lost\n",
    "      name = f'CUNEIFORM SIGN {tLookup}'\n",
    "    else:\n",
    "      (fraction, r) = t\n",
    "      tFraction = FRACTION.get(fraction, None)\n",
    "      if tFraction is None:\n",
    "        notFixed.add(t)\n",
    "        continue\n",
    "      tLookup =  getLookup(r)\n",
    "      name = f'CUNEIFORM NUMERIC SIGN {tFraction.upper()} {tLookup}'\n",
    "  else:\n",
    "    tLookup =  getLookup(t)\n",
    "    name = f'CUNEIFORM SIGN {tLookup}'\n",
    "  c = cunicode.get(name, None)\n",
    "  if c is None:\n",
    "    notFixed.add(t)\n",
    "  else:\n",
    "    mapAddition[t] = c\n",
    "\n",
    "print(f'fixed {len(mapAddition)} out of {len(unmapped)}')\n",
    "\n",
    "if mapAddition:\n",
    "  print('FIXED')\n",
    "  # items are (token, character) pairs, so unkey orders them by character\n",
    "  for (t, c) in sorted(mapAddition.items(), key=unkey):\n",
    "    print(f'\\t{str(t):<15} => {c}')\n",
    "else:\n",
    "  print('NOTHING FIXED')\n",
    "  \n",
    "if notFixed:\n",
    "  print('UNFIXED')\n",
    "  for t in sorted(notFixed, key=unkey):\n",
    "    print(f'\\t{str(t):<15} => ?')\n",
    "else:\n",
    "  print('ALL FIXED')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Solutions\n",
    "\n",
    "Most of the remaining problems above were solved with a\n",
    "[table provided by Martijn Kokken](https://github.com/Nino-cunei/oldbabylonian/blob/master/sources/writing/MartijnKokken.txt)"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "m               => ?\n",
    "n               => ?\n",
    "p               => ?\n",
    "sz              => ?\n",
    "sze9            => ?\n",
    "szunigin        => ?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'ah': '๐’„ด',\n",
       " 'AH': '๐’„ด',\n",
       " 'alamusz': '๐’‹ญ',\n",
       " 'babila2': '๐’†๐’€ญ๐’Š',\n",
       " 'dah': '๐’ˆญ',\n",
       " 'eh': '๐’„ด',\n",
       " 'EH': '๐’„ด',\n",
       " 'eri11': '๐’€•',\n",
       " 'geszimmar': '๐’Šท',\n",
       " 'gudu4': '๐’„ด๐’ˆจ',\n",
       " 'had2': '๐’Œ“',\n",
       " 'har': '๐’„ฏ',\n",
       " 'HAR': '๐’„ฏ',\n",
       " 'he': '๐’„ญ',\n",
       " 'he2': '๐’ƒถ',\n",
       " 'hun': '๐’‚ ',\n",
       " 'hur': '๐’„ฏ',\n",
       " 'huz': '๐’ˆ',\n",
       " 'ih': '๐’„ด',\n",
       " 'IH': '๐’„ด',\n",
       " 'itu': '๐’Œ—',\n",
       " 'KA': '๐’…—๐’‹ซ',\n",
       " 'kislah': '๐’† ๐’Œ“',\n",
       " 'lah': '๐’Œ“',\n",
       " 'lah4': '๐’ป',\n",
       " 'lah5': '๐’บ๐’บ',\n",
       " 'lah6': '๐’บ',\n",
       " 'lal3': '๐’‹ญ',\n",
       " 'muhaldim': '๐’ˆฌ',\n",
       " 'nigar': '๐’Œ‹๐’Œ“๐’†ค',\n",
       " 'nirah': '๐’ˆฒ',\n",
       " 'sa10': '๐’‰š',\n",
       " 'sahar': '๐’…–',\n",
       " 'siskur2': '๐’€ฌ๐’€ฌ',\n",
       " 'szagina': '๐’„Š๐’€ด',\n",
       " 'szah': '๐’‹š',\n",
       " 'szah2': '๐’‚„',\n",
       " 'szandana': '๐’ƒฒ๐’‰Œ',\n",
       " 'tah': '๐’ˆญ',\n",
       " 'tap': '๐’‹ฐ',\n",
       " 'udru': '๐’€พ',\n",
       " 'UH': '๐’„ด',\n",
       " 'uh': '๐’„ด',\n",
       " 'UH2': '๐’Œ“๐’†ต',\n",
       " 'uh2': '๐’Œ“๐’†ต',\n",
       " 'uh3': '๐’†ต',\n",
       " 'UH3': '๐’†ต',\n",
       " 'ukken': '๐’Œบ',\n",
       " 'unu': '๐’€•',\n",
       " 'barig': '๐’น',\n",
       " '1(barig)': '๐’น',\n",
       " '2(barig)': '๐’น๐’น',\n",
       " '3(barig)': '๐’น๐’น๐’น',\n",
       " '4(barig)': '๐’',\n",
       " '5(barig)': '๐’„ฅ',\n",
       " 'bur3': '๐’Œ‹',\n",
       " \"bur'u\": '๐’ด',\n",
       " '1(bur3)': '๐’Œ‹',\n",
       " '2(bur3)': '๐’Œ‹๐’Œ‹',\n",
       " '3(bur3)': '๐’Œ‹๐’Œ‹๐’Œ‹',\n",
       " '4(bur3)': '๐’',\n",
       " '5(bur3)': '๐’',\n",
       " '6(bur3)': '๐’‘',\n",
       " '7(bur3)': '๐’’',\n",
       " '8(bur3)': '๐’“',\n",
       " '9(bur3)': '๐’”',\n",
       " '1/2(disz)': '๐’ˆฆ',\n",
       " '13(disz)': '๐’Ž™๐’ˆ',\n",
       " '1(iku)': '๐’€ธ',\n",
       " '2(iku)': '๐’€',\n",
       " '3(iku)': '๐’',\n",
       " '4(iku)': '๐’‚',\n",
       " '5(iku)': '๐’ƒ',\n",
       " '6(iku)': '๐’„',\n",
       " '7(iku)': '๐’…',\n",
       " '8(iku)': '๐’†',\n",
       " '9(iku)': '๐’‡',\n",
       " '3(esze3)': '๐’€ธ๐’Œ‹',\n",
       " 'gesz2': '๐’•',\n",
       " \"gesz'u\": '๐’ž',\n",
       " 'szar2': '๐’Šน'}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "MAPPING_SOLUTIONS = dict(\n",
    "  ah=('HIxNUN', 'U12134'),\n",
    "  AH=('HIxNUN', 'U12134'),\n",
    "  alamusz=('TAxHI', 'U122ED'),\n",
    "  babila2=('KA2.AN.RA', 'U1218D U1202D U1228F'),\n",
    "  dah=('MU/MU', 'U1222D'),\n",
    "  eh=('HIxNUN', 'U12134'),\n",
    "  EH=('HIxNUN', 'U12134'),\n",
    "  eri11=('AB gunû', 'U12015'),\n",
    "  geszimmar=('ŠA6', 'U122B7'),\n",
    "  gudu4=('HIxNUN.ME', 'U12134 U12228'),\n",
    "  had2=('UD', 'U12313'),\n",
    "  har=('HIxAŠ2', 'U1212F'),\n",
    "  HAR=('HIxAŠ2', 'U1212F'),\n",
    "  he=('HI', 'U1212D'),\n",
    "  he2=('GAN', 'U120F6'),\n",
    "  hun=('EŠ2', 'U120A0'),\n",
    "  hur=('HIxAŠ2', 'U1212F'),\n",
    "  huz=('LUM', 'U1221D'),\n",
    "  ih=('HIxNUN', 'U12134'),\n",
    "  IH=('HIxNUN', 'U12134'),\n",
    "  itu=('UDxU.U.U', 'U12317'),\n",
    "  KA=('KA TA', 'U12157 U122EB'),\n",
    "  kislah=('KI.UD', 'U121A0 U12313'),\n",
    "  lah=('UD', 'U12313'),\n",
    "  lah4=('DU / DU', 'U1207B'),\n",
    "  lah5=('DU.DU', 'U1207A U1207A'),\n",
    "  lah6=('DU', 'U1207A'),\n",
    "  lal3=('TAxHI', 'U122ED'),\n",
    "  muhaldim=('MU', 'U1222C'),\n",
    "  nigar=('U.UD.KID', 'U1230B U12313 U121A4'),\n",
    "  nirah=('MUŠ', 'U12232'),\n",
    "  sa10=('NINDA2xŠE', 'U1225A'),\n",
    "  sahar=('IŠ', 'U12156'),\n",
    "  siskur2=('AMARxŠE.AMARxŠE', 'U1202C U1202C'),\n",
    "  szagina=('GIR3.ARAD', 'U1210A U12034'),\n",
    "  szah=('ŠUBUR', 'U122DA'),\n",
    "  szah2=('DUN', 'U12084'),\n",
    "  szandana=('GAL.NI', 'U120F2 U1224C'),\n",
    "  tah=('MU/MU', 'U1222D'),\n",
    "  tap=('TAB', 'U122F0'),\n",
    "  udru=('AŠ2', 'U1203E'),\n",
    "  UH=('HIxNUN', 'U12134'),\n",
    "  uh=('HIxNUN', 'U12134'),\n",
    "  UH2=('UD.KUŠU2', 'U12313 U121B5'),\n",
    "  uh2=('UD.KUŠU2', 'U12313 U121B5'),\n",
    "  uh3=('KUŠU2', 'U121B5'),\n",
    "  UH3=('KUŠU2', 'U121B5'),\n",
    "  ukken=('URUxBAR', 'U1233A'),\n",
    "  unu=('AB gunû', 'U12015'),\n",
    ")\n",
    "MAPPING_SOLUTIONS.update({\n",
    "  'barig': ('', 'U12079'),\n",
    "  '1(barig)': ('', 'U12079'),\n",
    "  '2(barig)': ('', 'U12079 U12079'),\n",
    "  '3(barig)': ('', 'U12079 U12079 U12079'),\n",
    "  '4(barig)': ('', 'U1235D'),\n",
    "  '5(barig)': ('', 'U12125'),\n",
    "  'bur3': ('', 'U1230B'),\n",
    "  \"bur'u\": ('', 'U12434'),\n",
    "  '1(bur3)': ('', 'U1230B'),\n",
    "  '2(bur3)': ('', 'U1230B U1230B'),\n",
    "  '3(bur3)': ('', 'U1230B U1230B U1230B'),\n",
    "  '4(bur3)': ('', 'U1240F'),\n",
    "  '5(bur3)': ('', 'U12410'),\n",
    "  '6(bur3)': ('', 'U12411'),\n",
    "  '7(bur3)': ('', 'U12412'),\n",
    "  '8(bur3)': ('', 'U12413'),\n",
    "  '9(bur3)': ('', 'U12414'),\n",
    "  '1/2(disz)': ('', 'U12226'),\n",
    "  '13(disz)': ('', 'U12399 U12408'),\n",
    "  '1(iku)': ('', 'U12038'),\n",
    "  '2(iku)': ('', 'U12400'),\n",
    "  '3(iku)': ('', 'U12401'),\n",
    "  '4(iku)': ('', 'U12402'),\n",
    "  '5(iku)': ('', 'U12403'),\n",
    "  '6(iku)': ('', 'U12404'),\n",
    "  '7(iku)': ('', 'U12405'),\n",
    "  '8(iku)': ('', 'U12406'),\n",
    "  '9(iku)': ('', 'U12407'),\n",
    "  '3(esze3)': ('', 'U12038 U1230B'),\n",
    "  'gesz2': ('', 'U12415'),\n",
    "  \"gesz'u\": ('', 'U1241E'),\n",
    "  'szar2': ('', 'U122B9'),\n",
    "})\n",
    "\n",
    "MAPPING_SOLUTIONSX = {}\n",
    "\n",
    "for (token, (grapheme, uniChars)) in MAPPING_SOLUTIONS.items():\n",
    "  uniStr = ''.join(chr(int(uc[1:], 16)) for uc in uniChars.split())\n",
    "  MAPPING_SOLUTIONSX[token] = uniStr\n",
    "MAPPING_SOLUTIONSX"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ambiguously mapped readings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 50 ambiguously mapped readings\n",
      "IA => (2) => ๐’…€ - ๐’‰ฟ\n",
      "IL => (2) => ๐’€ง - ๐’…‹\n",
      "IRI => (2) => ๐’…• - ๐’Œท\n",
      "KAM => (2) => ๐’„ญ๐’ - ๐’„ฐ\n",
      "LUM => (2) => ๐’ˆ - ๐’‹ž\n",
      "USZ => (2) => ๐’‘ - ๐’–\n",
      "UZ => (2) => ๐’Šป - ๐’–\n",
      "WA => (2) => ๐’€ - ๐’‰ฟ\n",
      "ba4 => (3) => ๐’€€๐’€ญ๐’‚ท - ๐’‚ท - ๐’๐’‚ท๐’‚ท\n",
      "ba6 => (2) => ๐’€๐’Œ‘ - ๐’Œ‘\n",
      "bara2 => (2) => ๐’ - ๐’ˆ\n",
      "bum => (2) => ๐’…ค - ๐’†ƒ\n",
      "buru14 => (2) => ๐’‚˜ - ๐’‚™\n",
      "dabin => (2) => ๐’‚ ๐’Šบ - ๐’ฅ๐’Šบ\n",
      "dilmun => (3) => ๐’‰Œ๐’Œ‡ - ๐’Šฉ๐’„ธ - ๐’Šฉ๐’Œ‡\n",
      "eri => (2) => ๐’…• - ๐’Œท\n",
      "erisz => (2) => ๐’Šฉ๐’ˆ  - ๐’Šฉ๐’Œ†\n",
      "gala => (3) => ๐’ƒฒ - ๐’‘๐’†ช - ๐’“\n",
      "gin7 => (2) => ๐’ถ - ๐’„€\n",
      "gurusz => (2) => ๐’„จ - ๐’†—\n",
      "ia => (2) => ๐’…€ - ๐’‰ฟ\n",
      "idim => (2) => ๐’ - ๐’…‚\n",
      "ii => (2) => ๐’…€ - ๐’‰ฟ\n",
      "il => (2) => ๐’€ง - ๐’…‹\n",
      "iri => (2) => ๐’…• - ๐’Œท\n",
      "isz8 => (2) => ๐’€น - ๐’Œ‹\n",
      "iu => (2) => ๐’…€ - ๐’‰ฟ\n",
      "kam => (2) => ๐’„ญ๐’ - ๐’„ฐ\n",
      "kesz2 => (2) => ๐’‚ก - ๐’†Ÿ\n",
      "kesz3 => (2) => ๐’‹™๐’€ญ๐’„ฒ - ๐’‹™๐’€ญ๐’„ฒ๐’† \n",
      "lum => (2) => ๐’ˆ - ๐’‹ž\n",
      "munu4 => (2) => ๐’‰ฝ๐’‰ฝ - ๐’‰ฝ๐’Šบ๐’‰ฝ\n",
      "ne3 => (2) => ๐’„Š - ๐’ŠŠ\n",
      "nergal => (2) => ๐’„Š๐’€•๐’ƒฒ - ๐’ŠŠ๐’€•๐’ƒฒ\n",
      "pa2 => (2) => ๐’€ - ๐’€๐’Œ‘\n",
      "pirig => (2) => ๐’„Š - ๐’ŠŠ\n",
      "puzur4 => (2) => ๐’…ค๐’Šญ - ๐’†ƒ๐’Šญ\n",
      "sig17 => (2) => ๐’„€ - ๐’†ฌ\n",
      "sze20 => (2) => ๐’‚  - ๐’…†\n",
      "t,a2 => (2) => ๐’‹ซ - ๐’‹ฌ\n",
      "til => (2) => ๐’ - ๐’Œ€\n",
      "us => (2) => ๐’Šป - ๐’–\n",
      "usa => (2) => ๐’ - ๐’‘„\n",
      "usz => (2) => ๐’‘ - ๐’–\n",
      "usz2 => (2) => ๐’ - ๐’—\n",
      "uz => (2) => ๐’Šป - ๐’–\n",
      "wa => (2) => ๐’€ - ๐’‰ฟ\n",
      "wa2 => (2) => ๐’€ - ๐’‰Œ\n",
      "zi3 => (2) => ๐’‚  - ๐’ฅ\n",
      "ziz2 => (2) => ๐’€พ - ๐’ฉ\n"
     ]
    }
   ],
   "source": [
    "print(f'{len(multiple):>3} ambiguously mapped readings')\n",
    "for r in sorted(multiple):\n",
    "  unis = multiple[r]\n",
    "  uniStr = ' - '.join(sorted(unis))\n",
    "  print(f'{r} => ({len(unis)}) => {uniStr}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Uniquely mapped readings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "768 uniquely mapped readings\n",
      "         A => ๐’€€\n",
      "        AB => ๐’€Š\n",
      "        AD => ๐’€œ\n",
      "        AG => ๐’€\n",
      "        AK => ๐’€\n",
      "        AL => ๐’€ \n",
      "        AM => ๐’„ \n",
      "        AN => ๐’€ญ\n",
      "        AR => ๐’…ˆ\n",
      "      ARAD => ๐’€ด\n",
      "     ARAD2 => ๐’€ต\n",
      "       AS, => ๐’Š\n",
      "       AS2 => ๐’€พ\n",
      "       ASZ => ๐’€ธ\n",
      "        AZ => ๐’Š\n",
      "        BA => ๐’€\n",
      "       BAD => ๐’\n",
      "       BAR => ๐’‡\n",
      "        BE => ๐’\n",
      "        BI => ๐’‰\n",
      "        BU => ๐’\n",
      "       BUR => ๐’“\n",
      "        DA => ๐’•\n",
      "       DAM => ๐’ฎ\n",
      "        DI => ๐’ฒ\n",
      "       DIM => ๐’ด\n",
      "       DIN => ๐’ท\n",
      "      DISZ => ๐’น\n",
      "        DU => ๐’บ\n",
      "         E => ๐’‚Š\n",
      "      EDIN => ๐’‚”\n",
      "        EK => ๐’……\n",
      "        EL => ๐’‚–\n",
      "        ER => ๐’…•\n",
      "        GA => ๐’‚ต\n",
      "       GAG => ๐’†•\n",
      "       GAL => ๐’ƒฒ\n",
      "      GAN2 => ๐’ƒท\n",
      "       GAR => ๐’ƒป\n",
      "       GAZ => ๐’„ค\n",
      "      GESZ => ๐’„‘\n",
      "        GI => ๐’„€\n",
      "       GIR => ๐’„ซ\n",
      "      GIR2 => ๐’„ˆ\n",
      "        GU => ๐’„–\n",
      "         I => ๐’„ฟ\n",
      "        IB => ๐’…\n",
      "        ID => ๐’€‰\n",
      "        IG => ๐’……\n",
      "        IK => ๐’……\n",
      "       IL2 => ๐’…\n",
      "        IM => ๐’…Ž\n",
      "        IN => ๐’…”\n",
      "        IR => ๐’…•\n",
      "       ISZ => ๐’…–\n",
      "        IZ => ๐’„‘\n",
      "        KA => ๐’…—\n",
      "       KAB => ๐’†\n",
      "        KI => ๐’† \n",
      "       KIB => ๐’„’\n",
      "        KU => ๐’†ช\n",
      "       KUM => ๐’„ฃ\n",
      "       KUR => ๐’†ณ\n",
      "        LA => ๐’†ท\n",
      "       LAM => ๐’‡ด\n",
      "        LE => ๐’‡ท\n",
      "        LI => ๐’‡ท\n",
      "        LU => ๐’‡ป\n",
      "       LU2 => ๐’‡ฝ\n",
      "        MA => ๐’ˆ \n",
      "        ME => ๐’ˆจ\n",
      "        MI => ๐’ˆช\n",
      "        NA => ๐’ˆพ\n",
      "       NAM => ๐’‰†\n",
      "        NE => ๐’‰ˆ\n",
      "        NI => ๐’‰Œ\n",
      "      NIG2 => ๐’ƒป\n",
      "       NIM => ๐’‰\n",
      "       NIN => ๐’Šฉ๐’Œ†\n",
      "        NU => ๐’‰ก\n",
      "       NUN => ๐’‰ฃ\n",
      "        PA => ๐’‰บ\n",
      "        PI => ๐’‰ฟ\n",
      "        RA => ๐’Š\n",
      "        RI => ๐’Š‘\n",
      "        RU => ๐’Š’\n",
      "       S,I => ๐’ข\n",
      "        SA => ๐’Š“\n",
      "       SAG => ๐’Š•\n",
      "       SAR => ๐’Šฌ\n",
      "       SIG => ๐’‹\n",
      "        SU => ๐’‹ข\n",
      "       SZA => ๐’Šญ\n",
      "       SZE => ๐’Šบ\n",
      "      SZE3 => ๐’‚ \n",
      "     SZESZ => ๐’‹€\n",
      "       SZI => ๐’…†\n",
      "      SZIM => ๐’‹†\n",
      "      SZIR => ๐’‹“\n",
      "       SZU => ๐’‹—\n",
      "        TA => ๐’‹ซ\n",
      "       TAB => ๐’‘Š\n",
      "       TAM => ๐’Œ“\n",
      "       TAR => ๐’‹ป\n",
      "        TE => ๐’‹ผ\n",
      "        TI => ๐’‹พ\n",
      "       TIM => ๐’ด\n",
      "        TU => ๐’Œ…\n",
      "      TUG2 => ๐’Œ†\n",
      "      TUL2 => ๐’‡ฅ\n",
      "       TUM => ๐’Œˆ\n",
      "       TUR => ๐’Œ‰\n",
      "        U2 => ๐’Œ‘\n",
      "        U3 => ๐’…‡\n",
      "        U4 => ๐’Œ“\n",
      "        UB => ๐’Œ’\n",
      "        UD => ๐’Œ“\n",
      "        UG => ๐’ŠŒ\n",
      "        UK => ๐’ŠŒ\n",
      "        UL => ๐’ŒŒ\n",
      "        UM => ๐’Œ\n",
      "        UR => ๐’Œจ\n",
      "        WE => ๐’‰ฟ\n",
      "        WI => ๐’‰ฟ\n",
      "        ZA => ๐’\n",
      "        ZE => ๐’ฃ\n",
      "        ZI => ๐’ฃ\n",
      "       ZI2 => ๐’ข\n",
      "        ZU => ๐’ช\n",
      "       ZUM => ๐’ฎ\n",
      "         a => ๐’€€\n",
      "        a2 => ๐’€‰\n",
      "        ab => ๐’€Š\n",
      "       ab2 => ๐’€–\n",
      "      abul => ๐’†๐’ƒฒ\n",
      "      abzu => ๐’ช๐’€Š\n",
      "        ad => ๐’€œ\n",
      "      adab => ๐’Œ“๐’‰ฃ\n",
      "        ag => ๐’€\n",
      "       ag2 => ๐’‰˜\n",
      "       aga => ๐’‚†\n",
      "     agrig => ๐’…†๐’พ\n",
      "        ak => ๐’€\n",
      "    akszak => ๐’Œ”\n",
      "        al => ๐’€ \n",
      "      alam => ๐’€ฉ\n",
      "      alan => ๐’€ฉ\n",
      "        am => ๐’„ \n",
      "       am3 => ๐’€€๐’€ญ\n",
      "       ama => ๐’‚ผ\n",
      "      amar => ๐’€ซ\n",
      "        an => ๐’€ญ\n",
      "     ansze => ๐’„\n",
      "        ap => ๐’€Š\n",
      "      apin => ๐’€ณ\n",
      "        aq => ๐’€\n",
      "        ar => ๐’…ˆ\n",
      "       ar3 => ๐’„ฏ\n",
      "        as => ๐’Š\n",
      "       as, => ๐’Š\n",
      "       as2 => ๐’€พ\n",
      "      asal => ๐’‚\n",
      "      asar => ๐’‚\n",
      "       asz => ๐’€ธ\n",
      "      asz2 => ๐’€พ\n",
      "     asza5 => ๐’ƒท\n",
      "    aszgab => ๐’€ฟ\n",
      "    asznan => ๐’Šบ๐’Œ\n",
      "        at => ๐’€œ\n",
      "       at, => ๐’€œ\n",
      "        az => ๐’Š\n",
      "       az2 => ๐’€พ\n",
      "       az3 => ๐’€ธ\n",
      "    azlag2 => ๐’Œ†\n",
      "        ba => ๐’€\n",
      "    babbar => ๐’Œ“\n",
      "      bad3 => ๐’‚ฆ\n",
      "       bal => ๐’„\n",
      "      bala => ๐’„\n",
      "       ban => ๐’‰ผ\n",
      "      ban2 => ๐’‘\n",
      "      ban3 => ๐’Œ‰\n",
      "    banda3 => ๐’Œ‰\n",
      "    banesz => ๐’‘‘\n",
      "   banszur => ๐’Ž\n",
      "    bappir => ๐’‹‹\n",
      "       bar => ๐’‡\n",
      "       bat => ๐’\n",
      "        be => ๐’\n",
      "       be2 => ๐’‰\n",
      "        bi => ๐’‰\n",
      "       bi2 => ๐’‰ˆ\n",
      "       bil => ๐’‰ˆ\n",
      "      bil2 => ๐’‰‹\n",
      "      bir2 => ๐’Œ“\n",
      "      bir4 => ๐’‚”\n",
      "      bisz => ๐’„ซ\n",
      "        bu => ๐’\n",
      "      bun2 => ๐’…ฎ\n",
      "       bur => ๐’“\n",
      "      bur3 => ๐’Œ‹\n",
      "   buranun => ๐’Œ“๐’„’๐’‰ฃ\n",
      "         d => ๐’€ญ\n",
      "        da => ๐’•\n",
      "       dab => ๐’ณ\n",
      "      dab5 => ๐’†ช\n",
      "       dag => ๐’–\n",
      "     dagal => ๐’‚ผ\n",
      "       dam => ๐’ฎ\n",
      "       dan => ๐’„จ\n",
      "       daq => ๐’–\n",
      "       dar => ๐’ฏ\n",
      "        de => ๐’ฒ\n",
      "       de3 => ๐’‰ˆ\n",
      "       de4 => ๐’‹ผ\n",
      "        di => ๐’ฒ\n",
      "       di2 => ๐’Šน\n",
      "       di3 => ๐’‹พ\n",
      "       dib => ๐’ณ\n",
      "      dida => ๐’‰๐’Œ‘๐’Š“\n",
      "     didli => ๐’€\n",
      "       dil => ๐’€ธ\n",
      "       dim => ๐’ด\n",
      "      dim2 => ๐’ถ\n",
      "      dim4 => ๐’‰ฝ๐’‰ฝ\n",
      "       din => ๐’ท\n",
      "    dingir => ๐’€ญ\n",
      "      diri => ๐’‹›๐’€€\n",
      "     dirig => ๐’‹›๐’€€\n",
      "      disz => ๐’น\n",
      "        du => ๐’บ\n",
      "      du10 => ๐’„ญ\n",
      "      du11 => ๐’…—\n",
      "       du3 => ๐’†•\n",
      "       du5 => ๐’‚…\n",
      "       du6 => ๐’‡ฏ\n",
      "       du7 => ๐’ŒŒ\n",
      "       du8 => ๐’‚ƒ\n",
      "       dub => ๐’พ\n",
      "       dug => ๐’‚\n",
      "      dug3 => ๐’„ญ\n",
      "      dul3 => ๐’Šจ\n",
      "      dul5 => ๐’Œ†\n",
      "      dumu => ๐’Œ‰\n",
      "     duru5 => ๐’€€\n",
      "      dusu => ๐’…\n",
      "     dusu2 => ๐’€ฒ๐’…‡\n",
      "         e => ๐’‚Š\n",
      "        e2 => ๐’‚\n",
      "        e3 => ๐’Œ“๐’บ\n",
      "        ea => ๐’€€\n",
      "        eb => ๐’…\n",
      "        ed => ๐’€‰\n",
      "      edin => ๐’‚”\n",
      "        eg => ๐’……\n",
      "      egir => ๐’‚•\n",
      "        ek => ๐’……\n",
      "        el => ๐’‚–\n",
      "       el2 => ๐’…‹\n",
      "       el3 => ๐’€ญ\n",
      "      elam => ๐’‰\n",
      "        em => ๐’…Ž\n",
      "       eme => ๐’…ด\n",
      "      eme6 => ๐’€ฒ๐’Šฉ\n",
      "        en => ๐’‚—\n",
      "       en6 => ๐’…”\n",
      "     engar => ๐’€ณ\n",
      "      enku => ๐’ ๐’„ฉ\n",
      "     ensi2 => ๐’‘๐’‹ผ๐’‹›\n",
      "        ep => ๐’…\n",
      "        eq => ๐’……\n",
      "        er => ๐’…•\n",
      "       er2 => ๐’€€๐’…†\n",
      "       er3 => ๐’€ด\n",
      "     eren2 => ๐’‚Ÿ\n",
      "    eresz2 => ๐’‰€\n",
      "      erim => ๐’‚Ÿ\n",
      "      erin => ๐’‚ž\n",
      "     erin2 => ๐’‚Ÿ\n",
      "        es => ๐’„‘\n",
      "       es, => ๐’„‘\n",
      "      esir => ๐’€€๐’‡’\n",
      "       esz => ๐’Œ\n",
      "     esz15 => ๐’…–\n",
      "     esz18 => ๐’€น\n",
      "      esz2 => ๐’‚ \n",
      "      esz3 => ๐’€Š\n",
      "      esza => ๐’€€๐’Œ\n",
      "     esze3 => ๐’‘˜\n",
      "        et => ๐’€‰\n",
      "       et, => ๐’€‰\n",
      "        ez => ๐’„‘\n",
      "      ezem => ๐’‚ก\n",
      "        ga => ๐’‚ต\n",
      "       ga2 => ๐’‚ท\n",
      "       gab => ๐’ƒฎ\n",
      "      gaba => ๐’ƒฎ\n",
      "      gada => ๐’ƒฐ\n",
      "       gag => ๐’†•\n",
      "       gal => ๐’ƒฒ\n",
      "      gal2 => ๐’……\n",
      "       gan => ๐’ƒถ\n",
      "      gan2 => ๐’ƒท\n",
      "     ganba => ๐’† ๐’‡ด\n",
      "       gar => ๐’ƒป\n",
      "      gar3 => ๐’ƒผ\n",
      "       gaz => ๐’„ค\n",
      "        ge => ๐’„€\n",
      "       ge6 => ๐’ˆช\n",
      "      geme => ๐’Šฉ\n",
      "     geme2 => ๐’Šฉ๐’†ณ\n",
      "      gesz => ๐’„‘\n",
      "   gesztin => ๐’ƒพ\n",
      "   gesztu2 => ๐’„‘๐’Œ†๐’‰ฟ\n",
      "        gi => ๐’„€\n",
      "       gi2 => ๐’†ค\n",
      "       gi4 => ๐’„„\n",
      "       gi7 => ๐’‚ \n",
      "     gibil => ๐’‰‹\n",
      "      gid2 => ๐’\n",
      "     gidri => ๐’‰บ\n",
      "     gigir => ๐’‡€\n",
      "       gim => ๐’ถ\n",
      "       gin => ๐’บ\n",
      "      gin2 => ๐’‚…\n",
      "       gir => ๐’„ซ\n",
      "     gir14 => ๐’„ฉ\n",
      "      gir2 => ๐’„ˆ\n",
      "      gir3 => ๐’„Š\n",
      "      gir8 => ๐’†ธ\n",
      "    giri17 => ๐’…—\n",
      "     giri3 => ๐’„Š\n",
      "     gissu => ๐’„‘๐’ˆช\n",
      "      gisz => ๐’„‘\n",
      "        gu => ๐’„–\n",
      "       gu2 => ๐’„˜\n",
      "       gu4 => ๐’„ž\n",
      "       gu7 => ๐’…ฅ\n",
      "       gub => ๐’บ\n",
      "       gud => ๐’„ž\n",
      "       gul => ๐’„ข\n",
      "      gum2 => ๐’ˆ\n",
      "       gur => ๐’„ฅ\n",
      "     gur10 => ๐’†ฅ\n",
      "     gur11 => ๐’‚ต\n",
      "      gur8 => ๐’‹ฝ\n",
      "     guru7 => ๐’„ฆ\n",
      "         i => ๐’„ฟ\n",
      "        i3 => ๐’‰Œ\n",
      "        i7 => ๐’€€๐’‡‰\n",
      "       ia2 => ๐’Š\n",
      "       ia3 => ๐’‰Œ\n",
      "        ib => ๐’…\n",
      "       ib2 => ๐’Œˆ\n",
      "     ibila => ๐’Œ‰๐’‘\n",
      "        id => ๐’€‰\n",
      "       id2 => ๐’€€๐’‡‰\n",
      "    idigna => ๐’ˆฆ๐’„˜๐’ƒผ\n",
      "        ig => ๐’……\n",
      "       igi => ๐’…†\n",
      "        ik => ๐’……\n",
      "       iku => ๐’ƒท\n",
      "       il2 => ๐’…\n",
      "       il3 => ๐’€ญ\n",
      "       il5 => ๐’‚–\n",
      "     illat => ๐’†œ๐’†ณ\n",
      "        im => ๐’…Ž\n",
      "      imin => ๐’‘‚\n",
      "     imma3 => ๐’…Š\n",
      "        in => ๐’…”\n",
      "       ina => ๐’€ธ\n",
      "    inanna => ๐’ˆน\n",
      "      inim => ๐’…—\n",
      "        ip => ๐’…\n",
      "        iq => ๐’……\n",
      "        ir => ๐’…•\n",
      "       ir3 => ๐’€ด\n",
      "        is => ๐’„‘\n",
      "       is, => ๐’„‘\n",
      "       is2 => ๐’…–\n",
      "       is3 => ๐’€Š\n",
      "       is4 => ๐’€พ\n",
      "       isz => ๐’…–\n",
      "      isz3 => ๐’Œ\n",
      "      isz7 => ๐’€Š\n",
      "    iszkur => ๐’…Ž\n",
      "  isztaran => ๐’…—๐’ฒ\n",
      "        it => ๐’€‰\n",
      "       it, => ๐’€‰\n",
      "       iti => ๐’Œš\n",
      "        iz => ๐’„‘\n",
      "        ka => ๐’…—\n",
      "       ka2 => ๐’†\n",
      "       ka3 => ๐’‚ต\n",
      "       ka9 => ๐’‹ƒ\n",
      "       kab => ๐’†\n",
      "       kak => ๐’†•\n",
      "       kal => ๐’„จ\n",
      "      kal2 => ๐’ƒฒ\n",
      "     kalag => ๐’„จ\n",
      "     kalam => ๐’Œฆ\n",
      "       kap => ๐’†\n",
      "       kar => ๐’‹ผ๐’€€\n",
      "      kar2 => ๐’ƒธ\n",
      "      kar3 => ๐’ƒผ\n",
      "      kas4 => ๐’ฝ\n",
      "    kaskal => ๐’†œ\n",
      "      kasz => ๐’‰\n",
      "        ke => ๐’† \n",
      "       ke4 => ๐’†ค\n",
      "        ki => ๐’† \n",
      "       ki2 => ๐’„€\n",
      "       kid => ๐’†ค\n",
      "   kikken2 => ๐’„ฏ๐’„ฏ\n",
      "     kilib => ๐’†ธ\n",
      "       kin => ๐’†ฅ\n",
      "       kir => ๐’„ซ\n",
      "     kiri6 => ๐’Šฌ\n",
      "      kisz => ๐’†ง\n",
      "    kiszib => ๐’ˆฉ\n",
      "   kiszib3 => ๐’พ\n",
      "        ku => ๐’†ช\n",
      "      ku13 => ๐’„ฃ\n",
      "       ku3 => ๐’†ฌ\n",
      "       ku4 => ๐’†ฎ\n",
      "       ku5 => ๐’‹ป\n",
      "       ku6 => ๐’„ฉ\n",
      "       kul => ๐’†ฐ\n",
      "       kum => ๐’„ฃ\n",
      "       kun => ๐’†ฒ\n",
      "      kup4 => ๐’†ค\n",
      "       kur => ๐’†ณ\n",
      "      kur2 => ๐’‰ฝ\n",
      "    kurun2 => ๐’ท\n",
      "  kuruszda => ๐’†ฏ\n",
      "      kusz => ๐’‹ข\n",
      "     kusz3 => ๐’Œ‘\n",
      "        la => ๐’†ท\n",
      "       la2 => ๐’‡ฒ\n",
      "    lagasz => ๐’‹“๐’“๐’†ท\n",
      "       lam => ๐’‡ด\n",
      "     lamma => ๐’„จ\n",
      "     larsa => ๐’Œ“๐’€•\n",
      "        le => ๐’‡ท\n",
      "       lem => ๐’…†\n",
      "        li => ๐’‡ท\n",
      "       li2 => ๐’‰Œ\n",
      "       li3 => ๐’…†\n",
      "     libir => ๐’…†๐’‚ \n",
      "       lik => ๐’Œจ\n",
      "      lil2 => ๐’†ค\n",
      "       lim => ๐’…†\n",
      "        lu => ๐’‡ป\n",
      "       lu2 => ๐’‡ฝ\n",
      "       lu4 => ๐’ˆ\n",
      "     lugal => ๐’ˆ—\n",
      "     lukur => ๐’Šฉ๐’ˆจ\n",
      "        ma => ๐’ˆ \n",
      "       ma2 => ๐’ˆฃ\n",
      "       mal => ๐’‚ท\n",
      "       man => ๐’Œ‹๐’Œ‹\n",
      "       mar => ๐’ˆฅ\n",
      "      mar2 => ๐’€ซ\n",
      "    marduk => ๐’€ซ๐’Œ“\n",
      "      masz => ๐’ˆฆ\n",
      "     masz2 => ๐’ˆง\n",
      "   maszkim => ๐’‘๐’ฝ\n",
      "        me => ๐’ˆจ\n",
      "       me2 => ๐’ˆช\n",
      "      mesz => ๐’ˆจ๐’Œ\n",
      "        mi => ๐’ˆช\n",
      "       mi2 => ๐’Šฉ\n",
      "       mi3 => ๐’ˆจ\n",
      "       mil => ๐’…–\n",
      "        mu => ๐’ˆฌ\n",
      "       mug => ๐’ˆฎ\n",
      "       mun => ๐’ต\n",
      "     munus => ๐’Šฉ\n",
      "       mur => ๐’„ฏ\n",
      "      musz => ๐’ˆฒ\n",
      "     musz5 => ๐’‹€\n",
      "    muszen => ๐’„ท\n",
      "        na => ๐’ˆพ\n",
      "       na4 => ๐’‰Œ๐’Œ“\n",
      "      nag2 => ๐’‰€\n",
      "     nagar => ๐’‰„\n",
      "     nagga => ๐’€ญ๐’ˆพ\n",
      "       nam => ๐’‰†\n",
      "     nanna => ๐’Œถ๐’† \n",
      "    nansze => ๐’€\n",
      "       nar => ๐’ˆœ\n",
      "        ne => ๐’‰ˆ\n",
      "       ne2 => ๐’‰Œ\n",
      "        ni => ๐’‰Œ\n",
      "     nibru => ๐’‚—๐’†ค\n",
      "    nidba2 => ๐’‰ป๐’ˆน\n",
      "      nig2 => ๐’ƒป\n",
      "    nigin3 => ๐’Œ‹๐’Œ“๐’†ค\n",
      "    nigin6 => ๐’€’\n",
      "       nim => ๐’‰\n",
      "    nimgir => ๐’‚†\n",
      "       nin => ๐’Šฉ๐’Œ†\n",
      "      nina => ๐’€\n",
      "     ninda => ๐’ƒป\n",
      "       nir => ๐’‰ช\n",
      "      nita => ๐’‘\n",
      "     nita2 => ๐’€ด\n",
      "        nu => ๐’‰ก\n",
      "       nu2 => ๐’ˆฟ\n",
      "       num => ๐’‰\n",
      "     numun => ๐’†ฐ\n",
      "       nun => ๐’‰ฃ\n",
      "     nunuz => ๐’‰ญ\n",
      "        pa => ๐’‰บ\n",
      "      pa12 => ๐’‰ฟ\n",
      "       pa3 => ๐’…†๐’Š’\n",
      "       pa4 => ๐’‰ฝ\n",
      "       pa5 => ๐’‰ฝ๐’‚Š\n",
      "       pal => ๐’„\n",
      "      par2 => ๐’‡\n",
      "        pe => ๐’‰ฟ\n",
      "       pe2 => ๐’‰\n",
      "      pesz => ๐’„ซ\n",
      "        pi => ๐’‰ฟ\n",
      "       pi2 => ๐’‰\n",
      "       pi4 => ๐’…—\n",
      "       pil => ๐’‰ˆ\n",
      "      pil2 => ๐’‰‹\n",
      "       pir => ๐’Œ“\n",
      "     pisan => ๐’‚ท\n",
      "      pisz => ๐’„ซ\n",
      "        pu => ๐’\n",
      "       pur => ๐’“\n",
      "     puzur => ๐’Œ‹\n",
      "        qa => ๐’‹ก\n",
      "       qa2 => ๐’‚ต\n",
      "       qa3 => ๐’…—\n",
      "      qal4 => ๐’„จ\n",
      "       qar => ๐’ƒผ\n",
      "      qar3 => ๐’ƒป\n",
      "        qe => ๐’†ฅ\n",
      "       qe2 => ๐’† \n",
      "       qe3 => ๐’„€\n",
      "        qi => ๐’†ฅ\n",
      "       qi2 => ๐’† \n",
      "       qi3 => ๐’„€\n",
      "       qi4 => ๐’„„\n",
      "       qir => ๐’„ซ\n",
      "        qu => ๐’„ฃ\n",
      "       qu2 => ๐’†ช\n",
      "       qu3 => ๐’„–\n",
      "       qum => ๐’„ฃ\n",
      "      qur2 => ๐’†ณ\n",
      "        ra => ๐’Š\n",
      "       ra2 => ๐’บ\n",
      "      rasz => ๐’†œ\n",
      "        re => ๐’Š‘\n",
      "        ri => ๐’Š‘\n",
      "       ri2 => ๐’Œท\n",
      "      rim5 => ๐’€ธ\n",
      "        ru => ๐’Š’\n",
      "       ru3 => ๐’€ธ\n",
      "       rum => ๐’€ธ\n",
      "       s,a => ๐’\n",
      "      s,ar => ๐’‡ก\n",
      "       s,e => ๐’ข\n",
      "      s,e2 => ๐’ฃ\n",
      "       s,i => ๐’ข\n",
      "      s,i2 => ๐’ฃ\n",
      "     s,il2 => ๐’ˆช\n",
      "      s,ir => ๐’ˆฒ\n",
      "       s,u => ๐’ฎ\n",
      "      s,u2 => ๐’ช\n",
      "      s,um => ๐’ฎ\n",
      "      s,ur => ๐’€ซ\n",
      "        sa => ๐’Š“\n",
      "      sa12 => ๐’Š•\n",
      "       sa2 => ๐’ฒ\n",
      "       sa3 => ๐’\n",
      "       sa6 => ๐’Šท\n",
      "       sag => ๐’Š•\n",
      "     sag11 => ๐’†ฅ\n",
      "      saga => ๐’Š•\n",
      "       sak => ๐’Š•\n",
      "       sal => ๐’Šฉ\n",
      "     sanga => ๐’‹ƒ\n",
      "       sar => ๐’Šฌ\n",
      "        se => ๐’‹›\n",
      "       se2 => ๐’ฃ\n",
      "       se3 => ๐’‹ง\n",
      "        si => ๐’‹›\n",
      "       si2 => ๐’ฃ\n",
      "       si3 => ๐’‹ง\n",
      "       si4 => ๐’‹œ\n",
      "       sig => ๐’‹\n",
      "      sig2 => ๐’‹ \n",
      "      sig4 => ๐’‹ž\n",
      "      siki => ๐’‹ \n",
      "     sikil => ๐’‚–\n",
      "      sila => ๐’‹ป\n",
      "     sila3 => ๐’‹ก\n",
      "     sila4 => ๐’ƒข\n",
      "     silig => ๐’‚\n",
      "     silim => ๐’ฒ\n",
      "       sim => ๐’‰†\n",
      "     simug => ๐’Œฃ\n",
      "       sin => ๐’Œ\n",
      "      sin2 => ๐’‰†\n",
      "      sipa => ๐’‰บ๐’‡ป\n",
      "     sipad => ๐’‰บ๐’‡ป\n",
      "      sir2 => ๐’\n",
      "      sir3 => ๐’‚ก\n",
      "        su => ๐’‹ข\n",
      "       su2 => ๐’ช\n",
      "       su3 => ๐’‹ค\n",
      "       su7 => ๐’‡ญ\n",
      "      suen => ๐’‚—๐’ช\n",
      "    sukkal => ๐’ˆ›\n",
      "       sum => ๐’‹ง\n",
      "      sum2 => ๐’ฎ\n",
      "      sun2 => ๐’„ข\n",
      "       sur => ๐’‹ฉ\n",
      "       sza => ๐’Šญ\n",
      "     sza13 => ๐’Šน\n",
      "      sza3 => ๐’Šฎ\n",
      "    szabra => ๐’‘๐’€ \n",
      "  szakkan2 => ๐’„Š\n",
      "      szam => ๐’Œ‘\n",
      "     szam3 => ๐’‰“\n",
      "      szar => ๐’Šฌ\n",
      "    szara2 => ๐’‡‹\n",
      "       sze => ๐’Šบ\n",
      "      sze3 => ๐’‚ \n",
      "      szen => ๐’Šฟ\n",
      "   szennur => ๐’„’\n",
      "     szesz => ๐’‹€\n",
      "       szi => ๐’…†\n",
      "      szi2 => ๐’‹›\n",
      "      szim => ๐’‹†\n",
      "    szinig => ๐’‹’\n",
      "    szitim => ๐’ถ\n",
      "       szu => ๐’‹—\n",
      "      szub => ๐’Š’\n",
      "    szubur => ๐’‹š\n",
      "     szuku => ๐’‰ป\n",
      "      szul => ๐’‚„\n",
      "      szum => ๐’‹ณ\n",
      "     szum2 => ๐’‹ง\n",
      "      szur => ๐’‹ฉ\n",
      "     szur4 => ๐’‡ณ๐’Šฌ\n",
      "    szusz3 => ๐’…–\n",
      "       t,a => ๐’•\n",
      "      t,a3 => ๐’„ญ\n",
      "      t,am => ๐’ฎ\n",
      "      t,ar => ๐’‹ป\n",
      "       t,e => ๐’ฒ\n",
      "      t,e4 => ๐’‹ผ\n",
      "      t,e6 => ๐’‹พ\n",
      "       t,i => ๐’ฒ\n",
      "      t,i3 => ๐’‹พ\n",
      "       t,u => ๐’‚…\n",
      "      t,u2 => ๐’Œ…\n",
      "      t,u3 => ๐’บ\n",
      "      t,ul => ๐’‡ฅ\n",
      "      t,um => ๐’Œˆ\n",
      "      t,up => ๐’พ\n",
      "        ta => ๐’‹ซ\n",
      "       ta2 => ๐’•\n",
      "       tab => ๐’‘Š\n",
      "       tak => ๐’‹ณ\n",
      "      tak2 => ๐’–\n",
      "      tak4 => ๐’‹บ\n",
      "     taka4 => ๐’‹บ\n",
      "       tal => ๐’Š‘\n",
      "       tam => ๐’Œ“\n",
      "      tam2 => ๐’ฎ\n",
      "       tar => ๐’‹ป\n",
      "      tar2 => ๐’ฏ\n",
      "  taskarin => ๐’Œ†\n",
      "      tasz => ๐’Œจ\n",
      "        te => ๐’‹ผ\n",
      "       te4 => ๐’‰ˆ\n",
      "       te9 => ๐’‹พ\n",
      "       tel => ๐’\n",
      "       ter => ๐’Œ\n",
      "        ti => ๐’‹พ\n",
      "       ti7 => ๐’‹ผ\n",
      "    tibira => ๐’๐’‰„\n",
      "       tim => ๐’ด\n",
      "       tir => ๐’Œ\n",
      "   tiszpak => ๐’ˆฝ\n",
      "        tu => ๐’Œ…\n",
      "       tu2 => ๐’Œ“\n",
      "       tu3 => ๐’บ\n",
      "      tug2 => ๐’Œ†\n",
      "     tukul => ๐’†ช\n",
      "       tul => ๐’Œ‹๐’Œ†\n",
      "      tul2 => ๐’‡ฅ\n",
      "       tum => ๐’Œˆ\n",
      "      tun3 => ๐’‚…\n",
      "       tup => ๐’พ\n",
      "       tur => ๐’Œ‰\n",
      "      tur2 => ๐’„™\n",
      "         u => ๐’Œ‹\n",
      "        u2 => ๐’Œ‘\n",
      "        u3 => ๐’…‡\n",
      "        u4 => ๐’Œ“\n",
      "        u8 => ๐’‡‡\n",
      "        ub => ๐’Œ’\n",
      "        ud => ๐’Œ“\n",
      "       ud5 => ๐’š\n",
      "       udu => ๐’‡ป\n",
      "        ug => ๐’ŠŒ\n",
      "       ug3 => ๐’Œฆ\n",
      "     ugula => ๐’‰บ\n",
      "        uk => ๐’ŠŒ\n",
      "        ul => ๐’ŒŒ\n",
      "        um => ๐’Œ\n",
      "     umbin => ๐’Œข\n",
      "      umma => ๐’„‘๐’†ต\n",
      "        un => ๐’Œฆ\n",
      "     unken => ๐’Œบ\n",
      "      unug => ๐’€•\n",
      "        up => ๐’Œ’\n",
      "        uq => ๐’ŠŒ\n",
      "        ur => ๐’Œจ\n",
      "       ur2 => ๐’Œซ\n",
      "       ur3 => ๐’ƒก\n",
      "       ur5 => ๐’„ฏ\n",
      "     urasz => ๐’…\n",
      "      uri2 => ๐’‹€๐’€•\n",
      "      urta => ๐’…\n",
      "       uru => ๐’Œท\n",
      "      uru4 => ๐’€ณ\n",
      "     uruda => ๐’\n",
      "     urudu => ๐’\n",
      "       us, => ๐’Šป\n",
      "      us,2 => ๐’‘\n",
      "      us,4 => ๐’Š\n",
      "       us2 => ๐’‘\n",
      "     usan3 => ๐’‰ฎ\n",
      "        ut => ๐’Œ“\n",
      "       ut, => ๐’Œ“\n",
      "       utu => ๐’Œ“\n",
      "       uz2 => ๐’‘\n",
      "       uzu => ๐’œ\n",
      "        we => ๐’‰ฟ\n",
      "        wi => ๐’‰ฟ\n",
      "        wu => ๐’‰ฟ\n",
      "        yi => ๐’‰ฟ\n",
      "        za => ๐’\n",
      "       za3 => ๐’ \n",
      "   zabala4 => ๐’ˆน๐’๐’€•\n",
      "     zabar => ๐’Œ“๐’…—๐’ˆฆ\n",
      "     zadim => ๐’ˆฏ\n",
      "       zal => ๐’‰Œ\n",
      "    zalag2 => ๐’‚Ÿ\n",
      "       zar => ๐’‡ก\n",
      "        ze => ๐’ฃ\n",
      "       ze2 => ๐’ข\n",
      "        zi => ๐’ฃ\n",
      "       zi2 => ๐’ข\n",
      "      zid2 => ๐’‚ \n",
      "    zimbir => ๐’Œ“๐’„’๐’‰ฃ\n",
      "      zir3 => ๐’ˆฒ\n",
      "        zu => ๐’ช\n",
      "       zu2 => ๐’…—\n",
      "       zum => ๐’ฎ\n"
     ]
    }
   ],
   "source": [
    "print(f'{len(unique):>3} uniquely mapped readings')\n",
    "for r in sorted(unique):\n",
    "  print(f'{r:>10} => {unique[r]}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Write the mapping file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "964 entries written to /Users/dirk/github/Nino-cunei/oldbabylonian/characters/mapping.tsv\n"
     ]
    }
   ],
   "source": [
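    "# Merge all partial mappings into one dict; later loops override earlier ones,\n",
    "# so uniquely mapped readings take final precedence.\n",
    "pairs = {}\n",
    "# readings with several candidates: keep the first candidate in sorted order\n",
    "for (k, vs) in multiple.items():\n",
    "  pairs[k] = sorted(vs)[0]\n",
    "# manual additions: a tuple key (a, b) is written as 'a(b)'\n",
    "for (t, v) in mapAddition.items():\n",
    "  k = f'{t[0]}({t[1]})' if isinstance(t, tuple) else t\n",
    "  pairs[k] = v\n",
    "for (k, v) in MAPPING_SOLUTIONSX.items():\n",
    "  pairs[k] = v\n",
    "for (k, v) in unique.items():\n",
    "  pairs[k] = v\n",
    "\n",
    "# write one tab-separated pair per line; the values are cuneiform characters,\n",
    "# so set the encoding explicitly instead of relying on the platform default\n",
    "with open(MAPPING_FILE, 'w', encoding='utf8') as mf:\n",
    "  for (k, v) in sorted(pairs.items()):\n",
    "    mf.write(f'{k}\\t{v}\\n')\n",
    "print(f'{len(pairs)} entries written to {MAPPING_FILE}')"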
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
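
The last code cell of the notebook writes one `reading<TAB>sign` pair per line, sorted by reading. A program that consumes the mapping could load it back roughly as follows. This is only a sketch: the function name `load_mapping` and the two-line sample are hypothetical and not part of the notebook, and the sample uses two arbitrary code points from the Unicode Cuneiform block rather than values copied from the real file.

```python
import io


def load_mapping(lines):
    """Parse tab-separated `reading<TAB>sign` lines into a dict.

    `lines` is any iterable of strings, such as an open text file.
    Blank lines are skipped; a later duplicate key overrides an earlier one.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # split on the first tab only, so the value is kept whole
        reading, sign = line.split("\t", 1)
        mapping[reading] = sign
    return mapping


# Hypothetical sample in the same layout as mapping.tsv (illustrative values).
sample = io.StringIO("a\t\U00012000\nan\t\U0001202D\n")
mapping = load_mapping(sample)
print(len(mapping), "entries loaded")
```

For the real file, passing `open(path, encoding='utf8')` instead of the `StringIO` sample should work the same way, provided the file was written as UTF-8.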
