https://github.com/nino-cunei/oldbabylonian
Tip revision: 20173f788d445e60e4bc40891f2fee26044119a1 authored by Dirk Roorda on 28 March 2019, 07:29:10 UTC
mapReadings.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Mapping Old Babylonian readings to Unicode\n",
"\n",
"## Task\n",
"\n",
"We want to map *readings* and *graphemes* in cuneiform corpora to cuneiform unicode characters,\n",
"based on extant mapping tables.\n",
"\n",
"We generate a plain mapping that can be used readily by programs that convert from ATF to TF or something else.\n",
"\n",
"## Problem\n",
"\n",
"There are multiple mapping tables, there are several ways to transliterate readings.\n",
"\n",
"## Sources\n",
"\n",
"We take the ATF transliterations from CDLI, for tablets found by a search on AbB and Old Babylonian.\n",
"\n",
"We take the file\n",
"[GeneratedSignList.json](https://github.com/Nino-cunei/oldbabylonian/blob/master/sources/writing/GeneratedSignList.json)\n",
"with mappings like\n",
"\n",
"```json\n",
" \"BANIA\": {\n",
" \"signName\": \"BANIA\",\n",
" \"signNumber\": 551,\n",
" \"signCunei\": \"๐\",\n",
" \"codePoint\": \"\",\n",
" \"values\":\n",
"\t\t\t[\n",
" \"BANIA\", \"Aล 2.UoverU\", \"5SลชTU\"\n",
" ]\n",
" },\n",
" \"MA\": {\n",
" \"signName\": \"MA\",\n",
" \"signNumber\": 552,\n",
" \"signCunei\": \"๐ \",\n",
" \"codePoint\": \"\",\n",
" \"values\":\n",
"\t\t\t[\n",
" \"MA\", \"PEล 3\", \"PEล ล E\", \"WA6\"\n",
" ]\n",
" },\n",
"```\n",
"\n",
"See [transcription](https://github.com/Nino-cunei/oldbabylonian/blob/master/docs/transcription.md)\n",
"about the provenance of this file.\n",
"\n",
"# Status\n",
"\n",
"This is work in progress. \n",
"The mapping is needed in the conversion from ATF to TF in the program\n",
"[tfFromATF.py](tfFromATF.py).\n",
"\n",
"# Authors\n",
"\n",
"Cale Johnson, Martijn Kokken, Dirk Roorda\n",
"\n",
"# Acknowledgements\n",
"\n",
"We are indebted to **Auday Hussein** for helpfully sending *GeneratedSignList.json* file to us;\n",
"to **Alba de Ridder** for hints and comments."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import collections\n",
"import re\n",
"import json\n",
"from unicodedata import name as uname\n",
"\n",
"from tf.app import use"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using TF app oldbabylonian in /Users/dirk/github/annotation/app-oldbabylonian/code\n",
"Using Nino-cunei/oldbabylonian/tf - 1.0.4 in /Users/dirk/github\n"
]
},
{
"data": {
"text/html": [
"<b>Documentation:</b> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs/\" title=\"provenance of Old Babylonian Letters 1900-1600: Cuneiform tablets \">OLDBABYLONIAN</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs/transcription.md\" title=\"How TF features represent ATF\">Character table</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"OLDBABYLONIAN feature documentation\">Feature docs</a> <a target=\"_blank\" href=\"https://github.com/annotation/app-oldbabylonian\" title=\"oldbabylonian API documentation\">oldbabylonian API</a> <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/\" title=\"text-fabric-api\">Text-Fabric API 7.5.1</a> <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Use/Search/\" title=\"Search Templates Introduction and Reference\">Search Reference</a><details open><summary><b>Loaded features</b>:</summary>\n",
"<p><b>Old Babylonian Letters 1900-1600: Cuneiform tablets </b>: <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/ARK.tf\">ARK</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/after.tf\">after</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/afterr.tf\">afterr</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/afteru.tf\">afteru</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/atf.tf\">atf</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/atfpost.tf\">atfpost</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/atfpre.tf\">atfpre</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/author.tf\">author</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/col.tf\">col</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/collated.tf\">collated</a> <a 
target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/collection.tf\">collection</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/comment.tf\">comment</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/damage.tf\">damage</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/det.tf\">det</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/docnote.tf\">docnote</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/docnumber.tf\">docnumber</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/excavation.tf\">excavation</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/excised.tf\">excised</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/face.tf\">face</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/flags.tf\">flags</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/fraction.tf\">fraction</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/genre.tf\">genre</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/grapheme.tf\">grapheme</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/graphemer.tf\">graphemer</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/graphemeu.tf\">graphemeu</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/lang.tf\">lang</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/langalt.tf\">langalt</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/ln.tf\">ln</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/lnc.tf\">lnc</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/lnno.tf\">lnno</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/material.tf\">material</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/missing.tf\">missing</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/museumcode.tf\">museumcode</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/museumname.tf\">museumname</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/object.tf\">object</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/operator.tf\">operator</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/operatorr.tf\">operatorr</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/operatoru.tf\">operatoru</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/otype.tf\">otype</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/period.tf\">period</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/pnumber.tf\">pnumber</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/primecol.tf\">primecol</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/primeln.tf\">primeln</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/pubdate.tf\">pubdate</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/question.tf\">question</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/reading.tf\">reading</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/readingr.tf\">readingr</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/readingu.tf\">readingu</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/remarkable.tf\">remarkable</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/remarks.tf\">remarks</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/repeat.tf\">repeat</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/srcLn.tf\">srcLn</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/srcLnNum.tf\">srcLnNum</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/srcfile.tf\">srcfile</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/subgenre.tf\">subgenre</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/supplied.tf\">supplied</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/sym.tf\">sym</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/symr.tf\">symr</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/symu.tf\">symu</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/trans.tf\">trans</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/transcriber.tf\">transcriber</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/translation@en.tf\">translation@ll</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/type.tf\">type</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/uncertain.tf\">uncertain</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/volume.tf\">volume</a> <b><i><a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/oslots.tf\">oslots</a></i></b> </p></details>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<style>\n",
"@font-face {\n",
" font-family: \"Santakku\";\n",
" src:\n",
" local(\"Santakku.ttf\"),\n",
" url(\"https://github.com/annotation/text-fabric/blob/master/tf/server/static/fonts/Santakku.woff?raw=true\");\n",
"}\n",
".txtn,.txtn a:visited,.txtn a:link {\n",
" font-family: sans-serif;\n",
" font-size: normal;\n",
" text-decoration: none;\n",
"}\n",
".txtp,.txtp a:visited,.txtp a:link {\n",
" font-family: monospace;\n",
" font-size: normal;\n",
" text-decoration: none;\n",
"}\n",
".txtr,.txtr a:visited,.txtr a:link {\n",
" font-family: serif;\n",
" font-size: large;\n",
" text-decoration: none;\n",
"}\n",
".txtu,.txtu a:visited,.txtu a:link {\n",
" font-family: Santakku;\n",
" font-size: x-large;\n",
" text-decoration: none;\n",
"}\n",
".features {\n",
" font-family: monospace;\n",
" font-size: medium;\n",
" font-weight: bold;\n",
" color: #0a6611;\n",
" display: flex;\n",
" flex-flow: column nowrap;\n",
" padding: 0.1em;\n",
" margin: 0.1em;\n",
" direction: ltr;\n",
"}\n",
".features div,.features span {\n",
" padding: 0;\n",
" margin: -0.1rem 0;\n",
"}\n",
".features .f {\n",
" font-family: sans-serif;\n",
" font-size: x-small;\n",
" font-weight: normal;\n",
" color: #5555bb;\n",
"}\n",
".features .xft {\n",
" color: #000000;\n",
" background-color: #eeeeee;\n",
" font-size: medium;\n",
" margin: 0.1em 0em;\n",
"}\n",
".features .xft .f {\n",
" color: #000000;\n",
" background-color: #eeeeee;\n",
" font-style: italic;\n",
" font-size: small;\n",
" font-weight: normal;\n",
"}\n",
".pnum {\n",
" font-family: sans-serif;\n",
" font-size: small;\n",
" font-weight: bold;\n",
" color: #444444;\n",
"}\n",
".nd {\n",
" font-family: monospace;\n",
" font-size: x-small;\n",
" color: #999999;\n",
"}\n",
".meta {\n",
" display: flex;\n",
" justify-content: flex-start;\n",
" align-items: flex-start;\n",
" align-content: flex-start;\n",
" flex-flow: row nowrap;\n",
"}\n",
".features,.comments {\n",
" display: flex;\n",
" justify-content: flex-start;\n",
" align-items: flex-start;\n",
" align-content: flex-start;\n",
" flex-flow: column nowrap;\n",
"}\n",
".children {\n",
" display: flex;\n",
" justify-content: flex-start;\n",
" align-items: flex-start;\n",
" align-content: flex-start;\n",
" border: 0;\n",
" background-color: #ffffff;\n",
"}\n",
".children.document {\n",
" flex-flow: column nowrap;\n",
"}\n",
".children.face {\n",
" flex-flow: column nowrap;\n",
"}\n",
".children.line {\n",
" align-items: stretch;\n",
" flex-flow: row nowrap;\n",
"}\n",
".children.cluster {\n",
" flex-flow: row wrap;\n",
"}\n",
".children.line {\n",
" align-items: stretch;\n",
" flex-flow: row nowrap;\n",
"}\n",
".children.sign {\n",
" flex-flow: column nowrap;\n",
"}\n",
".contnr {\n",
" width: fit-content;\n",
"}\n",
".contnr.document,.contnr.face,\n",
".contnr.line,\n",
".contnr.cluster,\n",
".contnr.word,\n",
".contnr.sign {\n",
" display: flex;\n",
" justify-content: flex-start;\n",
" align-items: flex-start;\n",
" align-content: flex-start;\n",
" flex-flow: column nowrap;\n",
" background: #ffffff none repeat scroll 0 0;\n",
" padding: 0.5em 0.1em 0.1em 0.1em;\n",
" margin: 0.8em 0.1em 0.1em 0.1em;\n",
" border-radius: 0.2em;\n",
" border-style: solid;\n",
" border-width: 0.2em;\n",
" font-size: small;\n",
"}\n",
".contnr.document,.contnr.face {\n",
" border-color: #bb8800;\n",
"}\n",
".contnr.line {\n",
" border-color: #0088bb;\n",
"}\n",
".contnr.cluster {\n",
" flex-flow: row wrap;\n",
" border: 0;\n",
"}\n",
".contnr.word {\n",
" border-color: #44bbff;\n",
"}\n",
".contnr.sign {\n",
" border-color: #bbbbbb;\n",
"}\n",
".contnr.hl {\n",
" background-color: #ffee66;\n",
"}\n",
".lbl.document,.lbl.face,\n",
".lbl.line,\n",
".lbl.cluster,\n",
".lbl.sign,.lbl.word {\n",
" margin-top: -1.2em;\n",
" margin-left: 1em;\n",
" background: #ffffff none repeat scroll 0 0;\n",
" padding: 0 0.3em;\n",
" border-style: solid;\n",
" font-size: small;\n",
" display: block;\n",
"}\n",
".lbl.document,.lbl.face {\n",
" border-color: #bb8800;\n",
" border-width: 0.3em;\n",
" border-radius: 0.3em;\n",
" color: #bb8800;\n",
"}\n",
".lbl.line {\n",
" border-color: #0088bb;\n",
" border-width: 0.3em;\n",
" border-radius: 0.3em;\n",
" color: #0088bb;\n",
"}\n",
".lbl.cluster {\n",
" border-color: #dddddd;\n",
" border-width: 0.2em;\n",
" border-radius: 0.2em;\n",
" color: #0000cc;\n",
"}\n",
".lbl.word {\n",
" border-color: #44bbff;\n",
" border-width: 0.2em;\n",
" border-radius: 0.2em;\n",
" font-size: medium;\n",
" color: #000000;\n",
"}\n",
".lbl.sign {\n",
" border-color: #bbbbbb;\n",
" border-width: 0.1em;\n",
" border-radius: 0.1em;\n",
" font-size: small;\n",
" color: #000000;\n",
"}\n",
".op {\n",
" padding: 0.5em 0.1em 0.1em 0.1em;\n",
" margin: 0.8em 0.1em 0.1em 0.1em;\n",
" font-family: monospace;\n",
" font-size: x-large;\n",
" font-weight: bold;\n",
"}\n",
".name {\n",
" font-family: monospace;\n",
" font-size: medium;\n",
" color: #0000bb;\n",
"}\n",
".period {\n",
" font-family: monospace;\n",
" font-size: medium;\n",
" font-weight: bold;\n",
" color: #0000bb;\n",
"}\n",
".text {\n",
" font-family: sans-serif;\n",
" font-size: x-small;\n",
" color: #000000;\n",
"}\n",
".srcln {\n",
" font-family: monospace;\n",
" font-size: medium;\n",
" color: #000000;\n",
"}\n",
".srclnnum {\n",
" font-family: monospace;\n",
" font-size: x-small;\n",
" color: #0000bb;\n",
"}\n",
".comment {\n",
" color: #7777dd;\n",
" font-family: monospace;\n",
" font-size: small;\n",
"}\n",
".operator {\n",
" color: #ff77ff;\n",
" font-size: large;\n",
"}\n",
"/* LANGUAGE: superscript and subscript */\n",
"\n",
"/* cluster */\n",
".det {\n",
" vertical-align: super;\n",
"}\n",
"/* cluster */\n",
".langalt {\n",
" vertical-align: sub;\n",
"}\n",
"/* REDACTIONAL: line over or under */\n",
"\n",
"/* flag */\n",
".collated {\n",
" font-weight: bold;\n",
" text-decoration: underline;\n",
"}\n",
"/* cluster */\n",
".excised {\n",
" color: #dd0000;\n",
" text-decoration: line-through;\n",
"}\n",
"/* cluster */\n",
".supplied {\n",
" color: #0000ff;\n",
" text-decoration: overline;\n",
"}\n",
"/* flag */\n",
".remarkable {\n",
" font-weight: bold;\n",
" text-decoration: overline;\n",
"}\n",
"\n",
"/* UNSURE: italic*/\n",
"\n",
"/* cluster */\n",
".uncertain {\n",
" font-style: italic\n",
"}\n",
"/* flag */\n",
".question {\n",
" font-weight: bold;\n",
" font-style: italic\n",
"}\n",
"\n",
"/* BROKEN: text-shadow */\n",
"\n",
"/* cluster */\n",
".missing {\n",
" color: #999999;\n",
" text-shadow: #bbbbbb 1px 1px;\n",
"}\n",
"/* flag */\n",
".damage {\n",
" font-weight: bold;\n",
" color: #999999;\n",
" text-shadow: #bbbbbb 1px 1px;\n",
"}\n",
".empty {\n",
" color: #ff0000;\n",
"}\n",
"\n",
"\n",
"tr.tf, td.tf, th.tf {\n",
" text-align: left;\n",
"}\n",
"\n",
"span.hldot {\n",
"\tbackground-color: var(--hl-strong);\n",
"\tborder: 0.2rem solid var(--hl-rim);\n",
"\tborder-radius: 0.4rem;\n",
"\t/*\n",
"\tdisplay: inline-block;\n",
"\twidth: 0.8rem;\n",
"\theight: 0.8rem;\n",
"\t*/\n",
"}\n",
"span.hl {\n",
"\tbackground-color: var(--hl-strong);\n",
"\tborder-width: 0;\n",
"\tborder-radius: 0.1rem;\n",
"\tborder-style: solid;\n",
"}\n",
"\n",
"span.hlup {\n",
"\tborder-color: var(--hl-dark);\n",
"\tborder-width: 0.1rem;\n",
"\tborder-style: solid;\n",
"\tborder-radius: 0.2rem;\n",
" padding: 0.2rem;\n",
"}\n",
"\n",
":root {\n",
"\t--hl-strong: hsla( 60, 100%, 70%, 0.9 );\n",
"\t--hl-rim: hsla( 55, 100%, 60%, 0.9 );\n",
"\t--hl-dark: hsla( 55, 100%, 40%, 0.9 );\n",
"}\n",
"</style>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<details open><summary><b>API members</b>:</summary>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Computed/#computed-data\" title=\"doc\">C Computed</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Computed/#computed-data\" title=\"doc\">Call AllComputeds</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Computed/#computed-data\" title=\"doc\">Cs ComputedString</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#edge-features\" title=\"doc\">E Edge</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#edge-features\" title=\"doc\">Eall AllEdges</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#edge-features\" title=\"doc\">Es EdgeString</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">ensureLoaded</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">TF</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">ignored</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">loadLog</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Locality/#locality\" title=\"doc\">L Locality</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">cache</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">error</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">indent</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">info</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">reset</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">N Nodes</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">sortKey</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">sortKeyTuple</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">otypeRank</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">sortNodes</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#node-features\" title=\"doc\">F Feature</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#node-features\" title=\"doc\">Fall AllFeatures</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#node-features\" title=\"doc\">Fs FeatureString</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Search/#search\" title=\"doc\">S Search</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Text/#text\" title=\"doc\">T Text</a></details>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A = use('oldbabylonian', hoist=globals(), lgc=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Local topography"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"BASE = os.path.expanduser('~/github')\n",
"ORG = 'Nino-cunei'\n",
"REPO = 'oldbabylonian'\n",
"\n",
"REPO_DIR = f'{BASE}/{ORG}/{REPO}'\n",
"\n",
"WRITING_DIR = f'{REPO_DIR}/sources/writing'\n",
"\n",
"SIGN_FILE = 'GeneratedSignList.json'\n",
"SIGN_PATH = f'{WRITING_DIR}/{SIGN_FILE}'\n",
"\n",
"MAPPING_FILE = f'{os.path.abspath(\"..\")}/characters/mapping.tsv'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading collection\n",
"\n",
"We use TF to collect all readings from the corpus in a set."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"969 different tokens in corpus\n"
]
}
],
"source": [
"READABLE_TYPES = {'reading', 'grapheme', 'numeral', 'complex'}\n",
"\n",
"tokens = set()\n",
"\n",
"for s in F.otype.s('sign'):\n",
" typ = F.type.v(s)\n",
" if typ not in READABLE_TYPES:\n",
" continue\n",
" reading = F.reading.v(s)\n",
" if typ == 'numeral':\n",
" repeat = F.repeat.v(s)\n",
" fraction = F.fraction.v(s)\n",
" if repeat:\n",
" if repeat > 0:\n",
" tokens.add((repeat, reading))\n",
" else:\n",
" tokens.add(reading)\n",
" else:\n",
" tokens.add((fraction, reading))\n",
" continue\n",
" for token in (F.reading.v(s), F.grapheme.v(s)):\n",
" if token:\n",
" tokens.add(token)\n",
"\n",
"print(f'{len(tokens)} different tokens in corpus')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unicode style versus ATF style\n",
"\n",
"We use mappings between Unicode style transliterations and ATF."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"transAscii = {\n",
" 'ลก': 'sz',\n",
" 'แนฃ': 's,',\n",
" 'ล': \"s'\",\n",
" 'แนญ': 't,',\n",
" 'แธซ': 'h,',\n",
"}\n",
"\n",
"transAscii.update({k.upper(): v.upper() for (k, v) in transAscii.items()})\n",
"\n",
"def makeAscii(r):\n",
" for (rin, rout) in transAscii.items():\n",
" r = r.replace(rin, rout)\n",
" return r"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ลก': 'sz',\n",
" 'แนฃ': 's,',\n",
" 'ล': \"s'\",\n",
" 'แนญ': 't,',\n",
" 'แธซ': 'h,',\n",
" 'ล ': 'SZ',\n",
" 'แนข': 'S,',\n",
" 'ล': \"S'\",\n",
" 'แนฌ': 'T,',\n",
" 'แธช': 'H,'}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"transAscii"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"REPEAT_INV = dict(\n",
" one=1,\n",
" two=2,\n",
" three=3,\n",
" four=4,\n",
" five=5,\n",
" six=6,\n",
" seven=7,\n",
" eight=8,\n",
" nine=9,\n",
")\n",
"\n",
"REPEAT = {v: k for (k, v) in REPEAT_INV.items()}"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"FRACTION = {\n",
" '1/2': 'one half',\n",
" '1/3': 'one third',\n",
" '2/3': 'two thirds',\n",
" '1/4': 'one quarter',\n",
" '1/6': 'one sixth',\n",
" '5/6': 'five sixths',\n",
" '1/8': 'one eighth',\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Read the sign list\n",
"\n",
"We read the json file with generated signs.\n",
"\n",
"For each sign, we find a list of *values*.\n",
"\n",
"These values correspond to possible readings or graphemes, in short, *tokens*. \n",
"They are in unicode transliteration style.\n",
"\n",
"In the mapping we create, we convert them to plain ATF,\n",
"which makes it easier to look them up from our Old Babylonian corpus."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1768 signs in the json file\n",
"8765 distinct values in table\n"
]
}
],
"source": [
"with open(SIGN_PATH) as fh:\n",
"    signs = json.load(fh)['signs']\n",
"\n",
"print(f'{len(signs)} signs in the json file')\n",
"\n",
"# ATF value => set of cuneiform strings of the signs that list this value\n",
"mapping = collections.defaultdict(set)\n",
"\n",
"for (sign, signData) in signs.items():\n",
"    uniStr = signData['signCunei']\n",
"    values = signData['values']\n",
"    for value in values:\n",
"        # convert the unicode transliteration to plain ATF before storing it\n",
"        valueAscii = makeAscii(value)\n",
"        mapping[valueAscii].add(uniStr)\n",
"\n",
"print(f'{len(mapping)} distinct values in table')"
]
},
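{
"cell_type": "markdown",
"metadata": {},
"source": [
"The construction above can be tried out on a tiny, made-up excerpt.\n",
"The sketch below is an illustration only: `TRANS_ASCII` is a shortened copy of the `transAscii` table,\n",
"`make_ascii` is a hypothetical stand-in for `makeAscii`, and `X1`, `X2` stand in for cuneiform strings.\n",
"\n",
"```python\n",
"import collections\n",
"\n",
"# Assumption: a cut-down copy of the transAscii table shown earlier.\n",
"TRANS_ASCII = {'š': 'sz', 'ṣ': 's,', 'ṭ': 't,', 'Š': 'SZ', 'Ṣ': 'S,', 'Ṭ': 'T,'}\n",
"\n",
"\n",
"def make_ascii(value):\n",
"    # Replace each special transliteration character by its ATF spelling.\n",
"    for (uni, plain) in TRANS_ASCII.items():\n",
"        value = value.replace(uni, plain)\n",
"    return value\n",
"\n",
"\n",
"# Hypothetical excerpt of the sign list, not taken from the real file.\n",
"sample_signs = {\n",
"    'SIGN1': {'signCunei': 'X1', 'values': ['MA', 'PEŠ3', 'WA6']},\n",
"    'SIGN2': {'signCunei': 'X2', 'values': ['PI', 'WA6']},\n",
"}\n",
"\n",
"sample_mapping = collections.defaultdict(set)\n",
"for sign_data in sample_signs.values():\n",
"    for value in sign_data['values']:\n",
"        sample_mapping[make_ascii(value)].add(sign_data['signCunei'])\n",
"\n",
"print(dict(sample_mapping))\n",
"```\n",
"\n",
"A value that occurs under two signs, such as `WA6` in this made-up excerpt, ends up with two cuneiform strings.\n",
"Such values are what later show up as ambiguously mapped tokens."
]
},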
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Token lookup\n",
"\n",
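"We look up each Old Babylonian token in the mapping just constructed.\n",
"The lookup is case-insensitive: tokens are upper-cased first, and the token `d` is looked up as `dingir`.\n",
"\n",
"Depending on whether a token maps to zero, one, or several cuneiform strings, we store it in\n",
"the set `unmapped` or in the dictionary `unique` or `multiple`.\n",
"Numeric tokens, which are tuples of a repeat or fraction and a reading, are not looked up here and go straight to `unmapped`."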
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"151 unmapped tokens\n",
" 50 ambiguously mapped tokens\n",
"768 uniquely mapped tokens\n"
]
}
],
"source": [
"MAPPING_FIXES = {\n",
"    'd': 'dingir',\n",
"}\n",
"\n",
"unmapped = set()\n",
"unique = {}\n",
"multiple = {}\n",
"\n",
"for t in tokens:\n",
"    # numeric tokens are (repeat or fraction, reading) tuples; they are dealt with later\n",
"    if type(t) is tuple:\n",
"        unmapped.add(t)\n",
"        continue\n",
"    tLookup = MAPPING_FIXES.get(t, t)\n",
"    # the keys of the mapping are upper case\n",
"    tU = tLookup.upper()\n",
"    if tU not in mapping:\n",
"        unmapped.add(t)\n",
"        continue\n",
"    targets = mapping[tU]\n",
"    if len(targets) == 1:\n",
"        unique[t] = list(targets)[0]\n",
"    else:\n",
"        multiple[t] = targets\n",
"\n",
"print(f'{len(unmapped):>3} unmapped tokens')\n",
"print(f'{len(multiple):>3} ambiguously mapped tokens')\n",
"print(f'{len(unique):>3} uniquely mapped tokens')"
]
},
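{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three-way split can be sketched as a small function.\n",
"This is a minimal sketch, not the notebook's own code: `classify` and all sample data are hypothetical,\n",
"and `X1` to `X4` stand in for cuneiform strings.\n",
"\n",
"```python\n",
"def classify(tokens, mapping, fixes=None):\n",
"    # Split tokens into unmapped, uniquely mapped and ambiguously mapped ones.\n",
"    fixes = fixes or {}\n",
"    (unmapped, unique, multiple) = (set(), {}, {})\n",
"    for t in tokens:\n",
"        if isinstance(t, tuple):\n",
"            # numeric tokens are not looked up in the sign list\n",
"            unmapped.add(t)\n",
"            continue\n",
"        targets = mapping.get(fixes.get(t, t).upper(), set())\n",
"        if len(targets) == 0:\n",
"            unmapped.add(t)\n",
"        elif len(targets) == 1:\n",
"            unique[t] = next(iter(targets))\n",
"        else:\n",
"            multiple[t] = set(targets)\n",
"    return (unmapped, unique, multiple)\n",
"\n",
"\n",
"# Hypothetical mapping and tokens.\n",
"sample_mapping = {'MA': {'X1'}, 'WA': {'X2', 'X3'}, 'DINGIR': {'X4'}}\n",
"sample_tokens = ['ma', 'wa', 'd', 'zz', (2, 'asz')]\n",
"\n",
"(unmapped_s, unique_s, multiple_s) = classify(\n",
"    sample_tokens, sample_mapping, fixes={'d': 'dingir'}\n",
")\n",
"print(unmapped_s, unique_s, multiple_s)\n",
"```\n",
"\n",
"Returning the three containers from a function, rather than filling globals, makes it easy to rerun the split after the mapping has been extended."
]
},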
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unmapped tokens"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"151 unmapped tokens\n"
]
},
{
"data": {
"text/plain": [
"[\"'i\",\n",
" 'ah',\n",
" 'AH',\n",
" 'alamusz',\n",
" 'asal2',\n",
" (1, 'asz'),\n",
" (2, 'asz'),\n",
" (3, 'asz'),\n",
" (4, 'asz'),\n",
" (5, 'asz'),\n",
" (6, 'asz'),\n",
" (7, 'asz'),\n",
" (8, 'asz'),\n",
" (9, 'asz'),\n",
" 'babila2',\n",
" (1, 'ban2'),\n",
" (2, 'ban2'),\n",
" (3, 'ban2'),\n",
" (4, 'ban2'),\n",
" (5, 'ban2'),\n",
" 'barig',\n",
" (1, 'barig'),\n",
" (2, 'barig'),\n",
" (3, 'barig'),\n",
" (4, 'barig'),\n",
" (5, 'barig'),\n",
" (1, \"bur'u\"),\n",
" (2, \"bur'u\"),\n",
" (3, \"bur'u\"),\n",
" (4, \"bur'u\"),\n",
" (5, \"bur'u\"),\n",
" (1, 'bur3'),\n",
" (2, 'bur3'),\n",
" (3, 'bur3'),\n",
" (4, 'bur3'),\n",
" (5, 'bur3'),\n",
" (6, 'bur3'),\n",
" (8, 'bur3'),\n",
" (9, 'bur3'),\n",
" 'dah',\n",
" (1, 'disz'),\n",
" ('1/2', 'disz'),\n",
" ('1/3', 'disz'),\n",
" (2, 'disz'),\n",
" ('2/3', 'disz'),\n",
" (3, 'disz'),\n",
" (4, 'disz'),\n",
" (5, 'disz'),\n",
" ('5/6', 'disz'),\n",
" (6, 'disz'),\n",
" (7, 'disz'),\n",
" (8, 'disz'),\n",
" (9, 'disz'),\n",
" 'duh',\n",
" 'EH',\n",
" 'eh',\n",
" 'eri11',\n",
" (1, 'esze3'),\n",
" (2, 'esze3'),\n",
" (3, 'esze3'),\n",
" (1, 'gesz'),\n",
" (9, 'gesz'),\n",
" (1, \"gesz'u\"),\n",
" (2, \"gesz'u\"),\n",
" (3, \"gesz'u\"),\n",
" (4, \"gesz'u\"),\n",
" (7, \"gesz'u\"),\n",
" (1, 'gesz2'),\n",
" (2, 'gesz2'),\n",
" (3, 'gesz2'),\n",
" (4, 'gesz2'),\n",
" (5, 'gesz2'),\n",
" (6, 'gesz2'),\n",
" (7, 'gesz2'),\n",
" (8, 'gesz2'),\n",
" (9, 'gesz2'),\n",
" 'geszimmar',\n",
" (2, 'gisz'),\n",
" 'gudu4',\n",
" 'HA',\n",
" 'ha',\n",
" 'had2',\n",
" 'hal',\n",
" 'har',\n",
" 'HAR',\n",
" 'he',\n",
" 'he2',\n",
" 'HI',\n",
" 'hi',\n",
" 'hu',\n",
" 'HU',\n",
" 'hub2',\n",
" 'hun',\n",
" 'hur',\n",
" 'huz',\n",
" 'ih',\n",
" 'IH',\n",
" (1, 'iku'),\n",
" ('1/2', 'iku'),\n",
" (2, 'iku'),\n",
" (3, 'iku'),\n",
" (4, 'iku'),\n",
" 'itu',\n",
" 'kislah',\n",
" 'lah',\n",
" 'lah4',\n",
" 'lah5',\n",
" 'lah6',\n",
" 'lal3',\n",
" 'm',\n",
" 'mah',\n",
" 'muhaldim',\n",
" 'nigar',\n",
" 'nirah',\n",
" 'p',\n",
" 'pesz2',\n",
" 'sa10',\n",
" 'sahar',\n",
" 'siskur2',\n",
" 'sz',\n",
" 'szagina',\n",
" 'szah',\n",
" 'szah2',\n",
" 'szandana',\n",
" (1, 'szar2'),\n",
" (2, 'szar2'),\n",
" 'sze9',\n",
" 'szii',\n",
" 'szunigin',\n",
" 'tah',\n",
" 'tap',\n",
" (1, 'u'),\n",
" (2, 'u'),\n",
" (3, 'u'),\n",
" (4, 'u'),\n",
" (5, 'u'),\n",
" 'udru',\n",
" 'uh',\n",
" 'UH',\n",
" 'UH2',\n",
" 'uh2',\n",
" 'UH3',\n",
" 'uh3',\n",
" 'ukken',\n",
" 'umi',\n",
" 'unu',\n",
" 'ura',\n",
" '|A.GAB.LISZ|',\n",
" '|KA.TA|',\n",
" '|UD.KIB.NU|',\n",
" '|UD.KIB|']"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"unkey = lambda x: (x[1].lower(), str(x[0])) if type(x) is tuple else (x.lower(), '')\n",
"\n",
"print(f'{len(unmapped):>3} unmapped tokens')\n",
"sorted(unmapped, key=unkey)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fix the unmapped tokens\n",
"\n",
"We look up the unmapped tokens by character name in the Unicode tables, restricted to the three cuneiform blocks."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"cuneiBlocks = {\n",
" 'Cuneiform': ('12000', '123FF'),\n",
" 'Cuneiform Numbers and Punctuation': ('12400', '1247F'),\n",
" 'Early Dynastic Cuneiform': ('12480', '1254F'),\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"cunicode = {}\n",
"\n",
"# Unicode character name => character, for every named code point in the cuneiform blocks\n",
"for (block, (start, end)) in cuneiBlocks.items():\n",
"    for u in range(int(start, 16), int(end, 16) + 1):\n",
"        c = chr(u)\n",
"        name = uname(c, None)\n",
"        if name is None:\n",
"            continue\n",
"        cunicode[name] = c"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"fixed 67 out of 151\n",
"FIXED\n",
"\tasal2 => ๐ท\n",
"\t(1, 'asz') => ๐ธ\n",
"\t(1, 'disz') => ๐น\n",
"\t(2, 'disz') => ๐น\n",
"\tduh => ๐\n",
"\t(2, 'gisz') => ๐\n",
"\tHA => ๐ฉ\n",
"\tha => ๐ฉ\n",
"\thal => ๐ฌ\n",
"\tHI => ๐ญ\n",
"\thi => ๐ญ\n",
"\tHU => ๐ท\n",
"\thu => ๐ท\n",
"\thub2 => ๐ธ\n",
"\t'i => ๐ฟ\n",
"\tmah => ๐ค\n",
"\tpesz2 => ๐พ\n",
"\t(1, 'szar2') => ๐น\n",
"\t(1, 'u') => ๐\n",
"\t(2, 'u') => ๐\n",
"\t(3, 'u') => ๐\n",
"\t(2, 'asz') => ๐\n",
"\t(3, 'asz') => ๐\n",
"\t(4, 'asz') => ๐\n",
"\t(5, 'asz') => ๐\n",
"\t(6, 'asz') => ๐\n",
"\t(7, 'asz') => ๐
\n",
"\t(8, 'asz') => ๐\n",
"\t(9, 'asz') => ๐\n",
"\t(3, 'disz') => ๐\n",
"\t(4, 'disz') => ๐\n",
"\t(5, 'disz') => ๐\n",
"\t(6, 'disz') => ๐\n",
"\t(7, 'disz') => ๐\n",
"\t(8, 'disz') => ๐\n",
"\t(9, 'disz') => ๐\n",
"\t(4, 'u') => ๐\n",
"\t(5, 'u') => ๐\n",
"\t(1, 'gesz2') => ๐\n",
"\t(2, 'gesz2') => ๐\n",
"\t(3, 'gesz2') => ๐\n",
"\t(4, 'gesz2') => ๐\n",
"\t(5, 'gesz2') => ๐\n",
"\t(6, 'gesz2') => ๐\n",
"\t(7, 'gesz2') => ๐\n",
"\t(8, 'gesz2') => ๐\n",
"\t(9, 'gesz2') => ๐\n",
"\t(1, \"gesz'u\") => ๐\n",
"\t(2, \"gesz'u\") => ๐\n",
"\t(3, \"gesz'u\") => ๐ \n",
"\t(4, \"gesz'u\") => ๐ก\n",
"\t(2, 'szar2') => ๐ฃ\n",
"\t(1, \"bur'u\") => ๐ด\n",
"\t(2, \"bur'u\") => ๐ต\n",
"\t(3, \"bur'u\") => ๐ถ\n",
"\t(4, \"bur'u\") => ๐ธ\n",
"\t(5, \"bur'u\") => ๐น\n",
"\t(1, 'ban2') => ๐\n",
"\t(2, 'ban2') => ๐\n",
"\t(3, 'ban2') => ๐\n",
"\t(4, 'ban2') => ๐\n",
"\t(5, 'ban2') => ๐\n",
"\t(1, 'esze3') => ๐\n",
"\t(2, 'esze3') => ๐\n",
"\t('1/3', 'disz') => ๐\n",
"\t('2/3', 'disz') => ๐\n",
"\t('5/6', 'disz') => ๐\n",
"UNFIXED\n",
"\tah => ?\n",
"\tAH => ?\n",
"\talamusz => ?\n",
"\tbabila2 => ?\n",
"\tbarig => ?\n",
"\t(1, 'barig') => ?\n",
"\t(2, 'barig') => ?\n",
"\t(3, 'barig') => ?\n",
"\t(4, 'barig') => ?\n",
"\t(5, 'barig') => ?\n",
"\t(1, 'bur3') => ?\n",
"\t(2, 'bur3') => ?\n",
"\t(3, 'bur3') => ?\n",
"\t(4, 'bur3') => ?\n",
"\t(5, 'bur3') => ?\n",
"\t(6, 'bur3') => ?\n",
"\t(8, 'bur3') => ?\n",
"\t(9, 'bur3') => ?\n",
"\tdah => ?\n",
"\t('1/2', 'disz') => ?\n",
"\tEH => ?\n",
"\teh => ?\n",
"\teri11 => ?\n",
"\t(3, 'esze3') => ?\n",
"\t(1, 'gesz') => ?\n",
"\t(9, 'gesz') => ?\n",
"\t(7, \"gesz'u\") => ?\n",
"\tgeszimmar => ?\n",
"\tgudu4 => ?\n",
"\thad2 => ?\n",
"\thar => ?\n",
"\tHAR => ?\n",
"\the => ?\n",
"\the2 => ?\n",
"\thun => ?\n",
"\thur => ?\n",
"\thuz => ?\n",
"\tih => ?\n",
"\tIH => ?\n",
"\t(1, 'iku') => ?\n",
"\t('1/2', 'iku') => ?\n",
"\t(2, 'iku') => ?\n",
"\t(3, 'iku') => ?\n",
"\t(4, 'iku') => ?\n",
"\titu => ?\n",
"\tkislah => ?\n",
"\tlah => ?\n",
"\tlah4 => ?\n",
"\tlah5 => ?\n",
"\tlah6 => ?\n",
"\tlal3 => ?\n",
"\tm => ?\n",
"\tmuhaldim => ?\n",
"\tnigar => ?\n",
"\tnirah => ?\n",
"\tp => ?\n",
"\tsa10 => ?\n",
"\tsahar => ?\n",
"\tsiskur2 => ?\n",
"\tsz => ?\n",
"\tszagina => ?\n",
"\tszah => ?\n",
"\tszah2 => ?\n",
"\tszandana => ?\n",
"\tsze9 => ?\n",
"\tszii => ?\n",
"\tszunigin => ?\n",
"\ttah => ?\n",
"\ttap => ?\n",
"\tudru => ?\n",
"\tuh => ?\n",
"\tUH => ?\n",
"\tUH2 => ?\n",
"\tuh2 => ?\n",
"\tUH3 => ?\n",
"\tuh3 => ?\n",
"\tukken => ?\n",
"\tumi => ?\n",
"\tunu => ?\n",
"\tura => ?\n",
"\t|A.GAB.LISZ| => ?\n",
"\t|KA.TA| => ?\n",
"\t|UD.KIB.NU| => ?\n",
"\t|UD.KIB| => ?\n"
]
}
],
"source": [
"mapAddition = {}\n",
"notFixed = set()\n",
"\n",
"def getLookup(r):\n",
"    # turn an ATF reading into the form used in Unicode character names\n",
"    return (\n",
"        r.\n",
"        replace(\"'\", '').\n",
"        upper().\n",
"        replace(\"SZ\", 'SH').\n",
"        replace('.', ' TIMES ')\n",
"    )\n",
"\n",
"\n",
"for t in sorted(unmapped, key=unkey):\n",
"    if type(t) is tuple:\n",
"        if type(t[0]) is int:\n",
"            (repeat, r) = t\n",
"            tRepeat = REPEAT.get(repeat, None)\n",
"            if tRepeat is None:\n",
"                notFixed.add(t)\n",
"                continue\n",
"            tLookup = getLookup(r)\n",
"            name = f'CUNEIFORM NUMERIC SIGN {tRepeat.upper()} {tLookup}'\n",
"            c = cunicode.get(name, None)\n",
"            if c is not None:\n",
"                mapAddition[t] = c\n",
"                continue\n",
"            # no dedicated numeric sign: fall back to the plain sign, which ignores the repeat count\n",
"            name = f'CUNEIFORM SIGN {tLookup}'\n",
"        else:\n",
"            (fraction, r) = t\n",
"            tFraction = FRACTION.get(fraction, None)\n",
"            if tFraction is None:\n",
"                notFixed.add(t)\n",
"                continue\n",
"            tLookup = getLookup(r)\n",
"            name = f'CUNEIFORM NUMERIC SIGN {tFraction.upper()} {tLookup}'\n",
"    else:\n",
"        tLookup = getLookup(t)\n",
"        name = f'CUNEIFORM SIGN {tLookup}'\n",
"    c = cunicode.get(name, None)\n",
"    if c is None:\n",
"        notFixed.add(t)\n",
"    else:\n",
"        mapAddition[t] = c\n",
"\n",
"print(f'fixed {len(mapAddition)} out of {len(unmapped)}')\n",
"\n",
"if mapAddition:\n",
"    print('FIXED')\n",
"    # unkey receives (token, character) pairs here, so the list is ordered by cuneiform character\n",
"    for (t, c) in sorted(mapAddition.items(), key=unkey):\n",
"        print(f'\\t{str(t):<15} => {c}')\n",
"else:\n",
"    print('NOTHING FIXED')\n",
"\n",
"if notFixed:\n",
"    print('UNFIXED')\n",
"    for t in sorted(notFixed, key=unkey):\n",
"        print(f'\\t{str(t):<15} => ?')\n",
"else:\n",
"    print('ALL FIXED')"
]
},
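{
"cell_type": "markdown",
"metadata": {},
"source": [
"The name-based lookup can also be tried out in isolation.\n",
"The sketch below is not the notebook's code: it uses `unicodedata.lookup` from the standard library instead of the `cunicode` table,\n",
"and `NUMBER_WORDS`, `to_unicode_name` and `find_char` are hypothetical names.\n",
"Fractions and the fallback to the plain sign are left out.\n",
"\n",
"```python\n",
"import unicodedata\n",
"\n",
"# Assumption: number words as in the REPEAT table, shortened.\n",
"NUMBER_WORDS = {1: 'ONE', 2: 'TWO', 3: 'THREE'}\n",
"\n",
"\n",
"def to_unicode_name(token):\n",
"    # Build the candidate Unicode character name for a token.\n",
"    if isinstance(token, tuple):\n",
"        (repeat, reading) = token\n",
"        prefix = f'CUNEIFORM NUMERIC SIGN {NUMBER_WORDS[repeat]} '\n",
"    else:\n",
"        (reading, prefix) = (token, 'CUNEIFORM SIGN ')\n",
"    # chr(39) is the apostrophe; ATF sz corresponds to SH in Unicode names\n",
"    return prefix + reading.replace(chr(39), '').upper().replace('SZ', 'SH')\n",
"\n",
"\n",
"def find_char(token):\n",
"    # Return the character with that name, or None if Unicode has no such name.\n",
"    try:\n",
"        return unicodedata.lookup(to_unicode_name(token))\n",
"    except KeyError:\n",
"        return None\n",
"\n",
"\n",
"for sample in ['a', (2, 'asz'), 'nosuchsign']:\n",
"    print(sample, to_unicode_name(sample), find_char(sample))\n",
"```\n",
"\n",
"A precomputed table such as `cunicode` restricts the search to the cuneiform blocks; `unicodedata.lookup` searches all of Unicode, which is harmless here because the name prefix already restricts it."
]
},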
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Solutions\n",
"\n",
"Most of the remaining problems above were solved with a\n",
"[table provided by Martijn Kokken](https://github.com/Nino-cunei/oldbabylonian/blob/master/sources/writing/MartijnKokken.txt)."
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"m => ?\n",
"n => ?\n",
"p => ?\n",
"sz => ?\n",
"sze9 => ?\n",
"szunigin => ?"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ah': '๐ด',\n",
" 'AH': '๐ด',\n",
" 'alamusz': '๐ญ',\n",
" 'babila2': '๐๐ญ๐',\n",
" 'dah': '๐ญ',\n",
" 'eh': '๐ด',\n",
" 'EH': '๐ด',\n",
" 'eri11': '๐',\n",
" 'geszimmar': '๐ท',\n",
" 'gudu4': '๐ด๐จ',\n",
" 'had2': '๐',\n",
" 'har': '๐ฏ',\n",
" 'HAR': '๐ฏ',\n",
" 'he': '๐ญ',\n",
" 'he2': '๐ถ',\n",
" 'hun': '๐ ',\n",
" 'hur': '๐ฏ',\n",
" 'huz': '๐',\n",
" 'ih': '๐ด',\n",
" 'IH': '๐ด',\n",
" 'itu': '๐',\n",
" 'KA': '๐
๐ซ',\n",
" 'kislah': '๐ ๐',\n",
" 'lah': '๐',\n",
" 'lah4': '๐ป',\n",
" 'lah5': '๐บ๐บ',\n",
" 'lah6': '๐บ',\n",
" 'lal3': '๐ญ',\n",
" 'muhaldim': '๐ฌ',\n",
" 'nigar': '๐๐๐ค',\n",
" 'nirah': '๐ฒ',\n",
" 'sa10': '๐',\n",
" 'sahar': '๐
',\n",
" 'siskur2': '๐ฌ๐ฌ',\n",
" 'szagina': '๐๐ด',\n",
" 'szah': '๐',\n",
" 'szah2': '๐',\n",
" 'szandana': '๐ฒ๐',\n",
" 'tah': '๐ญ',\n",
" 'tap': '๐ฐ',\n",
" 'udru': '๐พ',\n",
" 'UH': '๐ด',\n",
" 'uh': '๐ด',\n",
" 'UH2': '๐๐ต',\n",
" 'uh2': '๐๐ต',\n",
" 'uh3': '๐ต',\n",
" 'UH3': '๐ต',\n",
" 'ukken': '๐บ',\n",
" 'unu': '๐',\n",
" 'barig': '๐น',\n",
" '1(barig)': '๐น',\n",
" '2(barig)': '๐น๐น',\n",
" '3(barig)': '๐น๐น๐น',\n",
" '4(barig)': '๐',\n",
" '5(barig)': '๐ฅ',\n",
" 'bur3': '๐',\n",
" \"bur'u\": '๐ด',\n",
" '1(bur3)': '๐',\n",
" '2(bur3)': '๐๐',\n",
" '3(bur3)': '๐๐๐',\n",
" '4(bur3)': '๐',\n",
" '5(bur3)': '๐',\n",
" '6(bur3)': '๐',\n",
" '7(bur3)': '๐',\n",
" '8(bur3)': '๐',\n",
" '9(bur3)': '๐',\n",
" '1/2(disz)': '๐ฆ',\n",
" '13(disz)': '๐๐',\n",
" '1(iku)': '๐ธ',\n",
" '2(iku)': '๐',\n",
" '3(iku)': '๐',\n",
" '4(iku)': '๐',\n",
" '5(iku)': '๐',\n",
" '6(iku)': '๐',\n",
" '7(iku)': '๐
',\n",
" '8(iku)': '๐',\n",
" '9(iku)': '๐',\n",
" '3(esze3)': '๐ธ๐',\n",
" 'gesz2': '๐',\n",
" \"gesz'u\": '๐',\n",
" 'szar2': '๐น'}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"MAPPING_SOLUTIONS = dict(\n",
" ah=('HIxNUN', 'U12134'),\n",
" AH=('HIxNUN', 'U12134'),\n",
" alamusz=('TAxHI', 'U122ED'),\n",
" babila2=('KA2.AN.RA', 'U1218D U1202D U1228F'),\n",
" dah=('MU/MU', 'U1222D'),\n",
" eh=('HIxNUN', 'U12134'),\n",
" EH=('HIxNUN', 'U12134'),\n",
" eri11=('AB gunû', 'U12015'),\n",
" geszimmar=('ŠA6', 'U122B7'),\n",
" gudu4=('HIxNUN.ME', 'U12134 U12228'),\n",
" had2=('UD', 'U12313'),\n",
" har=('HIxAŠ2', 'U1212F'),\n",
" HAR=('HIxAŠ2', 'U1212F'),\n",
" he=('HI', 'U1212D'),\n",
" he2=('GAN', 'U120F6'),\n",
" hun=('EŠ2', 'U120A0'),\n",
" hur=('HIxAŠ2', 'U1212F'),\n",
" huz=('LUM', 'U1221D'),\n",
" ih=('HIxNUN', 'U12134'),\n",
" IH=('HIxNUN', 'U12134'),\n",
" itu=('UDxU.U.U', 'U12317'),\n",
" KA=('KA TA', 'U12157 U122EB'),\n",
" kislah=('KI.UD', 'U121A0 U12313'),\n",
" lah=('UD', 'U12313'),\n",
" lah4=('DU / DU', 'U1207B'),\n",
" lah5=('DU.DU', 'U1207A U1207A'),\n",
" lah6=('DU', 'U1207A'),\n",
" lal3=('TAxHI', 'U122ED'),\n",
" muhaldim=('MU', 'U1222C'),\n",
" nigar=('U.UD.KID', 'U1230B U12313 U121A4'),\n",
" nirah=('MUŠ', 'U12232'),\n",
" sa10=('NINDA2xŠE', 'U1225A'),\n",
" sahar=('IŠ', 'U12156'),\n",
" siskur2=('AMARxŠE.AMARxŠE', 'U1202C U1202C'),\n",
" szagina=('GIR3.ARAD', 'U1210A U12034'),\n",
" szah=('ŠUBUR', 'U122DA'),\n",
" szah2=('DUN', 'U12084'),\n",
" szandana=('GAL.NI', 'U120F2 U1224C'),\n",
" tah=('MU/MU', 'U1222D'),\n",
" tap=('TAB', 'U122F0'),\n",
" udru=('AŠ2', 'U1203E'),\n",
" UH=('HIxNUN', 'U12134'),\n",
" uh=('HIxNUN', 'U12134'),\n",
" UH2=('UD.KUŠU2', 'U12313 U121B5'),\n",
" uh2=('UD.KUŠU2', 'U12313 U121B5'),\n",
" uh3=('KUŠU2', 'U121B5'),\n",
" UH3=('KUŠU2', 'U121B5'),\n",
" ukken=('URUxBAR', 'U1233A'),\n",
" unu=('AB gunû', 'U12015'),\n",
")\n",
"MAPPING_SOLUTIONS.update({\n",
" 'barig': ('', 'U12079'),\n",
" '1(barig)': ('', 'U12079'),\n",
" '2(barig)': ('', 'U12079 U12079'),\n",
" '3(barig)': ('', 'U12079 U12079 U12079'),\n",
" '4(barig)': ('', 'U1235D'),\n",
" '5(barig)': ('', 'U12125'),\n",
" 'bur3': ('', 'U1230B'),\n",
" \"bur'u\": ('', 'U12434'),\n",
" '1(bur3)': ('', 'U1230B'),\n",
" '2(bur3)': ('', 'U1230B U1230B'),\n",
" '3(bur3)': ('', 'U1230B U1230B U1230B'),\n",
" '4(bur3)': ('', 'U1240F'),\n",
" '5(bur3)': ('', 'U12410'),\n",
" '6(bur3)': ('', 'U12411'),\n",
" '7(bur3)': ('', 'U12412'),\n",
" '8(bur3)': ('', 'U12413'),\n",
" '9(bur3)': ('', 'U12414'),\n",
" '1/2(disz)': ('', 'U12226'),\n",
" '13(disz)': ('', 'U12399 U12408'),\n",
" '1(iku)': ('', 'U12038'),\n",
" '2(iku)': ('', 'U12400'),\n",
" '3(iku)': ('', 'U12401'),\n",
" '4(iku)': ('', 'U12402'),\n",
" '5(iku)': ('', 'U12403'),\n",
" '6(iku)': ('', 'U12404'),\n",
" '7(iku)': ('', 'U12405'),\n",
" '8(iku)': ('', 'U12406'),\n",
" '9(iku)': ('', 'U12407'),\n",
" '3(esze3)': ('', 'U12038 U1230B'),\n",
" 'gesz2': ('', 'U12415'),\n",
" \"gesz'u\": ('', 'U1241E'),\n",
" 'szar2': ('', 'U122B9'),\n",
"})\n",
"\n",
"MAPPING_SOLUTIONSX = {}\n",
"\n",
"for (token, (grapheme, uniChars)) in MAPPING_SOLUTIONS.items():\n",
"    uniStr = ''.join(chr(int(uc[1:], 16)) for uc in uniChars.split())\n",
"    MAPPING_SOLUTIONSX[token] = uniStr\n",
"MAPPING_SOLUTIONSX"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ambiguously mapped readings"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 50 ambiguously mapped readings\n",
"IA => (2) => ๐
- ๐ฟ\n",
"IL => (2) => ๐ง - ๐
\n",
"IRI => (2) => ๐
- ๐ท\n",
"KAM => (2) => ๐ญ๐ - ๐ฐ\n",
"LUM => (2) => ๐ - ๐\n",
"USZ => (2) => ๐ - ๐\n",
"UZ => (2) => ๐ป - ๐\n",
"WA => (2) => ๐ - ๐ฟ\n",
"ba4 => (3) => ๐๐ญ๐ท - ๐ท - ๐๐ท๐ท\n",
"ba6 => (2) => ๐๐ - ๐\n",
"bara2 => (2) => ๐ - ๐\n",
"bum => (2) => ๐
ค - ๐\n",
"buru14 => (2) => ๐ - ๐\n",
"dabin => (2) => ๐ ๐บ - ๐ฅ๐บ\n",
"dilmun => (3) => ๐๐ - ๐ฉ๐ธ - ๐ฉ๐\n",
"eri => (2) => ๐
- ๐ท\n",
"erisz => (2) => ๐ฉ๐ - ๐ฉ๐\n",
"gala => (3) => ๐ฒ - ๐๐ช - ๐\n",
"gin7 => (2) => ๐ถ - ๐\n",
"gurusz => (2) => ๐จ - ๐\n",
"ia => (2) => ๐
- ๐ฟ\n",
"idim => (2) => ๐ - ๐
\n",
"ii => (2) => ๐
- ๐ฟ\n",
"il => (2) => ๐ง - ๐
\n",
"iri => (2) => ๐
- ๐ท\n",
"isz8 => (2) => ๐น - ๐\n",
"iu => (2) => ๐
- ๐ฟ\n",
"kam => (2) => ๐ญ๐ - ๐ฐ\n",
"kesz2 => (2) => ๐ก - ๐\n",
"kesz3 => (2) => ๐๐ญ๐ฒ - ๐๐ญ๐ฒ๐ \n",
"lum => (2) => ๐ - ๐\n",
"munu4 => (2) => ๐ฝ๐ฝ - ๐ฝ๐บ๐ฝ\n",
"ne3 => (2) => ๐ - ๐\n",
"nergal => (2) => ๐๐๐ฒ - ๐๐๐ฒ\n",
"pa2 => (2) => ๐ - ๐๐\n",
"pirig => (2) => ๐ - ๐\n",
"puzur4 => (2) => ๐
ค๐ญ - ๐๐ญ\n",
"sig17 => (2) => ๐ - ๐ฌ\n",
"sze20 => (2) => ๐ - ๐
\n",
"t,a2 => (2) => ๐ซ - ๐ฌ\n",
"til => (2) => ๐ - ๐\n",
"us => (2) => ๐ป - ๐\n",
"usa => (2) => ๐ - ๐\n",
"usz => (2) => ๐ - ๐\n",
"usz2 => (2) => ๐ - ๐\n",
"uz => (2) => ๐ป - ๐\n",
"wa => (2) => ๐ - ๐ฟ\n",
"wa2 => (2) => ๐ - ๐\n",
"zi3 => (2) => ๐ - ๐ฅ\n",
"ziz2 => (2) => ๐พ - ๐ฉ\n"
]
}
],
"source": [
"print(f'{len(multiple):>3} ambiguously mapped readings')\n",
"for r in sorted(multiple):\n",
"    unis = multiple[r]\n",
"    uniStr = ' - '.join(sorted(unis))\n",
"    print(f'{r} => ({len(unis)}) => {uniStr}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Uniquely mapped readings"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"768 uniquely mapped readings\n",
" A => ๐\n",
" AB => ๐\n",
" AD => ๐\n",
" AG => ๐\n",
" AK => ๐\n",
" AL => ๐ \n",
" AM => ๐ \n",
" AN => ๐ญ\n",
" AR => ๐
\n",
" ARAD => ๐ด\n",
" ARAD2 => ๐ต\n",
" AS, => ๐\n",
" AS2 => ๐พ\n",
" ASZ => ๐ธ\n",
" AZ => ๐\n",
" BA => ๐\n",
" BAD => ๐\n",
" BAR => ๐\n",
" BE => ๐\n",
" BI => ๐\n",
" BU => ๐\n",
" BUR => ๐\n",
" DA => ๐\n",
" DAM => ๐ฎ\n",
" DI => ๐ฒ\n",
" DIM => ๐ด\n",
" DIN => ๐ท\n",
" DISZ => ๐น\n",
" DU => ๐บ\n",
" E => ๐\n",
" EDIN => ๐\n",
" EK => ๐
\n",
" EL => ๐\n",
" ER => ๐
\n",
" GA => ๐ต\n",
" GAG => ๐\n",
" GAL => ๐ฒ\n",
" GAN2 => ๐ท\n",
" GAR => ๐ป\n",
" GAZ => ๐ค\n",
" GESZ => ๐\n",
" GI => ๐\n",
" GIR => ๐ซ\n",
" GIR2 => ๐\n",
" GU => ๐\n",
" I => ๐ฟ\n",
" IB => ๐
\n",
" ID => ๐\n",
" IG => ๐
\n",
" IK => ๐
\n",
" IL2 => ๐
\n",
" IM => ๐
\n",
" IN => ๐
\n",
" IR => ๐
\n",
" ISZ => ๐
\n",
" IZ => ๐\n",
" KA => ๐
\n",
" KAB => ๐\n",
" KI => ๐ \n",
" KIB => ๐\n",
" KU => ๐ช\n",
" KUM => ๐ฃ\n",
" KUR => ๐ณ\n",
" LA => ๐ท\n",
" LAM => ๐ด\n",
" LE => ๐ท\n",
" LI => ๐ท\n",
" LU => ๐ป\n",
" LU2 => ๐ฝ\n",
" MA => ๐ \n",
" ME => ๐จ\n",
" MI => ๐ช\n",
" NA => ๐พ\n",
" NAM => ๐\n",
" NE => ๐\n",
" NI => ๐\n",
" NIG2 => ๐ป\n",
" NIM => ๐\n",
" NIN => ๐ฉ๐\n",
" NU => ๐ก\n",
" NUN => ๐ฃ\n",
" PA => ๐บ\n",
" PI => ๐ฟ\n",
" RA => ๐\n",
" RI => ๐\n",
" RU => ๐\n",
" S,I => ๐ข\n",
" SA => ๐\n",
" SAG => ๐\n",
" SAR => ๐ฌ\n",
" SIG => ๐\n",
" SU => ๐ข\n",
" SZA => ๐ญ\n",
" SZE => ๐บ\n",
" SZE3 => ๐ \n",
" SZESZ => ๐\n",
" SZI => ๐
\n",
" SZIM => ๐\n",
" SZIR => ๐\n",
" SZU => ๐\n",
" TA => ๐ซ\n",
" TAB => ๐\n",
" TAM => ๐\n",
" TAR => ๐ป\n",
" TE => ๐ผ\n",
" TI => ๐พ\n",
" TIM => ๐ด\n",
" TU => ๐
\n",
" TUG2 => ๐\n",
" TUL2 => ๐ฅ\n",
" TUM => ๐\n",
" TUR => ๐\n",
" U2 => ๐\n",
" U3 => ๐
\n",
" U4 => ๐\n",
" UB => ๐\n",
" UD => ๐\n",
" UG => ๐\n",
" UK => ๐\n",
" UL => ๐\n",
" UM => ๐\n",
" UR => ๐จ\n",
" WE => ๐ฟ\n",
" WI => ๐ฟ\n",
" ZA => ๐\n",
" ZE => ๐ฃ\n",
" ZI => ๐ฃ\n",
" ZI2 => ๐ข\n",
" ZU => ๐ช\n",
" ZUM => ๐ฎ\n",
" a => ๐\n",
" a2 => ๐\n",
" ab => ๐\n",
" ab2 => ๐\n",
" abul => ๐๐ฒ\n",
" abzu => ๐ช๐\n",
" ad => ๐\n",
" adab => ๐๐ฃ\n",
" ag => ๐\n",
" ag2 => ๐\n",
" aga => ๐\n",
" agrig => ๐
๐พ\n",
" ak => ๐\n",
" akszak => ๐\n",
" al => ๐ \n",
" alam => ๐ฉ\n",
" alan => ๐ฉ\n",
" am => ๐ \n",
" am3 => ๐๐ญ\n",
" ama => ๐ผ\n",
" amar => ๐ซ\n",
" an => ๐ญ\n",
" ansze => ๐\n",
" ap => ๐\n",
" apin => ๐ณ\n",
" aq => ๐\n",
" ar => ๐
\n",
" ar3 => ๐ฏ\n",
" as => ๐\n",
" as, => ๐\n",
" as2 => ๐พ\n",
" asal => ๐\n",
" asar => ๐\n",
" asz => ๐ธ\n",
" asz2 => ๐พ\n",
" asza5 => ๐ท\n",
" aszgab => ๐ฟ\n",
" asznan => ๐บ๐\n",
" at => ๐\n",
" at, => ๐\n",
" az => ๐\n",
" az2 => ๐พ\n",
" az3 => ๐ธ\n",
" azlag2 => ๐\n",
" ba => ๐\n",
" babbar => ๐\n",
" bad3 => ๐ฆ\n",
" bal => ๐\n",
" bala => ๐\n",
" ban => ๐ผ\n",
" ban2 => ๐\n",
" ban3 => ๐\n",
" banda3 => ๐\n",
" banesz => ๐\n",
" banszur => ๐\n",
" bappir => ๐\n",
" bar => ๐\n",
" bat => ๐\n",
" be => ๐\n",
" be2 => ๐\n",
" bi => ๐\n",
" bi2 => ๐\n",
" bil => ๐\n",
" bil2 => ๐\n",
" bir2 => ๐\n",
" bir4 => ๐\n",
" bisz => ๐ซ\n",
" bu => ๐\n",
" bun2 => ๐
ฎ\n",
" bur => ๐\n",
" bur3 => ๐\n",
" buranun => ๐๐๐ฃ\n",
" d => ๐ญ\n",
" da => ๐\n",
" dab => ๐ณ\n",
" dab5 => ๐ช\n",
" dag => ๐\n",
" dagal => ๐ผ\n",
" dam => ๐ฎ\n",
" dan => ๐จ\n",
" daq => ๐\n",
" dar => ๐ฏ\n",
" de => ๐ฒ\n",
" de3 => ๐\n",
" de4 => ๐ผ\n",
" di => ๐ฒ\n",
" di2 => ๐น\n",
" di3 => ๐พ\n",
" dib => ๐ณ\n",
" dida => ๐๐๐\n",
" didli => ๐\n",
" dil => ๐ธ\n",
" dim => ๐ด\n",
" dim2 => ๐ถ\n",
" dim4 => ๐ฝ๐ฝ\n",
" din => ๐ท\n",
" dingir => ๐ญ\n",
" diri => ๐๐\n",
" dirig => ๐๐\n",
" disz => ๐น\n",
" du => ๐บ\n",
" du10 => ๐ญ\n",
" du11 => ๐
\n",
" du3 => ๐\n",
" du5 => ๐
\n",
" du6 => ๐ฏ\n",
" du7 => ๐\n",
" du8 => ๐\n",
" dub => ๐พ\n",
" dug => ๐\n",
" dug3 => ๐ญ\n",
" dul3 => ๐จ\n",
" dul5 => ๐\n",
" dumu => ๐\n",
" duru5 => ๐\n",
" dusu => ๐
\n",
" dusu2 => ๐ฒ๐
\n",
" e => ๐\n",
" e2 => ๐\n",
" e3 => ๐๐บ\n",
" ea => ๐\n",
" eb => ๐
\n",
" ed => ๐\n",
" edin => ๐\n",
" eg => ๐
\n",
" egir => ๐\n",
" ek => ๐
\n",
" el => ๐\n",
" el2 => ๐
\n",
" el3 => ๐ญ\n",
" elam => ๐\n",
" em => ๐
\n",
" eme => ๐
ด\n",
" eme6 => ๐ฒ๐ฉ\n",
" en => ๐\n",
" en6 => ๐
\n",
" engar => ๐ณ\n",
" enku => ๐ ๐ฉ\n",
" ensi2 => ๐๐ผ๐\n",
" ep => ๐
\n",
" eq => ๐
\n",
" er => ๐
\n",
" er2 => ๐๐
\n",
" er3 => ๐ด\n",
" eren2 => ๐\n",
" eresz2 => ๐\n",
" erim => ๐\n",
" erin => ๐\n",
" erin2 => ๐\n",
" es => ๐\n",
" es, => ๐\n",
" esir => ๐๐\n",
" esz => ๐\n",
" esz15 => ๐
\n",
" esz18 => ๐น\n",
" esz2 => ๐ \n",
" esz3 => ๐\n",
" esza => ๐๐\n",
" esze3 => ๐\n",
" et => ๐\n",
" et, => ๐\n",
" ez => ๐\n",
" ezem => ๐ก\n",
" ga => ๐ต\n",
" ga2 => ๐ท\n",
" gab => ๐ฎ\n",
" gaba => ๐ฎ\n",
" gada => ๐ฐ\n",
" gag => ๐\n",
" gal => ๐ฒ\n",
" gal2 => ๐
\n",
" gan => ๐ถ\n",
" gan2 => ๐ท\n",
" ganba => ๐ ๐ด\n",
" gar => ๐ป\n",
" gar3 => ๐ผ\n",
" gaz => ๐ค\n",
" ge => ๐\n",
" ge6 => ๐ช\n",
" geme => ๐ฉ\n",
" geme2 => ๐ฉ๐ณ\n",
" gesz => ๐\n",
" gesztin => ๐พ\n",
" gesztu2 => ๐๐๐ฟ\n",
" gi => ๐\n",
" gi2 => ๐ค\n",
" gi4 => ๐\n",
" gi7 => ๐ \n",
" gibil => ๐\n",
" gid2 => ๐\n",
" gidri => ๐บ\n",
" gigir => ๐\n",
" gim => ๐ถ\n",
" gin => ๐บ\n",
" gin2 => ๐
\n",
" gir => ๐ซ\n",
" gir14 => ๐ฉ\n",
" gir2 => ๐\n",
" gir3 => ๐\n",
" gir8 => ๐ธ\n",
" giri17 => ๐
\n",
" giri3 => ๐\n",
" gissu => ๐๐ช\n",
" gisz => ๐\n",
" gu => ๐\n",
" gu2 => ๐\n",
" gu4 => ๐\n",
" gu7 => ๐
ฅ\n",
" gub => ๐บ\n",
" gud => ๐\n",
" gul => ๐ข\n",
" gum2 => ๐\n",
" gur => ๐ฅ\n",
" gur10 => ๐ฅ\n",
" gur11 => ๐ต\n",
" gur8 => ๐ฝ\n",
" guru7 => ๐ฆ\n",
" i => ๐ฟ\n",
" i3 => ๐\n",
" i7 => ๐๐\n",
" ia2 => ๐\n",
" ia3 => ๐\n",
" ib => ๐
\n",
" ib2 => ๐\n",
" ibila => ๐๐\n",
" id => ๐\n",
" id2 => ๐๐\n",
" idigna => ๐ฆ๐๐ผ\n",
" ig => ๐
\n",
" igi => ๐
\n",
" ik => ๐
\n",
" iku => ๐ท\n",
" il2 => ๐
\n",
" il3 => ๐ญ\n",
" il5 => ๐\n",
" illat => ๐๐ณ\n",
" im => ๐
\n",
" imin => ๐\n",
" imma3 => ๐
\n",
" in => ๐
\n",
" ina => ๐ธ\n",
" inanna => ๐น\n",
" inim => ๐
\n",
" ip => ๐
\n",
" iq => ๐
\n",
" ir => ๐
\n",
" ir3 => ๐ด\n",
" is => ๐\n",
" is, => ๐\n",
" is2 => ๐
\n",
" is3 => ๐\n",
" is4 => ๐พ\n",
" isz => ๐
\n",
" isz3 => ๐\n",
" isz7 => ๐\n",
" iszkur => ๐
\n",
" isztaran => ๐
๐ฒ\n",
" it => ๐\n",
" it, => ๐\n",
" iti => ๐\n",
" iz => ๐\n",
" ka => ๐
\n",
" ka2 => ๐\n",
" ka3 => ๐ต\n",
" ka9 => ๐\n",
" kab => ๐\n",
" kak => ๐\n",
" kal => ๐จ\n",
" kal2 => ๐ฒ\n",
" kalag => ๐จ\n",
" kalam => ๐ฆ\n",
" kap => ๐\n",
" kar => ๐ผ๐\n",
" kar2 => ๐ธ\n",
" kar3 => ๐ผ\n",
" kas4 => ๐ฝ\n",
" kaskal => ๐\n",
" kasz => ๐\n",
" ke => ๐ \n",
" ke4 => ๐ค\n",
" ki => ๐ \n",
" ki2 => ๐\n",
" kid => ๐ค\n",
" kikken2 => ๐ฏ๐ฏ\n",
" kilib => ๐ธ\n",
" kin => ๐ฅ\n",
" kir => ๐ซ\n",
" kiri6 => ๐ฌ\n",
" kisz => ๐ง\n",
" kiszib => ๐ฉ\n",
" kiszib3 => ๐พ\n",
" ku => ๐ช\n",
" ku13 => ๐ฃ\n",
" ku3 => ๐ฌ\n",
" ku4 => ๐ฎ\n",
" ku5 => ๐ป\n",
" ku6 => ๐ฉ\n",
" kul => ๐ฐ\n",
" kum => ๐ฃ\n",
" kun => ๐ฒ\n",
" kup4 => ๐ค\n",
" kur => ๐ณ\n",
" kur2 => ๐ฝ\n",
" kurun2 => ๐ท\n",
" kuruszda => ๐ฏ\n",
" kusz => ๐ข\n",
" kusz3 => ๐\n",
" la => ๐ท\n",
" la2 => ๐ฒ\n",
" lagasz => ๐๐๐ท\n",
" lam => ๐ด\n",
" lamma => ๐จ\n",
" larsa => ๐๐\n",
" le => ๐ท\n",
" lem => ๐
\n",
" li => ๐ท\n",
" li2 => ๐\n",
" li3 => ๐
\n",
" libir => ๐
๐ \n",
" lik => ๐จ\n",
" lil2 => ๐ค\n",
" lim => ๐
\n",
" lu => ๐ป\n",
" lu2 => ๐ฝ\n",
" lu4 => ๐\n",
" lugal => ๐\n",
" lukur => ๐ฉ๐จ\n",
" ma => ๐ \n",
" ma2 => ๐ฃ\n",
" mal => ๐ท\n",
" man => ๐๐\n",
" mar => ๐ฅ\n",
" mar2 => ๐ซ\n",
" marduk => ๐ซ๐\n",
" masz => ๐ฆ\n",
" masz2 => ๐ง\n",
" maszkim => ๐๐ฝ\n",
" me => ๐จ\n",
" me2 => ๐ช\n",
" mesz => ๐จ๐\n",
" mi => ๐ช\n",
" mi2 => ๐ฉ\n",
" mi3 => ๐จ\n",
" mil => ๐
\n",
" mu => ๐ฌ\n",
" mug => ๐ฎ\n",
" mun => ๐ต\n",
" munus => ๐ฉ\n",
" mur => ๐ฏ\n",
" musz => ๐ฒ\n",
" musz5 => ๐\n",
" muszen => ๐ท\n",
" na => ๐พ\n",
" na4 => ๐๐\n",
" nag2 => ๐\n",
" nagar => ๐\n",
" nagga => ๐ญ๐พ\n",
" nam => ๐\n",
" nanna => ๐ถ๐ \n",
" nansze => ๐\n",
" nar => ๐\n",
" ne => ๐\n",
" ne2 => ๐\n",
" ni => ๐\n",
" nibru => ๐๐ค\n",
" nidba2 => ๐ป๐น\n",
" nig2 => ๐ป\n",
" nigin3 => ๐๐๐ค\n",
" nigin6 => ๐\n",
" nim => ๐\n",
" nimgir => ๐\n",
" nin => ๐ฉ๐\n",
" nina => ๐\n",
" ninda => ๐ป\n",
" nir => ๐ช\n",
" nita => ๐\n",
" nita2 => ๐ด\n",
" nu => ๐ก\n",
" nu2 => ๐ฟ\n",
" num => ๐\n",
" numun => ๐ฐ\n",
" nun => ๐ฃ\n",
" nunuz => ๐ญ\n",
" pa => ๐บ\n",
" pa12 => ๐ฟ\n",
" pa3 => ๐
๐\n",
" pa4 => ๐ฝ\n",
" pa5 => ๐ฝ๐\n",
" pal => ๐\n",
" par2 => ๐\n",
" pe => ๐ฟ\n",
" pe2 => ๐\n",
" pesz => ๐ซ\n",
" pi => ๐ฟ\n",
" pi2 => ๐\n",
" pi4 => ๐
\n",
" pil => ๐\n",
" pil2 => ๐\n",
" pir => ๐\n",
" pisan => ๐ท\n",
" pisz => ๐ซ\n",
" pu => ๐\n",
" pur => ๐\n",
" puzur => ๐\n",
" qa => ๐ก\n",
" qa2 => ๐ต\n",
" qa3 => ๐
\n",
" qal4 => ๐จ\n",
" qar => ๐ผ\n",
" qar3 => ๐ป\n",
" qe => ๐ฅ\n",
" qe2 => ๐ \n",
" qe3 => ๐\n",
" qi => ๐ฅ\n",
" qi2 => ๐ \n",
" qi3 => ๐\n",
" qi4 => ๐\n",
" qir => ๐ซ\n",
" qu => ๐ฃ\n",
" qu2 => ๐ช\n",
" qu3 => ๐\n",
" qum => ๐ฃ\n",
" qur2 => ๐ณ\n",
" ra => ๐\n",
" ra2 => ๐บ\n",
" rasz => ๐\n",
" re => ๐\n",
" ri => ๐\n",
" ri2 => ๐ท\n",
" rim5 => ๐ธ\n",
" ru => ๐\n",
" ru3 => ๐ธ\n",
" rum => ๐ธ\n",
" s,a => ๐\n",
" s,ar => ๐ก\n",
" s,e => ๐ข\n",
" s,e2 => ๐ฃ\n",
" s,i => ๐ข\n",
" s,i2 => ๐ฃ\n",
" s,il2 => ๐ช\n",
" s,ir => ๐ฒ\n",
" s,u => ๐ฎ\n",
" s,u2 => ๐ช\n",
" s,um => ๐ฎ\n",
" s,ur => ๐ซ\n",
" sa => ๐\n",
" sa12 => ๐\n",
" sa2 => ๐ฒ\n",
" sa3 => ๐\n",
" sa6 => ๐ท\n",
" sag => ๐\n",
" sag11 => ๐ฅ\n",
" saga => ๐\n",
" sak => ๐\n",
" sal => ๐ฉ\n",
" sanga => ๐\n",
" sar => ๐ฌ\n",
" se => ๐\n",
" se2 => ๐ฃ\n",
" se3 => ๐ง\n",
" si => ๐\n",
" si2 => ๐ฃ\n",
" si3 => ๐ง\n",
" si4 => ๐\n",
" sig => ๐\n",
" sig2 => ๐ \n",
" sig4 => ๐\n",
" siki => ๐ \n",
" sikil => ๐\n",
" sila => ๐ป\n",
" sila3 => ๐ก\n",
" sila4 => ๐ข\n",
" silig => ๐\n",
" silim => ๐ฒ\n",
" sim => ๐\n",
" simug => ๐ฃ\n",
" sin => ๐\n",
" sin2 => ๐\n",
" sipa => ๐บ๐ป\n",
" sipad => ๐บ๐ป\n",
" sir2 => ๐\n",
" sir3 => ๐ก\n",
" su => ๐ข\n",
" su2 => ๐ช\n",
" su3 => ๐ค\n",
" su7 => ๐ญ\n",
" suen => ๐๐ช\n",
" sukkal => ๐\n",
" sum => ๐ง\n",
" sum2 => ๐ฎ\n",
" sun2 => ๐ข\n",
" sur => ๐ฉ\n",
" sza => ๐ญ\n",
" sza13 => ๐น\n",
" sza3 => ๐ฎ\n",
" szabra => ๐๐ \n",
" szakkan2 => ๐\n",
" szam => ๐\n",
" szam3 => ๐\n",
" szar => ๐ฌ\n",
" szara2 => ๐\n",
" sze => ๐บ\n",
" sze3 => ๐ \n",
" szen => ๐ฟ\n",
" szennur => ๐\n",
" szesz => ๐\n",
" szi => ๐
\n",
" szi2 => ๐\n",
" szim => ๐\n",
" szinig => ๐\n",
" szitim => ๐ถ\n",
" szu => ๐\n",
" szub => ๐\n",
" szubur => ๐\n",
" szuku => ๐ป\n",
" szul => ๐\n",
" szum => ๐ณ\n",
" szum2 => ๐ง\n",
" szur => ๐ฉ\n",
" szur4 => ๐ณ๐ฌ\n",
" szusz3 => ๐
\n",
" t,a => ๐\n",
" t,a3 => ๐ญ\n",
" t,am => ๐ฎ\n",
" t,ar => ๐ป\n",
" t,e => ๐ฒ\n",
" t,e4 => ๐ผ\n",
" t,e6 => ๐พ\n",
" t,i => ๐ฒ\n",
" t,i3 => ๐พ\n",
" t,u => ๐
\n",
" t,u2 => ๐
\n",
" t,u3 => ๐บ\n",
" t,ul => ๐ฅ\n",
" t,um => ๐\n",
" t,up => ๐พ\n",
" ta => ๐ซ\n",
" ta2 => ๐\n",
" tab => ๐\n",
" tak => ๐ณ\n",
" tak2 => ๐\n",
" tak4 => ๐บ\n",
" taka4 => ๐บ\n",
" tal => ๐\n",
" tam => ๐\n",
" tam2 => ๐ฎ\n",
" tar => ๐ป\n",
" tar2 => ๐ฏ\n",
" taskarin => ๐\n",
" tasz => ๐จ\n",
" te => ๐ผ\n",
" te4 => ๐\n",
" te9 => ๐พ\n",
" tel => ๐\n",
" ter => ๐\n",
" ti => ๐พ\n",
" ti7 => ๐ผ\n",
" tibira => ๐๐\n",
" tim => ๐ด\n",
" tir => ๐\n",
" tiszpak => ๐ฝ\n",
" tu => ๐
\n",
" tu2 => ๐\n",
" tu3 => ๐บ\n",
" tug2 => ๐\n",
" tukul => ๐ช\n",
" tul => ๐๐\n",
" tul2 => ๐ฅ\n",
" tum => ๐\n",
" tun3 => ๐
\n",
" tup => ๐พ\n",
" tur => ๐\n",
" tur2 => ๐\n",
" u => ๐\n",
" u2 => ๐\n",
" u3 => ๐
\n",
" u4 => ๐\n",
" u8 => ๐\n",
" ub => ๐\n",
" ud => ๐\n",
" ud5 => ๐\n",
" udu => ๐ป\n",
" ug => ๐\n",
" ug3 => ๐ฆ\n",
" ugula => ๐บ\n",
" uk => ๐\n",
" ul => ๐\n",
" um => ๐\n",
" umbin => ๐ข\n",
" umma => ๐๐ต\n",
" un => ๐ฆ\n",
" unken => ๐บ\n",
" unug => ๐\n",
" up => ๐\n",
" uq => ๐\n",
" ur => ๐จ\n",
" ur2 => ๐ซ\n",
" ur3 => ๐ก\n",
" ur5 => ๐ฏ\n",
" urasz => ๐
\n",
" uri2 => ๐๐\n",
" urta => ๐
\n",
" uru => ๐ท\n",
" uru4 => ๐ณ\n",
" uruda => ๐\n",
" urudu => ๐\n",
" us, => ๐ป\n",
" us,2 => ๐\n",
" us,4 => ๐\n",
" us2 => ๐\n",
" usan3 => ๐ฎ\n",
" ut => ๐\n",
" ut, => ๐\n",
" utu => ๐\n",
" uz2 => ๐\n",
" uzu => ๐\n",
" we => ๐ฟ\n",
" wi => ๐ฟ\n",
" wu => ๐ฟ\n",
" yi => ๐ฟ\n",
" za => ๐\n",
" za3 => ๐ \n",
" zabala4 => ๐น๐๐\n",
" zabar => ๐๐
๐ฆ\n",
" zadim => ๐ฏ\n",
" zal => ๐\n",
" zalag2 => ๐\n",
" zar => ๐ก\n",
" ze => ๐ฃ\n",
" ze2 => ๐ข\n",
" zi => ๐ฃ\n",
" zi2 => ๐ข\n",
" zid2 => ๐ \n",
" zimbir => ๐๐๐ฃ\n",
" zir3 => ๐ฒ\n",
" zu => ๐ช\n",
" zu2 => ๐
\n",
" zum => ๐ฎ\n"
]
}
],
"source": [
"print(f'{len(unique):>3} uniquely mapped readings')\n",
"for r in sorted(unique):\n",
"    print(f'{r:>10} => {unique[r]}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Write the mapping file"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"964 entries written to /Users/dirk/github/Nino-cunei/oldbabylonian/characters/mapping.tsv\n"
]
}
],
"source": [
"# later sources override earlier ones, so uniquely mapped tokens win\n",
"pairs = {}\n",
"for (k, vs) in multiple.items():\n",
"    # ambiguous tokens: pick the first candidate in sorted order\n",
"    pairs[k] = sorted(vs)[0]\n",
"for (t, v) in mapAddition.items():\n",
"    # numeric tuples are written in ATF style, e.g. 2(asz)\n",
"    k = f'{t[0]}({t[1]})' if type(t) is tuple else t\n",
"    pairs[k] = v\n",
"for (k, v) in MAPPING_SOLUTIONSX.items():\n",
"    pairs[k] = v\n",
"for (k, v) in unique.items():\n",
"    pairs[k] = v\n",
"\n",
"with open(MAPPING_FILE, 'w') as mf:\n",
"    for (k, v) in sorted(pairs.items()):\n",
"        mf.write(f'{k}\\t{v}\\n')\n",
"print(f'{len(pairs)} entries written to {MAPPING_FILE}')"
]
},
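{
"cell_type": "markdown",
"metadata": {},
"source": [
"A program that consumes the mapping file only needs to split each line on the tab.\n",
"The reader below is a minimal sketch under that assumption; `read_mapping` is a hypothetical helper, not part of this repository,\n",
"and `X1`, `X2` stand in for cuneiform strings.\n",
"\n",
"```python\n",
"def read_mapping(lines):\n",
"    # Parse tab-separated token/cuneiform pairs into a dictionary.\n",
"    result = {}\n",
"    for line in lines:\n",
"        line = line.rstrip('\\n')\n",
"        if not line:\n",
"            continue\n",
"        (token, uni) = line.split('\\t', 1)\n",
"        result[token] = uni\n",
"    return result\n",
"\n",
"\n",
"# Hypothetical file content.\n",
"sample_lines = ['2(asz)\\tX1\\n', 'ma\\tX2\\n']\n",
"sample_pairs = read_mapping(sample_lines)\n",
"print(sample_pairs)\n",
"```\n",
"\n",
"Because an open text file yields its lines when iterated, the same function should also accept a file handle opened on the mapping file."
]
},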
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
