{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Mapping Old Babylonian readings to Unicode\n", "\n", "## Task\n", "\n", "We want to map *readings* and *graphemes* in cuneiform corpora to Unicode cuneiform characters,\n", "based on existing mapping tables.\n", "\n", "We generate a plain mapping that can readily be used by programs that convert ATF to TF or to other formats.\n", "\n", "## Problem\n", "\n", "There are multiple mapping tables, and there are several ways to transliterate readings.\n", "\n", "## Sources\n", "\n", "We take the ATF transliterations from CDLI, for the tablets found by a search on AbB and Old Babylonian.\n", "\n", "We also take the file\n", "[GeneratedSignList.json](https://github.com/Nino-cunei/oldbabylonian/blob/master/sources/writing/GeneratedSignList.json),\n", "which contains mappings like\n", "\n", "```json\n", "  \"BANIA\": {\n", "    \"signName\": \"BANIA\",\n", "    \"signNumber\": 551,\n", "    \"signCunei\": \"𒑔\",\n", "    \"codePoint\": \"\",\n", "    \"values\": [\n", "      \"BANIA\", \"AŠ2.UoverU\", \"5SŪTU\"\n", "    ]\n", "  },\n", "  \"MA\": {\n", "    \"signName\": \"MA\",\n", "    \"signNumber\": 552,\n", "    \"signCunei\": \"𒈠\",\n", "    \"codePoint\": \"\",\n", "    \"values\": [\n", "      \"MA\", \"PEŠ3\", \"PEŠŠE\", \"WA6\"\n", "    ]\n", "  },\n", "```\n", "\n", "See [transcription](https://github.com/Nino-cunei/oldbabylonian/blob/master/docs/transcription.md)\n", "for the provenance of this file.\n", "\n", "# Status\n", "\n", "This is a work in progress.\n", "The mapping is needed for the conversion from ATF to TF in the program\n", "[tfFromATF.py](tfFromATF.py).\n", "\n", "# Authors\n", "\n", "Cale Johnson, Martijn Kokken, Dirk Roorda\n", "\n", "# Acknowledgements\n", "\n", "We are indebted to **Auday Hussein** for kindly sending us the *GeneratedSignList.json* file,\n", "and to **Alba de Ridder** for hints and comments."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "import collections\n", "import re\n", "import json\n", "from unicodedata import name as uname\n", "\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using TF app oldbabylonian in /Users/dirk/github/annotation/app-oldbabylonian/code\n", "Using Nino-cunei/oldbabylonian/tf - 1.0.4 in /Users/dirk/github\n" ] }, { "data": { "text/html": [ "Documentation: OLDBABYLONIAN Character table Feature docs oldbabylonian API Text-Fabric API 7.5.1 Search Reference\n", "Loaded features:\n", "Old Babylonian Letters 1900-1600: Cuneiform tablets : ARK after afterr afteru atf atfpost atfpre author col collated collection comment damage det docnote docnumber excavation excised face flags fraction genre grapheme graphemer graphemeu lang langalt ln lnc lnno material missing museumcode museumname object operator operatorr operatoru otype period pnumber primecol primeln pubdate question reading readingr readingu remarkable remarks repeat srcLn srcLnNum srcfile subgenre supplied sym symr symu trans transcriber translation@ll type uncertain volume oslots" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "API members:\n", "C Computed, Call AllComputeds, Cs ComputedString\n", "E Edge, Eall AllEdges, Es EdgeString\n", "ensureLoaded, TF, ignored, loadLog\n", "L Locality\n", "cache, error, indent, info, reset\n", "N Nodes, sortKey, sortKeyTuple, otypeRank, sortNodes\n", "F Feature, Fall AllFeatures, Fs FeatureString\n", "S Search\n", "T Text" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use('oldbabylonian', hoist=globals(), lgc=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Local topography" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "BASE = os.path.expanduser('~/github')\n", "ORG = 'Nino-cunei'\n", "REPO = 'oldbabylonian'\n", "\n", "REPO_DIR = f'{BASE}/{ORG}/{REPO}'\n", "\n", "WRITING_DIR = f'{REPO_DIR}/sources/writing'\n", "\n", "SIGN_FILE = 'GeneratedSignList.json'\n", "SIGN_PATH = f'{WRITING_DIR}/{SIGN_FILE}'\n", "\n", "MAPPING_FILE = f'{os.path.abspath(\"..\")}/characters/mapping.tsv'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading collection\n", "\n", "We use TF to collect all distinct *tokens* of the corpus in a set:\n", "the readings and graphemes of the signs, and, for numerals, pairs such as `(repeat, reading)` or `(fraction, reading)`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "969 different tokens in corpus\n" ] } ], "source": [ "READABLE_TYPES = {'reading', 'grapheme', 'numeral', 'complex'}\n", "\n", "tokens = set()\n", "\n", "for s in F.otype.s('sign'):\n", "    typ = F.type.v(s)\n", "    if typ not in READABLE_TYPES:\n", "        continue\n", "    reading = F.reading.v(s)\n", "    if typ == 'numeral':\n", "        # numerals are stored as (repeat, reading) or (fraction, reading);\n", "        # with a negative repeat count we keep the bare reading\n", "        repeat = F.repeat.v(s)\n", "        fraction = F.fraction.v(s)\n", "        if repeat:\n", "            if repeat > 0:\n", "                tokens.add((repeat, reading))\n", "            else:\n", "                tokens.add(reading)\n", "        else:\n", "            tokens.add((fraction, reading))\n", "        continue\n", "    # other signs contribute both their reading and their grapheme\n", "    for token in (reading, F.grapheme.v(s)):\n", "        if token:\n", "            tokens.add(token)\n", "\n", "print(f'{len(tokens)} different tokens in corpus')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Unicode style versus ATF style\n", "\n", "We map the special characters of Unicode-style transliteration (`š`, `ṣ`, `ś`, `ṭ`, `ḫ`)\n", "to their ASCII equivalents in ATF (`sz`, `s,`, `s'`, `t,`, `h,`), in both lower and upper case."
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "transAscii = {\n", "    'š': 'sz',\n", "    'ṣ': 's,',\n", "    'ś': \"s'\",\n", "    'ṭ': 't,',\n", "    'ḫ': 'h,',\n", "}\n", "\n", "transAscii.update({k.upper(): v.upper() for (k, v) in transAscii.items()})\n", "\n", "def makeAscii(r):\n", "    for (rin, rout) in transAscii.items():\n", "        r = r.replace(rin, rout)\n", "    return r" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'š': 'sz',\n", " 'ṣ': 's,',\n", " 'ś': \"s'\",\n", " 'ṭ': 't,',\n", " 'ḫ': 'h,',\n", " 'Š': 'SZ',\n", " 'Ṣ': 'S,',\n", " 'Ś': \"S'\",\n", " 'Ṭ': 'T,',\n", " 'Ḫ': 'H,'}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transAscii" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# number words as they occur in Unicode names like CUNEIFORM NUMERIC SIGN TWO ASH\n", "REPEAT_INV = dict(\n", "    one=1,\n", "    two=2,\n", "    three=3,\n", "    four=4,\n", "    five=5,\n", "    six=6,\n", "    seven=7,\n", "    eight=8,\n", "    nine=9,\n", ")\n", "\n", "REPEAT = {v: k for (k, v) in REPEAT_INV.items()}" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# fraction words as they occur in Unicode names like CUNEIFORM NUMERIC SIGN ONE THIRD DISH\n", "FRACTION = {\n", "    '1/2': 'one half',\n", "    '1/3': 'one third',\n", "    '2/3': 'two thirds',\n", "    '1/4': 'one quarter',\n", "    '1/6': 'one sixth',\n", "    '5/6': 'five sixths',\n", "    '1/8': 'one eighth',\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Read the sign list\n", "\n", "We read the JSON file with the generated signs.\n", "\n", "For each sign, we find a list of *values*.\n", "\n", "These values are the possible readings or graphemes of the sign, in short, *tokens*.\n", "They are written in Unicode transliteration style, e.g. `PEŠ3`.\n", "\n", "In the mapping we create, we convert them to plain ATF, e.g. `PESZ3`,\n", "which makes it easy to look up the tokens of our Old Babylonian corpus."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1768 signs in the json file\n", "8765 distinct values in table\n" ] } ], "source": [ "with open(SIGN_PATH, encoding='utf8') as fh:\n", "    signs = json.load(fh)['signs']\n", "\n", "print(f'{len(signs)} signs in the json file')\n", "\n", "# map each ATF-style value to the set of Unicode strings of the signs that have it\n", "mapping = collections.defaultdict(set)\n", "\n", "for (sign, signData) in signs.items():\n", "    uniStr = signData['signCunei']\n", "    values = signData['values']\n", "    for value in values:\n", "        valueAscii = makeAscii(value)\n", "        mapping[valueAscii].add(uniStr)\n", "\n", "print(f'{len(mapping)} distinct values in table')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Token lookup\n", "\n", "We look up each Old Babylonian token in the mapping just constructed.\n", "\n", "Depending on whether a token has zero, one or several Unicode targets, we store it in the set\n", "`unmapped`, or in the dictionary `unique` or `multiple`.\n", "Numeral tokens (tuples) are not looked up here; they all go to `unmapped`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "151 unmapped tokens\n", " 50 ambiguously mapped tokens\n", "768 uniquely mapped tokens\n" ] } ], "source": [ "# ATF writes the divine determinative as d; we look it up as dingir\n", "MAPPING_FIXES = {\n", "    'd': 'dingir',\n", "}\n", "\n", "unmapped = set()\n", "unique = {}\n", "multiple = {}\n", "\n", "for t in tokens:\n", "    if isinstance(t, tuple):\n", "        unmapped.add(t)\n", "        continue\n", "    tLookup = MAPPING_FIXES.get(t, t)\n", "    tU = tLookup.upper()\n", "    if tU not in mapping:\n", "        unmapped.add(t)\n", "        continue\n", "    targets = mapping[tU]\n", "    if len(targets) == 1:\n", "        unique[t] = next(iter(targets))\n", "    else:\n", "        multiple[t] = targets\n", "\n", "print(f'{len(unmapped):>3} unmapped tokens')\n", "print(f'{len(multiple):>3} ambiguously mapped tokens')\n", "print(f'{len(unique):>3} uniquely mapped tokens')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Unmapped tokens" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {},
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "151 unmapped tokens\n" ] }, { "data": { "text/plain": [ "[\"'i\",\n", " 'ah',\n", " 'AH',\n", " 'alamusz',\n", " 'asal2',\n", " (1, 'asz'),\n", " (2, 'asz'),\n", " (3, 'asz'),\n", " (4, 'asz'),\n", " (5, 'asz'),\n", " (6, 'asz'),\n", " (7, 'asz'),\n", " (8, 'asz'),\n", " (9, 'asz'),\n", " 'babila2',\n", " (1, 'ban2'),\n", " (2, 'ban2'),\n", " (3, 'ban2'),\n", " (4, 'ban2'),\n", " (5, 'ban2'),\n", " 'barig',\n", " (1, 'barig'),\n", " (2, 'barig'),\n", " (3, 'barig'),\n", " (4, 'barig'),\n", " (5, 'barig'),\n", " (1, \"bur'u\"),\n", " (2, \"bur'u\"),\n", " (3, \"bur'u\"),\n", " (4, \"bur'u\"),\n", " (5, \"bur'u\"),\n", " (1, 'bur3'),\n", " (2, 'bur3'),\n", " (3, 'bur3'),\n", " (4, 'bur3'),\n", " (5, 'bur3'),\n", " (6, 'bur3'),\n", " (8, 'bur3'),\n", " (9, 'bur3'),\n", " 'dah',\n", " (1, 'disz'),\n", " ('1/2', 'disz'),\n", " ('1/3', 'disz'),\n", " (2, 'disz'),\n", " ('2/3', 'disz'),\n", " (3, 'disz'),\n", " (4, 'disz'),\n", " (5, 'disz'),\n", " ('5/6', 'disz'),\n", " (6, 'disz'),\n", " (7, 'disz'),\n", " (8, 'disz'),\n", " (9, 'disz'),\n", " 'duh',\n", " 'EH',\n", " 'eh',\n", " 'eri11',\n", " (1, 'esze3'),\n", " (2, 'esze3'),\n", " (3, 'esze3'),\n", " (1, 'gesz'),\n", " (9, 'gesz'),\n", " (1, \"gesz'u\"),\n", " (2, \"gesz'u\"),\n", " (3, \"gesz'u\"),\n", " (4, \"gesz'u\"),\n", " (7, \"gesz'u\"),\n", " (1, 'gesz2'),\n", " (2, 'gesz2'),\n", " (3, 'gesz2'),\n", " (4, 'gesz2'),\n", " (5, 'gesz2'),\n", " (6, 'gesz2'),\n", " (7, 'gesz2'),\n", " (8, 'gesz2'),\n", " (9, 'gesz2'),\n", " 'geszimmar',\n", " (2, 'gisz'),\n", " 'gudu4',\n", " 'HA',\n", " 'ha',\n", " 'had2',\n", " 'hal',\n", " 'har',\n", " 'HAR',\n", " 'he',\n", " 'he2',\n", " 'HI',\n", " 'hi',\n", " 'hu',\n", " 'HU',\n", " 'hub2',\n", " 'hun',\n", " 'hur',\n", " 'huz',\n", " 'ih',\n", " 'IH',\n", " (1, 'iku'),\n", " ('1/2', 'iku'),\n", " (2, 'iku'),\n", " (3, 'iku'),\n", " (4, 'iku'),\n", " 'itu',\n", " 'kislah',\n", " 'lah',\n", " 'lah4',\n", " 
'lah5',\n", " 'lah6',\n", " 'lal3',\n", " 'm',\n", " 'mah',\n", " 'muhaldim',\n", " 'nigar',\n", " 'nirah',\n", " 'p',\n", " 'pesz2',\n", " 'sa10',\n", " 'sahar',\n", " 'siskur2',\n", " 'sz',\n", " 'szagina',\n", " 'szah',\n", " 'szah2',\n", " 'szandana',\n", " (1, 'szar2'),\n", " (2, 'szar2'),\n", " 'sze9',\n", " 'szii',\n", " 'szunigin',\n", " 'tah',\n", " 'tap',\n", " (1, 'u'),\n", " (2, 'u'),\n", " (3, 'u'),\n", " (4, 'u'),\n", " (5, 'u'),\n", " 'udru',\n", " 'uh',\n", " 'UH',\n", " 'UH2',\n", " 'uh2',\n", " 'UH3',\n", " 'uh3',\n", " 'ukken',\n", " 'umi',\n", " 'unu',\n", " 'ura',\n", " '|A.GAB.LISZ|',\n", " '|KA.TA|',\n", " '|UD.KIB.NU|',\n", " '|UD.KIB|']" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort key: reading (case-insensitive), then numeral count or fraction\n", "unkey = lambda x: (x[1].lower(), str(x[0])) if type(x) is tuple else (x.lower(), '')\n", "\n", "print(f'{len(unmapped):>3} unmapped tokens')\n", "sorted(unmapped, key=unkey)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fix the unmapped tokens\n", "\n", "We look up the unmapped tokens in the Unicode table of cuneiform characters.\n", "For each token we construct a candidate Unicode character name, such as\n", "`CUNEIFORM SIGN HA` for `ha` or `CUNEIFORM NUMERIC SIGN TWO ASH` for `(2, 'asz')`,\n", "and check whether a character with that name exists."
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "cuneiBlocks = {\n", " 'Cuneiform': ('12000', '123FF'),\n", " 'Cuneiform Numbers and Punctuation': ('12400', '1247F'),\n", " 'Early Dynastic Cuneiform': ('12480', '1254F'),\n", "}" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "cunicode = {}\n", "\n", "for (block, (start, end)) in cuneiBlocks.items():\n", " for u in range(int(start, 16), int(end, 16) + 1):\n", " c = chr(u)\n", " name = uname(c, None)\n", " if name is None:\n", " continue\n", " cunicode[name] = c" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "fixed 67 out of 151\n", "FIXED\n", "\tasal2 => ๐’€ท\n", "\t(1, 'asz') => ๐’€ธ\n", "\t(1, 'disz') => ๐’น\n", "\t(2, 'disz') => ๐’น\n", "\tduh => ๐’‚ƒ\n", "\t(2, 'gisz') => ๐’„‘\n", "\tHA => ๐’„ฉ\n", "\tha => ๐’„ฉ\n", "\thal => ๐’„ฌ\n", "\tHI => ๐’„ญ\n", "\thi => ๐’„ญ\n", "\tHU => ๐’„ท\n", "\thu => ๐’„ท\n", "\thub2 => ๐’„ธ\n", "\t'i => ๐’„ฟ\n", "\tmah => ๐’ˆค\n", "\tpesz2 => ๐’‰พ\n", "\t(1, 'szar2') => ๐’Šน\n", "\t(1, 'u') => ๐’Œ‹\n", "\t(2, 'u') => ๐’Œ‹\n", "\t(3, 'u') => ๐’Œ‹\n", "\t(2, 'asz') => ๐’€\n", "\t(3, 'asz') => ๐’\n", "\t(4, 'asz') => ๐’‚\n", "\t(5, 'asz') => ๐’ƒ\n", "\t(6, 'asz') => ๐’„\n", "\t(7, 'asz') => ๐’…\n", "\t(8, 'asz') => ๐’†\n", "\t(9, 'asz') => ๐’‡\n", "\t(3, 'disz') => ๐’ˆ\n", "\t(4, 'disz') => ๐’‰\n", "\t(5, 'disz') => ๐’Š\n", "\t(6, 'disz') => ๐’‹\n", "\t(7, 'disz') => ๐’Œ\n", "\t(8, 'disz') => ๐’\n", "\t(9, 'disz') => ๐’Ž\n", "\t(4, 'u') => ๐’\n", "\t(5, 'u') => ๐’\n", "\t(1, 'gesz2') => ๐’•\n", "\t(2, 'gesz2') => ๐’–\n", "\t(3, 'gesz2') => ๐’—\n", "\t(4, 'gesz2') => ๐’˜\n", "\t(5, 'gesz2') => ๐’™\n", "\t(6, 'gesz2') => ๐’š\n", "\t(7, 'gesz2') => ๐’›\n", "\t(8, 'gesz2') => ๐’œ\n", "\t(9, 'gesz2') => ๐’\n", "\t(1, \"gesz'u\") => ๐’ž\n", "\t(2, \"gesz'u\") => ๐’Ÿ\n", "\t(3, 
\"gesz'u\") => ๐’ \n", "\t(4, \"gesz'u\") => ๐’ก\n", "\t(2, 'szar2') => ๐’ฃ\n", "\t(1, \"bur'u\") => ๐’ด\n", "\t(2, \"bur'u\") => ๐’ต\n", "\t(3, \"bur'u\") => ๐’ถ\n", "\t(4, \"bur'u\") => ๐’ธ\n", "\t(5, \"bur'u\") => ๐’น\n", "\t(1, 'ban2') => ๐’‘\n", "\t(2, 'ban2') => ๐’‘\n", "\t(3, 'ban2') => ๐’‘‘\n", "\t(4, 'ban2') => ๐’‘’\n", "\t(5, 'ban2') => ๐’‘”\n", "\t(1, 'esze3') => ๐’‘˜\n", "\t(2, 'esze3') => ๐’‘™\n", "\t('1/3', 'disz') => ๐’‘š\n", "\t('2/3', 'disz') => ๐’‘›\n", "\t('5/6', 'disz') => ๐’‘œ\n", "UNFIXED\n", "\tah => ?\n", "\tAH => ?\n", "\talamusz => ?\n", "\tbabila2 => ?\n", "\tbarig => ?\n", "\t(1, 'barig') => ?\n", "\t(2, 'barig') => ?\n", "\t(3, 'barig') => ?\n", "\t(4, 'barig') => ?\n", "\t(5, 'barig') => ?\n", "\t(1, 'bur3') => ?\n", "\t(2, 'bur3') => ?\n", "\t(3, 'bur3') => ?\n", "\t(4, 'bur3') => ?\n", "\t(5, 'bur3') => ?\n", "\t(6, 'bur3') => ?\n", "\t(8, 'bur3') => ?\n", "\t(9, 'bur3') => ?\n", "\tdah => ?\n", "\t('1/2', 'disz') => ?\n", "\tEH => ?\n", "\teh => ?\n", "\teri11 => ?\n", "\t(3, 'esze3') => ?\n", "\t(1, 'gesz') => ?\n", "\t(9, 'gesz') => ?\n", "\t(7, \"gesz'u\") => ?\n", "\tgeszimmar => ?\n", "\tgudu4 => ?\n", "\thad2 => ?\n", "\thar => ?\n", "\tHAR => ?\n", "\the => ?\n", "\the2 => ?\n", "\thun => ?\n", "\thur => ?\n", "\thuz => ?\n", "\tih => ?\n", "\tIH => ?\n", "\t(1, 'iku') => ?\n", "\t('1/2', 'iku') => ?\n", "\t(2, 'iku') => ?\n", "\t(3, 'iku') => ?\n", "\t(4, 'iku') => ?\n", "\titu => ?\n", "\tkislah => ?\n", "\tlah => ?\n", "\tlah4 => ?\n", "\tlah5 => ?\n", "\tlah6 => ?\n", "\tlal3 => ?\n", "\tm => ?\n", "\tmuhaldim => ?\n", "\tnigar => ?\n", "\tnirah => ?\n", "\tp => ?\n", "\tsa10 => ?\n", "\tsahar => ?\n", "\tsiskur2 => ?\n", "\tsz => ?\n", "\tszagina => ?\n", "\tszah => ?\n", "\tszah2 => ?\n", "\tszandana => ?\n", "\tsze9 => ?\n", "\tszii => ?\n", "\tszunigin => ?\n", "\ttah => ?\n", "\ttap => ?\n", "\tudru => ?\n", "\tuh => ?\n", "\tUH => ?\n", "\tUH2 => ?\n", "\tuh2 => ?\n", "\tUH3 => ?\n", "\tuh3 => ?\n", 
"\tukken => ?\n", "\tumi => ?\n", "\tunu => ?\n", "\tura => ?\n", "\t|A.GAB.LISZ| => ?\n", "\t|KA.TA| => ?\n", "\t|UD.KIB.NU| => ?\n", "\t|UD.KIB| => ?\n" ] } ], "source": [ "mapAddition = {}\n", "notFixed = set()\n", "\n", "def getLookup(r):\n", "    # approximate the spelling of Unicode sign names:\n", "    # drop apostrophes, upper-case, SZ -> SH, . -> TIMES\n", "    return (\n", "        r.\n", "        replace(\"'\", '').\n", "        upper().\n", "        replace(\"SZ\", 'SH').\n", "        replace('.', ' TIMES ')\n", "    )\n", "\n", "\n", "for t in sorted(unmapped, key=unkey):\n", "    if type(t) is tuple:\n", "        if type(t[0]) is int:\n", "            # numeral with a repeat count: try the numeric sign name first\n", "            (repeat, r) = t\n", "            tRepeat = REPEAT.get(repeat, None)\n", "            if tRepeat is None:\n", "                notFixed.add(t)\n", "                continue\n", "            tLookup = getLookup(r)\n", "            name = f'CUNEIFORM NUMERIC SIGN {tRepeat.upper()} {tLookup}'\n", "            c = cunicode.get(name, None)\n", "            if c is not None:\n", "                mapAddition[t] = c\n", "                continue\n", "            # otherwise fall back to the plain sign name (this drops the repeat count)\n", "            name = f'CUNEIFORM SIGN {tLookup}'\n", "        else:\n", "            # numeral with a fraction\n", "            (fraction, r) = t\n", "            tFraction = FRACTION.get(fraction, None)\n", "            if tFraction is None:\n", "                notFixed.add(t)\n", "                continue\n", "            tLookup = getLookup(r)\n", "            name = f'CUNEIFORM NUMERIC SIGN {tFraction.upper()} {tLookup}'\n", "    else:\n", "        tLookup = getLookup(t)\n", "        name = f'CUNEIFORM SIGN {tLookup}'\n", "    c = cunicode.get(name, None)\n", "    if c is None:\n", "        notFixed.add(t)\n", "    else:\n", "        mapAddition[t] = c\n", "\n", "print(f'fixed {len(mapAddition)} out of {len(unmapped)}')\n", "\n", "if mapAddition:\n", "    print('FIXED')\n", "    # sort by token, not by the (token, character) pair\n", "    for (t, c) in sorted(mapAddition.items(), key=lambda item: unkey(item[0])):\n", "        print(f'\\t{str(t):<15} => {c}')\n", "else:\n", "    print('NOTHING FIXED')\n", "\n", "if notFixed:\n", "    print('UNFIXED')\n", "    for t in sorted(notFixed, key=unkey):\n", "        print(f'\\t{str(t):<15} => ?')\n", "else:\n", "    print('ALL FIXED')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Solutions\n", "\n", "Most of the remaining problems above were solved by a\n", "[table provided by Martijn Kokken](https://github.com/Nino-cunei/oldbabylonian/blob/master/sources/writing/MartijnKokken.txt)." ] }, {
"cell_type": "raw", "metadata": {}, "source": [ "m => ?\n", "n => ?\n", "p => ?\n", "sz => ?\n", "sze9 => ?\n", "szunigin => ?" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'ah': '๐’„ด',\n", " 'AH': '๐’„ด',\n", " 'alamusz': '๐’‹ญ',\n", " 'babila2': '๐’†๐’€ญ๐’Š',\n", " 'dah': '๐’ˆญ',\n", " 'eh': '๐’„ด',\n", " 'EH': '๐’„ด',\n", " 'eri11': '๐’€•',\n", " 'geszimmar': '๐’Šท',\n", " 'gudu4': '๐’„ด๐’ˆจ',\n", " 'had2': '๐’Œ“',\n", " 'har': '๐’„ฏ',\n", " 'HAR': '๐’„ฏ',\n", " 'he': '๐’„ญ',\n", " 'he2': '๐’ƒถ',\n", " 'hun': '๐’‚ ',\n", " 'hur': '๐’„ฏ',\n", " 'huz': '๐’ˆ',\n", " 'ih': '๐’„ด',\n", " 'IH': '๐’„ด',\n", " 'itu': '๐’Œ—',\n", " 'KA': '๐’…—๐’‹ซ',\n", " 'kislah': '๐’† ๐’Œ“',\n", " 'lah': '๐’Œ“',\n", " 'lah4': '๐’ป',\n", " 'lah5': '๐’บ๐’บ',\n", " 'lah6': '๐’บ',\n", " 'lal3': '๐’‹ญ',\n", " 'muhaldim': '๐’ˆฌ',\n", " 'nigar': '๐’Œ‹๐’Œ“๐’†ค',\n", " 'nirah': '๐’ˆฒ',\n", " 'sa10': '๐’‰š',\n", " 'sahar': '๐’…–',\n", " 'siskur2': '๐’€ฌ๐’€ฌ',\n", " 'szagina': '๐’„Š๐’€ด',\n", " 'szah': '๐’‹š',\n", " 'szah2': '๐’‚„',\n", " 'szandana': '๐’ƒฒ๐’‰Œ',\n", " 'tah': '๐’ˆญ',\n", " 'tap': '๐’‹ฐ',\n", " 'udru': '๐’€พ',\n", " 'UH': '๐’„ด',\n", " 'uh': '๐’„ด',\n", " 'UH2': '๐’Œ“๐’†ต',\n", " 'uh2': '๐’Œ“๐’†ต',\n", " 'uh3': '๐’†ต',\n", " 'UH3': '๐’†ต',\n", " 'ukken': '๐’Œบ',\n", " 'unu': '๐’€•',\n", " 'barig': '๐’น',\n", " '1(barig)': '๐’น',\n", " '2(barig)': '๐’น๐’น',\n", " '3(barig)': '๐’น๐’น๐’น',\n", " '4(barig)': '๐’',\n", " '5(barig)': '๐’„ฅ',\n", " 'bur3': '๐’Œ‹',\n", " \"bur'u\": '๐’ด',\n", " '1(bur3)': '๐’Œ‹',\n", " '2(bur3)': '๐’Œ‹๐’Œ‹',\n", " '3(bur3)': '๐’Œ‹๐’Œ‹๐’Œ‹',\n", " '4(bur3)': '๐’',\n", " '5(bur3)': '๐’',\n", " '6(bur3)': '๐’‘',\n", " '7(bur3)': '๐’’',\n", " '8(bur3)': '๐’“',\n", " '9(bur3)': '๐’”',\n", " '1/2(disz)': '๐’ˆฆ',\n", " '13(disz)': '๐’Ž™๐’ˆ',\n", " '1(iku)': '๐’€ธ',\n", " '2(iku)': '๐’€',\n", " '3(iku)': '๐’',\n", " '4(iku)': '๐’‚',\n", " '5(iku)': '๐’ƒ',\n", " 
'6(iku)': '𒐄',\n", " '7(iku)': '𒐅',\n", " '8(iku)': '𒐆',\n", " '9(iku)': '𒐇',\n", " '3(esze3)': '𒀸𒌋',\n", " 'gesz2': '𒐕',\n", " \"gesz'u\": '𒐞',\n", " 'szar2': '𒊹'}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MAPPING_SOLUTIONS = dict(\n", "    ah=('HIxNUN', 'U12134'),\n", "    AH=('HIxNUN', 'U12134'),\n", "    alamusz=('TAxHI', 'U122ED'),\n", "    babila2=('KA2.AN.RA', 'U1218D U1202D U1228F'),\n", "    dah=('MU/MU', 'U1222D'),\n", "    eh=('HIxNUN', 'U12134'),\n", "    EH=('HIxNUN', 'U12134'),\n", "    eri11=('AB gunû', 'U12015'),\n", "    geszimmar=('ŠA6', 'U122B7'),\n", "    gudu4=('HIxNUN.ME', 'U12134 U12228'),\n", "    had2=('UD', 'U12313'),\n", "    har=('HIxAŠ2', 'U1212F'),\n", "    HAR=('HIxAŠ2', 'U1212F'),\n", "    he=('HI', 'U1212D'),\n", "    he2=('GAN', 'U120F6'),\n", "    hun=('EŠ2', 'U120A0'),\n", "    hur=('HIxAŠ2', 'U1212F'),\n", "    huz=('LUM', 'U1221D'),\n", "    ih=('HIxNUN', 'U12134'),\n", "    IH=('HIxNUN', 'U12134'),\n", "    itu=('UDxU.U.U', 'U12317'),\n", "    KA=('KA TA', 'U12157 U122EB'),\n", "    kislah=('KI.UD', 'U121A0 U12313'),\n", "    lah=('UD', 'U12313'),\n", "    lah4=('DU / DU', 'U1207B'),\n", "    lah5=('DU.DU', 'U1207A U1207A'),\n", "    lah6=('DU', 'U1207A'),\n", "    lal3=('TAxHI', 'U122ED'),\n", "    muhaldim=('MU', 'U1222C'),\n", "    nigar=('U.UD.KID', 'U1230B U12313 U121A4'),\n", "    nirah=('MUŠ', 'U12232'),\n", "    sa10=('NINDA2xŠE', 'U1225A'),\n", "    sahar=('IŠ', 'U12156'),\n", "    siskur2=('AMARxŠE.AMARxŠE', 'U1202C U1202C'),\n", "    szagina=('GIR3.ARAD', 'U1210A U12034'),\n", "    szah=('ŠUBUR', 'U122DA'),\n", "    szah2=('DUN', 'U12084'),\n", "    szandana=('GAL.NI', 'U120F2 U1224C'),\n", "    tah=('MU/MU', 'U1222D'),\n", "    tap=('TAB', 'U122F0'),\n", "    udru=('AŠ2', 'U1203E'),\n", "    UH=('HIxNUN', 'U12134'),\n", "    uh=('HIxNUN', 'U12134'),\n", "    UH2=('UD.KUŠU2', 'U12313 U121B5'),\n", "    uh2=('UD.KUŠU2', 'U12313 U121B5'),\n", "    uh3=('KUŠU2', 'U121B5'),\n", "    UH3=('KUŠU2', 'U121B5'),\n", "    ukken=('URUxBAR', 'U1233A'),\n", "    unu=('AB gunû', 'U12015'),\n", ")\n", "MAPPING_SOLUTIONS.update({\n", "    'barig': ('', 'U12079'),\n", "    '1(barig)': ('', 'U12079'),\n", "    '2(barig)': ('', 'U12079 U12079'),\n", "    '3(barig)': ('', 'U12079 U12079 U12079'),\n", "    '4(barig)': ('', 'U1235D'),\n", "    '5(barig)': ('', 'U12125'),\n", "    'bur3': ('', 'U1230B'),\n", "    \"bur'u\": ('', 'U12434'),\n", "    '1(bur3)': ('', 'U1230B'),\n", "    '2(bur3)': ('', 'U1230B U1230B'),\n", "    '3(bur3)': ('', 'U1230B U1230B U1230B'),\n", "    '4(bur3)': ('', 'U1240F'),\n", "    '5(bur3)': ('', 'U12410'),\n", "    '6(bur3)': ('', 'U12411'),\n", "    '7(bur3)': ('', 'U12412'),\n", "    '8(bur3)': ('', 'U12413'),\n", "    '9(bur3)': ('', 'U12414'),\n", "    '1/2(disz)': ('', 'U12226'),\n", "    '13(disz)': ('', 'U12399 U12408'),\n", "    '1(iku)': ('', 'U12038'),\n", "    '2(iku)': ('', 'U12400'),\n", "    '3(iku)': ('', 'U12401'),\n", "    '4(iku)': ('', 'U12402'),\n", "    '5(iku)': ('', 'U12403'),\n", "    '6(iku)': ('', 'U12404'),\n", "    '7(iku)': ('', 'U12405'),\n", "    '8(iku)': ('', 'U12406'),\n", "    '9(iku)': ('', 'U12407'),\n", "    '3(esze3)': ('', 'U12038 U1230B'),\n", "    'gesz2': ('', 'U12415'),\n", "    \"gesz'u\": ('', 'U1241E'),\n", "    'szar2': ('', 'U122B9'),\n", "})\n", "\n", "MAPPING_SOLUTIONSX = {}\n", "\n", "for (token, (grapheme, uniChars)) in MAPPING_SOLUTIONS.items():\n", "    # uniChars holds space-separated code points written like U12134\n", "    uniStr = ''.join(chr(int(uc[1:], 16)) for uc in uniChars.split())\n", "    MAPPING_SOLUTIONSX[token] = uniStr\n", "MAPPING_SOLUTIONSX" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Ambiguously mapped readings" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 50 ambiguously mapped readings\n", "IA => (2) => ๐’…€ - ๐’‰ฟ\n", "IL => (2) => ๐’€ง - ๐’…‹\n", "IRI => (2) => ๐’…• - ๐’Œท\n", "KAM => (2) => ๐’„ญ๐’ - ๐’„ฐ\n", "LUM => (2) => ๐’ˆ - ๐’‹ž\n", "USZ => (2) => ๐’‘ - ๐’–\n", "UZ => (2) => ๐’Šป - ๐’–\n", "WA => (2) => ๐’€ - ๐’‰ฟ\n", "ba4 => (3) => ๐’€€๐’€ญ๐’‚ท - ๐’‚ท - ๐’๐’‚ท๐’‚ท\n", "ba6 => (2) => 
๐’€๐’Œ‘ - ๐’Œ‘\n", "bara2 => (2) => ๐’ - ๐’ˆ\n", "bum => (2) => ๐’…ค - ๐’†ƒ\n", "buru14 => (2) => ๐’‚˜ - ๐’‚™\n", "dabin => (2) => ๐’‚ ๐’Šบ - ๐’ฅ๐’Šบ\n", "dilmun => (3) => ๐’‰Œ๐’Œ‡ - ๐’Šฉ๐’„ธ - ๐’Šฉ๐’Œ‡\n", "eri => (2) => ๐’…• - ๐’Œท\n", "erisz => (2) => ๐’Šฉ๐’ˆ  - ๐’Šฉ๐’Œ†\n", "gala => (3) => ๐’ƒฒ - ๐’‘๐’†ช - ๐’“\n", "gin7 => (2) => ๐’ถ - ๐’„€\n", "gurusz => (2) => ๐’„จ - ๐’†—\n", "ia => (2) => ๐’…€ - ๐’‰ฟ\n", "idim => (2) => ๐’ - ๐’…‚\n", "ii => (2) => ๐’…€ - ๐’‰ฟ\n", "il => (2) => ๐’€ง - ๐’…‹\n", "iri => (2) => ๐’…• - ๐’Œท\n", "isz8 => (2) => ๐’€น - ๐’Œ‹\n", "iu => (2) => ๐’…€ - ๐’‰ฟ\n", "kam => (2) => ๐’„ญ๐’ - ๐’„ฐ\n", "kesz2 => (2) => ๐’‚ก - ๐’†Ÿ\n", "kesz3 => (2) => ๐’‹™๐’€ญ๐’„ฒ - ๐’‹™๐’€ญ๐’„ฒ๐’† \n", "lum => (2) => ๐’ˆ - ๐’‹ž\n", "munu4 => (2) => ๐’‰ฝ๐’‰ฝ - ๐’‰ฝ๐’Šบ๐’‰ฝ\n", "ne3 => (2) => ๐’„Š - ๐’ŠŠ\n", "nergal => (2) => ๐’„Š๐’€•๐’ƒฒ - ๐’ŠŠ๐’€•๐’ƒฒ\n", "pa2 => (2) => ๐’€ - ๐’€๐’Œ‘\n", "pirig => (2) => ๐’„Š - ๐’ŠŠ\n", "puzur4 => (2) => ๐’…ค๐’Šญ - ๐’†ƒ๐’Šญ\n", "sig17 => (2) => ๐’„€ - ๐’†ฌ\n", "sze20 => (2) => ๐’‚  - ๐’…†\n", "t,a2 => (2) => ๐’‹ซ - ๐’‹ฌ\n", "til => (2) => ๐’ - ๐’Œ€\n", "us => (2) => ๐’Šป - ๐’–\n", "usa => (2) => ๐’ - ๐’‘„\n", "usz => (2) => ๐’‘ - ๐’–\n", "usz2 => (2) => ๐’ - ๐’—\n", "uz => (2) => ๐’Šป - ๐’–\n", "wa => (2) => ๐’€ - ๐’‰ฟ\n", "wa2 => (2) => ๐’€ - ๐’‰Œ\n", "zi3 => (2) => ๐’‚  - ๐’ฅ\n", "ziz2 => (2) => ๐’€พ - ๐’ฉ\n" ] } ], "source": [ "print(f'{len(multiple):>3} ambiguously mapped readings')\n", "for r in sorted(multiple):\n", " unis = multiple[r]\n", " uniStr = ' - '.join(sorted(unis))\n", " print(f'{r} => ({len(unis)}) => {uniStr}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Uniquely mapped readings" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "768 uniquely mapped readings\n", " A => ๐’€€\n", " AB => ๐’€Š\n", " AD => ๐’€œ\n", " AG => ๐’€\n", " AK => ๐’€\n", " AL => ๐’€ \n", " AM => ๐’„ 
\n", " AN => ๐’€ญ\n", " AR => ๐’…ˆ\n", " ARAD => ๐’€ด\n", " ARAD2 => ๐’€ต\n", " AS, => ๐’Š\n", " AS2 => ๐’€พ\n", " ASZ => ๐’€ธ\n", " AZ => ๐’Š\n", " BA => ๐’€\n", " BAD => ๐’\n", " BAR => ๐’‡\n", " BE => ๐’\n", " BI => ๐’‰\n", " BU => ๐’\n", " BUR => ๐’“\n", " DA => ๐’•\n", " DAM => ๐’ฎ\n", " DI => ๐’ฒ\n", " DIM => ๐’ด\n", " DIN => ๐’ท\n", " DISZ => ๐’น\n", " DU => ๐’บ\n", " E => ๐’‚Š\n", " EDIN => ๐’‚”\n", " EK => ๐’……\n", " EL => ๐’‚–\n", " ER => ๐’…•\n", " GA => ๐’‚ต\n", " GAG => ๐’†•\n", " GAL => ๐’ƒฒ\n", " GAN2 => ๐’ƒท\n", " GAR => ๐’ƒป\n", " GAZ => ๐’„ค\n", " GESZ => ๐’„‘\n", " GI => ๐’„€\n", " GIR => ๐’„ซ\n", " GIR2 => ๐’„ˆ\n", " GU => ๐’„–\n", " I => ๐’„ฟ\n", " IB => ๐’…\n", " ID => ๐’€‰\n", " IG => ๐’……\n", " IK => ๐’……\n", " IL2 => ๐’…\n", " IM => ๐’…Ž\n", " IN => ๐’…”\n", " IR => ๐’…•\n", " ISZ => ๐’…–\n", " IZ => ๐’„‘\n", " KA => ๐’…—\n", " KAB => ๐’†\n", " KI => ๐’† \n", " KIB => ๐’„’\n", " KU => ๐’†ช\n", " KUM => ๐’„ฃ\n", " KUR => ๐’†ณ\n", " LA => ๐’†ท\n", " LAM => ๐’‡ด\n", " LE => ๐’‡ท\n", " LI => ๐’‡ท\n", " LU => ๐’‡ป\n", " LU2 => ๐’‡ฝ\n", " MA => ๐’ˆ \n", " ME => ๐’ˆจ\n", " MI => ๐’ˆช\n", " NA => ๐’ˆพ\n", " NAM => ๐’‰†\n", " NE => ๐’‰ˆ\n", " NI => ๐’‰Œ\n", " NIG2 => ๐’ƒป\n", " NIM => ๐’‰\n", " NIN => ๐’Šฉ๐’Œ†\n", " NU => ๐’‰ก\n", " NUN => ๐’‰ฃ\n", " PA => ๐’‰บ\n", " PI => ๐’‰ฟ\n", " RA => ๐’Š\n", " RI => ๐’Š‘\n", " RU => ๐’Š’\n", " S,I => ๐’ข\n", " SA => ๐’Š“\n", " SAG => ๐’Š•\n", " SAR => ๐’Šฌ\n", " SIG => ๐’‹\n", " SU => ๐’‹ข\n", " SZA => ๐’Šญ\n", " SZE => ๐’Šบ\n", " SZE3 => ๐’‚ \n", " SZESZ => ๐’‹€\n", " SZI => ๐’…†\n", " SZIM => ๐’‹†\n", " SZIR => ๐’‹“\n", " SZU => ๐’‹—\n", " TA => ๐’‹ซ\n", " TAB => ๐’‘Š\n", " TAM => ๐’Œ“\n", " TAR => ๐’‹ป\n", " TE => ๐’‹ผ\n", " TI => ๐’‹พ\n", " TIM => ๐’ด\n", " TU => ๐’Œ…\n", " TUG2 => ๐’Œ†\n", " TUL2 => ๐’‡ฅ\n", " TUM => ๐’Œˆ\n", " TUR => ๐’Œ‰\n", " U2 => ๐’Œ‘\n", " U3 => ๐’…‡\n", " U4 => ๐’Œ“\n", " UB => ๐’Œ’\n", " UD => ๐’Œ“\n", " UG => ๐’ŠŒ\n", " UK => ๐’ŠŒ\n", " UL => ๐’ŒŒ\n", " 
UM => ๐’Œ\n", " UR => ๐’Œจ\n", " WE => ๐’‰ฟ\n", " WI => ๐’‰ฟ\n", " ZA => ๐’\n", " ZE => ๐’ฃ\n", " ZI => ๐’ฃ\n", " ZI2 => ๐’ข\n", " ZU => ๐’ช\n", " ZUM => ๐’ฎ\n", " a => ๐’€€\n", " a2 => ๐’€‰\n", " ab => ๐’€Š\n", " ab2 => ๐’€–\n", " abul => ๐’†๐’ƒฒ\n", " abzu => ๐’ช๐’€Š\n", " ad => ๐’€œ\n", " adab => ๐’Œ“๐’‰ฃ\n", " ag => ๐’€\n", " ag2 => ๐’‰˜\n", " aga => ๐’‚†\n", " agrig => ๐’…†๐’พ\n", " ak => ๐’€\n", " akszak => ๐’Œ”\n", " al => ๐’€ \n", " alam => ๐’€ฉ\n", " alan => ๐’€ฉ\n", " am => ๐’„ \n", " am3 => ๐’€€๐’€ญ\n", " ama => ๐’‚ผ\n", " amar => ๐’€ซ\n", " an => ๐’€ญ\n", " ansze => ๐’„\n", " ap => ๐’€Š\n", " apin => ๐’€ณ\n", " aq => ๐’€\n", " ar => ๐’…ˆ\n", " ar3 => ๐’„ฏ\n", " as => ๐’Š\n", " as, => ๐’Š\n", " as2 => ๐’€พ\n", " asal => ๐’‚\n", " asar => ๐’‚\n", " asz => ๐’€ธ\n", " asz2 => ๐’€พ\n", " asza5 => ๐’ƒท\n", " aszgab => ๐’€ฟ\n", " asznan => ๐’Šบ๐’Œ\n", " at => ๐’€œ\n", " at, => ๐’€œ\n", " az => ๐’Š\n", " az2 => ๐’€พ\n", " az3 => ๐’€ธ\n", " azlag2 => ๐’Œ†\n", " ba => ๐’€\n", " babbar => ๐’Œ“\n", " bad3 => ๐’‚ฆ\n", " bal => ๐’„\n", " bala => ๐’„\n", " ban => ๐’‰ผ\n", " ban2 => ๐’‘\n", " ban3 => ๐’Œ‰\n", " banda3 => ๐’Œ‰\n", " banesz => ๐’‘‘\n", " banszur => ๐’Ž\n", " bappir => ๐’‹‹\n", " bar => ๐’‡\n", " bat => ๐’\n", " be => ๐’\n", " be2 => ๐’‰\n", " bi => ๐’‰\n", " bi2 => ๐’‰ˆ\n", " bil => ๐’‰ˆ\n", " bil2 => ๐’‰‹\n", " bir2 => ๐’Œ“\n", " bir4 => ๐’‚”\n", " bisz => ๐’„ซ\n", " bu => ๐’\n", " bun2 => ๐’…ฎ\n", " bur => ๐’“\n", " bur3 => ๐’Œ‹\n", " buranun => ๐’Œ“๐’„’๐’‰ฃ\n", " d => ๐’€ญ\n", " da => ๐’•\n", " dab => ๐’ณ\n", " dab5 => ๐’†ช\n", " dag => ๐’–\n", " dagal => ๐’‚ผ\n", " dam => ๐’ฎ\n", " dan => ๐’„จ\n", " daq => ๐’–\n", " dar => ๐’ฏ\n", " de => ๐’ฒ\n", " de3 => ๐’‰ˆ\n", " de4 => ๐’‹ผ\n", " di => ๐’ฒ\n", " di2 => ๐’Šน\n", " di3 => ๐’‹พ\n", " dib => ๐’ณ\n", " dida => ๐’‰๐’Œ‘๐’Š“\n", " didli => ๐’€\n", " dil => ๐’€ธ\n", " dim => ๐’ด\n", " dim2 => ๐’ถ\n", " dim4 => ๐’‰ฝ๐’‰ฝ\n", " din => ๐’ท\n", " dingir => 
๐’€ญ\n", " diri => ๐’‹›๐’€€\n", " dirig => ๐’‹›๐’€€\n", " disz => ๐’น\n", " du => ๐’บ\n", " du10 => ๐’„ญ\n", " du11 => ๐’…—\n", " du3 => ๐’†•\n", " du5 => ๐’‚…\n", " du6 => ๐’‡ฏ\n", " du7 => ๐’ŒŒ\n", " du8 => ๐’‚ƒ\n", " dub => ๐’พ\n", " dug => ๐’‚\n", " dug3 => ๐’„ญ\n", " dul3 => ๐’Šจ\n", " dul5 => ๐’Œ†\n", " dumu => ๐’Œ‰\n", " duru5 => ๐’€€\n", " dusu => ๐’…\n", " dusu2 => ๐’€ฒ๐’…‡\n", " e => ๐’‚Š\n", " e2 => ๐’‚\n", " e3 => ๐’Œ“๐’บ\n", " ea => ๐’€€\n", " eb => ๐’…\n", " ed => ๐’€‰\n", " edin => ๐’‚”\n", " eg => ๐’……\n", " egir => ๐’‚•\n", " ek => ๐’……\n", " el => ๐’‚–\n", " el2 => ๐’…‹\n", " el3 => ๐’€ญ\n", " elam => ๐’‰\n", " em => ๐’…Ž\n", " eme => ๐’…ด\n", " eme6 => ๐’€ฒ๐’Šฉ\n", " en => ๐’‚—\n", " en6 => ๐’…”\n", " engar => ๐’€ณ\n", " enku => ๐’ ๐’„ฉ\n", " ensi2 => ๐’‘๐’‹ผ๐’‹›\n", " ep => ๐’…\n", " eq => ๐’……\n", " er => ๐’…•\n", " er2 => ๐’€€๐’…†\n", " er3 => ๐’€ด\n", " eren2 => ๐’‚Ÿ\n", " eresz2 => ๐’‰€\n", " erim => ๐’‚Ÿ\n", " erin => ๐’‚ž\n", " erin2 => ๐’‚Ÿ\n", " es => ๐’„‘\n", " es, => ๐’„‘\n", " esir => ๐’€€๐’‡’\n", " esz => ๐’Œ\n", " esz15 => ๐’…–\n", " esz18 => ๐’€น\n", " esz2 => ๐’‚ \n", " esz3 => ๐’€Š\n", " esza => ๐’€€๐’Œ\n", " esze3 => ๐’‘˜\n", " et => ๐’€‰\n", " et, => ๐’€‰\n", " ez => ๐’„‘\n", " ezem => ๐’‚ก\n", " ga => ๐’‚ต\n", " ga2 => ๐’‚ท\n", " gab => ๐’ƒฎ\n", " gaba => ๐’ƒฎ\n", " gada => ๐’ƒฐ\n", " gag => ๐’†•\n", " gal => ๐’ƒฒ\n", " gal2 => ๐’……\n", " gan => ๐’ƒถ\n", " gan2 => ๐’ƒท\n", " ganba => ๐’† ๐’‡ด\n", " gar => ๐’ƒป\n", " gar3 => ๐’ƒผ\n", " gaz => ๐’„ค\n", " ge => ๐’„€\n", " ge6 => ๐’ˆช\n", " geme => ๐’Šฉ\n", " geme2 => ๐’Šฉ๐’†ณ\n", " gesz => ๐’„‘\n", " gesztin => ๐’ƒพ\n", " gesztu2 => ๐’„‘๐’Œ†๐’‰ฟ\n", " gi => ๐’„€\n", " gi2 => ๐’†ค\n", " gi4 => ๐’„„\n", " gi7 => ๐’‚ \n", " gibil => ๐’‰‹\n", " gid2 => ๐’\n", " gidri => ๐’‰บ\n", " gigir => ๐’‡€\n", " gim => ๐’ถ\n", " gin => ๐’บ\n", " gin2 => ๐’‚…\n", " gir => ๐’„ซ\n", " gir14 => ๐’„ฉ\n", " gir2 => ๐’„ˆ\n", " gir3 => ๐’„Š\n", " gir8 => ๐’†ธ\n", " giri17 => ๐’…—\n", " 
giri3 => ๐’„Š\n", " gissu => ๐’„‘๐’ˆช\n", " gisz => ๐’„‘\n", " gu => ๐’„–\n", " gu2 => ๐’„˜\n", " gu4 => ๐’„ž\n", " gu7 => ๐’…ฅ\n", " gub => ๐’บ\n", " gud => ๐’„ž\n", " gul => ๐’„ข\n", " gum2 => ๐’ˆ\n", " gur => ๐’„ฅ\n", " gur10 => ๐’†ฅ\n", " gur11 => ๐’‚ต\n", " gur8 => ๐’‹ฝ\n", " guru7 => ๐’„ฆ\n", " i => ๐’„ฟ\n", " i3 => ๐’‰Œ\n", " i7 => ๐’€€๐’‡‰\n", " ia2 => ๐’Š\n", " ia3 => ๐’‰Œ\n", " ib => ๐’…\n", " ib2 => ๐’Œˆ\n", " ibila => ๐’Œ‰๐’‘\n", " id => ๐’€‰\n", " id2 => ๐’€€๐’‡‰\n", " idigna => ๐’ˆฆ๐’„˜๐’ƒผ\n", " ig => ๐’……\n", " igi => ๐’…†\n", " ik => ๐’……\n", " iku => ๐’ƒท\n", " il2 => ๐’…\n", " il3 => ๐’€ญ\n", " il5 => ๐’‚–\n", " illat => ๐’†œ๐’†ณ\n", " im => ๐’…Ž\n", " imin => ๐’‘‚\n", " imma3 => ๐’…Š\n", " in => ๐’…”\n", " ina => ๐’€ธ\n", " inanna => ๐’ˆน\n", " inim => ๐’…—\n", " ip => ๐’…\n", " iq => ๐’……\n", " ir => ๐’…•\n", " ir3 => ๐’€ด\n", " is => ๐’„‘\n", " is, => ๐’„‘\n", " is2 => ๐’…–\n", " is3 => ๐’€Š\n", " is4 => ๐’€พ\n", " isz => ๐’…–\n", " isz3 => ๐’Œ\n", " isz7 => ๐’€Š\n", " iszkur => ๐’…Ž\n", " isztaran => ๐’…—๐’ฒ\n", " it => ๐’€‰\n", " it, => ๐’€‰\n", " iti => ๐’Œš\n", " iz => ๐’„‘\n", " ka => ๐’…—\n", " ka2 => ๐’†\n", " ka3 => ๐’‚ต\n", " ka9 => ๐’‹ƒ\n", " kab => ๐’†\n", " kak => ๐’†•\n", " kal => ๐’„จ\n", " kal2 => ๐’ƒฒ\n", " kalag => ๐’„จ\n", " kalam => ๐’Œฆ\n", " kap => ๐’†\n", " kar => ๐’‹ผ๐’€€\n", " kar2 => ๐’ƒธ\n", " kar3 => ๐’ƒผ\n", " kas4 => ๐’ฝ\n", " kaskal => ๐’†œ\n", " kasz => ๐’‰\n", " ke => ๐’† \n", " ke4 => ๐’†ค\n", " ki => ๐’† \n", " ki2 => ๐’„€\n", " kid => ๐’†ค\n", " kikken2 => ๐’„ฏ๐’„ฏ\n", " kilib => ๐’†ธ\n", " kin => ๐’†ฅ\n", " kir => ๐’„ซ\n", " kiri6 => ๐’Šฌ\n", " kisz => ๐’†ง\n", " kiszib => ๐’ˆฉ\n", " kiszib3 => ๐’พ\n", " ku => ๐’†ช\n", " ku13 => ๐’„ฃ\n", " ku3 => ๐’†ฌ\n", " ku4 => ๐’†ฎ\n", " ku5 => ๐’‹ป\n", " ku6 => ๐’„ฉ\n", " kul => ๐’†ฐ\n", " kum => ๐’„ฃ\n", " kun => ๐’†ฒ\n", " kup4 => ๐’†ค\n", " kur => ๐’†ณ\n", " kur2 => ๐’‰ฝ\n", " kurun2 => ๐’ท\n", " kuruszda => ๐’†ฏ\n", " kusz => ๐’‹ข\n", " kusz3 => 
๐’Œ‘\n", " la => ๐’†ท\n", " la2 => ๐’‡ฒ\n", " lagasz => ๐’‹“๐’“๐’†ท\n", " lam => ๐’‡ด\n", " lamma => ๐’„จ\n", " larsa => ๐’Œ“๐’€•\n", " le => ๐’‡ท\n", " lem => ๐’…†\n", " li => ๐’‡ท\n", " li2 => ๐’‰Œ\n", " li3 => ๐’…†\n", " libir => ๐’…†๐’‚ \n", " lik => ๐’Œจ\n", " lil2 => ๐’†ค\n", " lim => ๐’…†\n", " lu => ๐’‡ป\n", " lu2 => ๐’‡ฝ\n", " lu4 => ๐’ˆ\n", " lugal => ๐’ˆ—\n", " lukur => ๐’Šฉ๐’ˆจ\n", " ma => ๐’ˆ \n", " ma2 => ๐’ˆฃ\n", " mal => ๐’‚ท\n", " man => ๐’Œ‹๐’Œ‹\n", " mar => ๐’ˆฅ\n", " mar2 => ๐’€ซ\n", " marduk => ๐’€ซ๐’Œ“\n", " masz => ๐’ˆฆ\n", " masz2 => ๐’ˆง\n", " maszkim => ๐’‘๐’ฝ\n", " me => ๐’ˆจ\n", " me2 => ๐’ˆช\n", " mesz => ๐’ˆจ๐’Œ\n", " mi => ๐’ˆช\n", " mi2 => ๐’Šฉ\n", " mi3 => ๐’ˆจ\n", " mil => ๐’…–\n", " mu => ๐’ˆฌ\n", " mug => ๐’ˆฎ\n", " mun => ๐’ต\n", " munus => ๐’Šฉ\n", " mur => ๐’„ฏ\n", " musz => ๐’ˆฒ\n", " musz5 => ๐’‹€\n", " muszen => ๐’„ท\n", " na => ๐’ˆพ\n", " na4 => ๐’‰Œ๐’Œ“\n", " nag2 => ๐’‰€\n", " nagar => ๐’‰„\n", " nagga => ๐’€ญ๐’ˆพ\n", " nam => ๐’‰†\n", " nanna => ๐’Œถ๐’† \n", " nansze => ๐’€\n", " nar => ๐’ˆœ\n", " ne => ๐’‰ˆ\n", " ne2 => ๐’‰Œ\n", " ni => ๐’‰Œ\n", " nibru => ๐’‚—๐’†ค\n", " nidba2 => ๐’‰ป๐’ˆน\n", " nig2 => ๐’ƒป\n", " nigin3 => ๐’Œ‹๐’Œ“๐’†ค\n", " nigin6 => ๐’€’\n", " nim => ๐’‰\n", " nimgir => ๐’‚†\n", " nin => ๐’Šฉ๐’Œ†\n", " nina => ๐’€\n", " ninda => ๐’ƒป\n", " nir => ๐’‰ช\n", " nita => ๐’‘\n", " nita2 => ๐’€ด\n", " nu => ๐’‰ก\n", " nu2 => ๐’ˆฟ\n", " num => ๐’‰\n", " numun => ๐’†ฐ\n", " nun => ๐’‰ฃ\n", " nunuz => ๐’‰ญ\n", " pa => ๐’‰บ\n", " pa12 => ๐’‰ฟ\n", " pa3 => ๐’…†๐’Š’\n", " pa4 => ๐’‰ฝ\n", " pa5 => ๐’‰ฝ๐’‚Š\n", " pal => ๐’„\n", " par2 => ๐’‡\n", " pe => ๐’‰ฟ\n", " pe2 => ๐’‰\n", " pesz => ๐’„ซ\n", " pi => ๐’‰ฟ\n", " pi2 => ๐’‰\n", " pi4 => ๐’…—\n", " pil => ๐’‰ˆ\n", " pil2 => ๐’‰‹\n", " pir => ๐’Œ“\n", " pisan => ๐’‚ท\n", " pisz => ๐’„ซ\n", " pu => ๐’\n", " pur => ๐’“\n", " puzur => ๐’Œ‹\n", " qa => ๐’‹ก\n", " qa2 => ๐’‚ต\n", " qa3 => ๐’…—\n", " qal4 => ๐’„จ\n", " qar => ๐’ƒผ\n", " qar3 => 
๐’ƒป\n", " qe => ๐’†ฅ\n", " qe2 => ๐’† \n", " qe3 => ๐’„€\n", " qi => ๐’†ฅ\n", " qi2 => ๐’† \n", " qi3 => ๐’„€\n", " qi4 => ๐’„„\n", " qir => ๐’„ซ\n", " qu => ๐’„ฃ\n", " qu2 => ๐’†ช\n", " qu3 => ๐’„–\n", " qum => ๐’„ฃ\n", " qur2 => ๐’†ณ\n", " ra => ๐’Š\n", " ra2 => ๐’บ\n", " rasz => ๐’†œ\n", " re => ๐’Š‘\n", " ri => ๐’Š‘\n", " ri2 => ๐’Œท\n", " rim5 => ๐’€ธ\n", " ru => ๐’Š’\n", " ru3 => ๐’€ธ\n", " rum => ๐’€ธ\n", " s,a => ๐’\n", " s,ar => ๐’‡ก\n", " s,e => ๐’ข\n", " s,e2 => ๐’ฃ\n", " s,i => ๐’ข\n", " s,i2 => ๐’ฃ\n", " s,il2 => ๐’ˆช\n", " s,ir => ๐’ˆฒ\n", " s,u => ๐’ฎ\n", " s,u2 => ๐’ช\n", " s,um => ๐’ฎ\n", " s,ur => ๐’€ซ\n", " sa => ๐’Š“\n", " sa12 => ๐’Š•\n", " sa2 => ๐’ฒ\n", " sa3 => ๐’\n", " sa6 => ๐’Šท\n", " sag => ๐’Š•\n", " sag11 => ๐’†ฅ\n", " saga => ๐’Š•\n", " sak => ๐’Š•\n", " sal => ๐’Šฉ\n", " sanga => ๐’‹ƒ\n", " sar => ๐’Šฌ\n", " se => ๐’‹›\n", " se2 => ๐’ฃ\n", " se3 => ๐’‹ง\n", " si => ๐’‹›\n", " si2 => ๐’ฃ\n", " si3 => ๐’‹ง\n", " si4 => ๐’‹œ\n", " sig => ๐’‹\n", " sig2 => ๐’‹ \n", " sig4 => ๐’‹ž\n", " siki => ๐’‹ \n", " sikil => ๐’‚–\n", " sila => ๐’‹ป\n", " sila3 => ๐’‹ก\n", " sila4 => ๐’ƒข\n", " silig => ๐’‚\n", " silim => ๐’ฒ\n", " sim => ๐’‰†\n", " simug => ๐’Œฃ\n", " sin => ๐’Œ\n", " sin2 => ๐’‰†\n", " sipa => ๐’‰บ๐’‡ป\n", " sipad => ๐’‰บ๐’‡ป\n", " sir2 => ๐’\n", " sir3 => ๐’‚ก\n", " su => ๐’‹ข\n", " su2 => ๐’ช\n", " su3 => ๐’‹ค\n", " su7 => ๐’‡ญ\n", " suen => ๐’‚—๐’ช\n", " sukkal => ๐’ˆ›\n", " sum => ๐’‹ง\n", " sum2 => ๐’ฎ\n", " sun2 => ๐’„ข\n", " sur => ๐’‹ฉ\n", " sza => ๐’Šญ\n", " sza13 => ๐’Šน\n", " sza3 => ๐’Šฎ\n", " szabra => ๐’‘๐’€ \n", " szakkan2 => ๐’„Š\n", " szam => ๐’Œ‘\n", " szam3 => ๐’‰“\n", " szar => ๐’Šฌ\n", " szara2 => ๐’‡‹\n", " sze => ๐’Šบ\n", " sze3 => ๐’‚ \n", " szen => ๐’Šฟ\n", " szennur => ๐’„’\n", " szesz => ๐’‹€\n", " szi => ๐’…†\n", " szi2 => ๐’‹›\n", " szim => ๐’‹†\n", " szinig => ๐’‹’\n", " szitim => ๐’ถ\n", " szu => ๐’‹—\n", " szub => ๐’Š’\n", " szubur => ๐’‹š\n", " szuku => ๐’‰ป\n", " szul => 
๐’‚„\n", " szum => ๐’‹ณ\n", " szum2 => ๐’‹ง\n", " szur => ๐’‹ฉ\n", " szur4 => ๐’‡ณ๐’Šฌ\n", " szusz3 => ๐’…–\n", " t,a => ๐’•\n", " t,a3 => ๐’„ญ\n", " t,am => ๐’ฎ\n", " t,ar => ๐’‹ป\n", " t,e => ๐’ฒ\n", " t,e4 => ๐’‹ผ\n", " t,e6 => ๐’‹พ\n", " t,i => ๐’ฒ\n", " t,i3 => ๐’‹พ\n", " t,u => ๐’‚…\n", " t,u2 => ๐’Œ…\n", " t,u3 => ๐’บ\n", " t,ul => ๐’‡ฅ\n", " t,um => ๐’Œˆ\n", " t,up => ๐’พ\n", " ta => ๐’‹ซ\n", " ta2 => ๐’•\n", " tab => ๐’‘Š\n", " tak => ๐’‹ณ\n", " tak2 => ๐’–\n", " tak4 => ๐’‹บ\n", " taka4 => ๐’‹บ\n", " tal => ๐’Š‘\n", " tam => ๐’Œ“\n", " tam2 => ๐’ฎ\n", " tar => ๐’‹ป\n", " tar2 => ๐’ฏ\n", " taskarin => ๐’Œ†\n", " tasz => ๐’Œจ\n", " te => ๐’‹ผ\n", " te4 => ๐’‰ˆ\n", " te9 => ๐’‹พ\n", " tel => ๐’\n", " ter => ๐’Œ\n", " ti => ๐’‹พ\n", " ti7 => ๐’‹ผ\n", " tibira => ๐’๐’‰„\n", " tim => ๐’ด\n", " tir => ๐’Œ\n", " tiszpak => ๐’ˆฝ\n", " tu => ๐’Œ…\n", " tu2 => ๐’Œ“\n", " tu3 => ๐’บ\n", " tug2 => ๐’Œ†\n", " tukul => ๐’†ช\n", " tul => ๐’Œ‹๐’Œ†\n", " tul2 => ๐’‡ฅ\n", " tum => ๐’Œˆ\n", " tun3 => ๐’‚…\n", " tup => ๐’พ\n", " tur => ๐’Œ‰\n", " tur2 => ๐’„™\n", " u => ๐’Œ‹\n", " u2 => ๐’Œ‘\n", " u3 => ๐’…‡\n", " u4 => ๐’Œ“\n", " u8 => ๐’‡‡\n", " ub => ๐’Œ’\n", " ud => ๐’Œ“\n", " ud5 => ๐’š\n", " udu => ๐’‡ป\n", " ug => ๐’ŠŒ\n", " ug3 => ๐’Œฆ\n", " ugula => ๐’‰บ\n", " uk => ๐’ŠŒ\n", " ul => ๐’ŒŒ\n", " um => ๐’Œ\n", " umbin => ๐’Œข\n", " umma => ๐’„‘๐’†ต\n", " un => ๐’Œฆ\n", " unken => ๐’Œบ\n", " unug => ๐’€•\n", " up => ๐’Œ’\n", " uq => ๐’ŠŒ\n", " ur => ๐’Œจ\n", " ur2 => ๐’Œซ\n", " ur3 => ๐’ƒก\n", " ur5 => ๐’„ฏ\n", " urasz => ๐’…\n", " uri2 => ๐’‹€๐’€•\n", " urta => ๐’…\n", " uru => ๐’Œท\n", " uru4 => ๐’€ณ\n", " uruda => ๐’\n", " urudu => ๐’\n", " us, => ๐’Šป\n", " us,2 => ๐’‘\n", " us,4 => ๐’Š\n", " us2 => ๐’‘\n", " usan3 => ๐’‰ฎ\n", " ut => ๐’Œ“\n", " ut, => ๐’Œ“\n", " utu => ๐’Œ“\n", " uz2 => ๐’‘\n", " uzu => ๐’œ\n", " we => ๐’‰ฟ\n", " wi => ๐’‰ฟ\n", " wu => ๐’‰ฟ\n", " yi => ๐’‰ฟ\n", " za => ๐’\n", " za3 => ๐’ \n", " zabala4 => 
𒈹𒍝𒀕\n", " zabar => 𒌓𒅗𒈦\n", " zadim => 𒈯\n", " zal => 𒉌\n", " zalag2 => 𒂟\n", " zar => 𒇡\n", " ze => 𒍣\n", " ze2 => 𒍢\n", " zi => 𒍣\n", " zi2 => 𒍢\n", " zid2 => 𒂠\n", " zimbir => 𒌓𒄒𒉣\n", " zir3 => 𒈲\n", " zu => 𒍪\n", " zu2 => 𒅗\n", " zum => 𒍮\n" ] } ], "source": [ "print(f'{len(unique):>3} uniquely mapped readings')\n", "for r in sorted(unique):\n", "    print(f'{r:>10} => {unique[r]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Write the mapping file\n", "\n", "Readings with several candidate characters (`multiple`) get the first candidate in sort order. The entries of `mapAddition`, `MAPPING_SOLUTIONSX` and `unique` are then applied in that order, so each later source overrides the earlier ones and uniquely mapped readings take precedence.\n", "\n", "The mapping is written as a UTF-8, tab-separated file with one `key<TAB>characters` line per entry, sorted by key." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "964 entries written to /Users/dirk/github/Nino-cunei/oldbabylonian/characters/mapping.tsv\n" ] } ], "source": [ "pairs = {}\n", "for (k, vs) in multiple.items():\n", "    pairs[k] = min(vs)\n", "for (t, v) in mapAddition.items():\n", "    # tuple keys (a, b) become the string 'a(b)'\n", "    k = f'{t[0]}({t[1]})' if isinstance(t, tuple) else t\n", "    pairs[k] = v\n", "for (k, v) in MAPPING_SOLUTIONSX.items():\n", "    pairs[k] = v\n", "for (k, v) in unique.items():\n", "    pairs[k] = v\n", "\n", "# write UTF-8 explicitly: cuneiform is not representable in many platform default encodings\n", "with open(MAPPING_FILE, 'w', encoding='utf-8') as mf:\n", "    for (k, v) in sorted(pairs.items()):\n", "        mf.write(f'{k}\\t{v}\\n')\n", "print(f'{len(pairs)} entries written to {MAPPING_FILE}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }