Revision 195bae1900a0df8df7661232c2e0eb1e471e1e73 authored by Dirk Roorda on 08 March 2019, 21:21:06 UTC, committed by Dirk Roorda on 08 March 2019, 21:21:06 UTC
1 parent 43f6566
mapReadings.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Mapping Old Babylonian readings to Unicode\n",
"\n",
"## Task\n",
"\n",
"We want to map the *readings* in the Old Babylonian Corpus to Unicode strings in the Cuneiform block,\n",
"based on extant mapping tables.\n",
"\n",
"## Problem\n",
"\n",
"There are multiple mapping tables, and there are several ways to transliterate readings.\n",
"\n",
"## Sources\n",
"\n",
"We take the ATF transliterations from CDLI, for tablets found by a search on AbB and Old Babylonian.\n",
"\n",
"We take the file\n",
"[GeneratedSignList.json](https://github.com/Nino-cunei/oldbabylonian/blob/master/sources/writing/GeneratedSignList.json)\n",
"with mappings like\n",
"\n",
"```json\n",
" \"BANIA\": {\n",
" \"signName\": \"BANIA\",\n",
" \"signNumber\": 551,\n",
" \"signCunei\": \"๐\",\n",
" \"codePoint\": \"\",\n",
" \"values\":\n",
"\t\t\t[\n",
" \"BANIA\", \"Aล 2.UoverU\", \"5SลชTU\"\n",
" ]\n",
" },\n",
" \"MA\": {\n",
" \"signName\": \"MA\",\n",
" \"signNumber\": 552,\n",
" \"signCunei\": \"๐ \",\n",
" \"codePoint\": \"\",\n",
" \"values\":\n",
"\t\t\t[\n",
" \"MA\", \"PEล 3\", \"PEล ล E\", \"WA6\"\n",
" ]\n",
" },\n",
"```\n",
"\n",
"This file has been generated by\n",
"[Auday Hussein](https://www.linkedin.com/in/audayhussein/?originalSubdomain=ca).\n",
"> I generated this JSON file from the original source http://home.zcu.cz/~ksaskova/Sign_List.html\n",
"> using a Python script I wrote.\n",
"> The original HTML list is created manually by Dr. Kateřina Šašková from the University of West Bohemia,\n",
"> therefore all credit should go to her.\n",
"\n",
"> *Auday Hussein in an email to Dirk Roorda*\n",
" \n",
"## Status\n",
"\n",
"This is work in progress.\n",
"The mapping is needed in the conversion from ATF to TF in the program\n",
"[tfFromATF.py](tfFromATF.py).\n",
"\n",
"We are still dealing with conversion glitches.\n",
"\n",
"We also need to resolve cases where readings map to multiple cuneiform strings by making corpus-specific choices\n",
"for Old Babylonian.\n",
"\n",
"## Authors\n",
"\n",
"Cale Johnson, Dirk Roorda\n",
"\n",
"## Acknowledgements\n",
"\n",
"We are indebted to Auday for helpfully sending the *GeneratedSignList.json* file to us."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import collections\n",
"import re\n",
"import json\n",
"\n",
"from tf.fabric import Fabric"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Local topography"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"BASE = os.path.expanduser('~/github')\n",
"ORG = 'Nino-cunei'\n",
"REPO = 'oldbabylonian'\n",
"VERSION = '0.3'\n",
"\n",
"REPO_DIR = f'{BASE}/{ORG}/{REPO}'\n",
"\n",
"TF_DIR = f'{REPO_DIR}/tf/{VERSION}'\n",
"WRITING_DIR = f'{REPO_DIR}/sources/writing'\n",
"\n",
"SIGN_FILE = 'GeneratedSignList.json'\n",
"SIGN_PATH = f'{WRITING_DIR}/{SIGN_FILE}'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading collection\n",
"\n",
"We use TF to collect all readings from the corpus in a set."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is Text-Fabric 7.4.5\n",
"Api reference : https://annotation.github.io/text-fabric/Api/Fabric/\n",
"\n",
"27 features found and 0 ignored\n",
" 0.00s loading features ...\n",
" | 0.00s B otype from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.09s B oslots from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B pnumber from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.05s B type from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.03s B after from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.05s B atf from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.01s B grapheme from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.05s B reading from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.03s B uafter from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.07s B unicode from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B collated from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B combined from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B comment from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B damage from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B exclamation from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B fraction from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B givengrapheme from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.03s B language from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.01s B ln from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B repeat from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.01s B srcfile from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.02s B srcline from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.01s B srcln from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B subtype from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B super from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" | 0.00s B uncertain from /Users/dirk/github/Nino-cunei/oldbabylonian/tf/0.3\n",
" 0.74s All features loaded/computed - for details use loadLog()\n"
]
},
{
"data": {
"text/plain": [
"[('Computed',\n",
" 'computed-data',\n",
" ('C Computed', 'Call AllComputeds', 'Cs ComputedString')),\n",
" ('Features', 'edge-features', ('E Edge', 'Eall AllEdges', 'Es EdgeString')),\n",
" ('Fabric', 'loading', ('ensureLoaded', 'TF', 'ignored', 'loadLog')),\n",
" ('Locality', 'locality', ('L Locality',)),\n",
" ('Misc', 'messaging', ('cache', 'error', 'indent', 'info', 'reset')),\n",
" ('Nodes',\n",
" 'navigating-nodes',\n",
" ('N Nodes', 'sortKey', 'sortKeyTuple', 'otypeRank', 'sortNodes')),\n",
" ('Features',\n",
" 'node-features',\n",
" ('F Feature', 'Fall AllFeatures', 'Fs FeatureString')),\n",
" ('Search', 'search', ('S Search',)),\n",
" ('Text', 'text', ('T Text',))]"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"TF = Fabric(TF_DIR)\n",
"allFeatures = TF.explore(silent=True, show=True)\n",
"loadableFeatures = allFeatures['nodes'] + allFeatures['edges']\n",
"api = TF.load(loadableFeatures, silent=False)\n",
"api.makeAvailableIn(globals())"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"745 different readings in corpus\n"
]
}
],
"source": [
"readings = set(F.reading.v(s) for s in F.otype.s('sign')) - {None}\n",
"print(f'{len(readings)} different readings in corpus')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unicode style versus ATF style\n",
"\n",
"We use mappings between Unicode style transliterations and ATF."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"transUni = {\n",
"    'sz': 'š',\n",
"    's,': 'ṣ',\n",
"    \"s'\": 'ś',\n",
"    't,': 'ṭ',\n",
"    'h,': 'ḫ',\n",
"}\n",
"\n",
"transAscii = {rout.upper(): rin for (rin, rout) in transUni.items()}\n",
"\n",
"def makeUni(r):\n",
" for (rin, rout) in transUni.items():\n",
" r = r.replace(rin, rout)\n",
" return (\n",
" r.\\\n",
" replace(\"'\", '').\\\n",
" replace('{', '').\\\n",
" replace('}', '').\\\n",
" replace('.', '').\\\n",
" replace(':', '')\n",
" )\n",
"\n",
"def makeAscii(r):\n",
" for (rin, rout) in transAscii.items():\n",
" r = r.replace(rin, rout)\n",
" return r.lower()"
]
},
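{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (this cell is our addition, not part of the original pipeline), we can round-trip a few sample readings: ATF to Unicode transliteration style, upper-cased to the sign-value style, and back to ATF. The sample readings are chosen by us to exercise the digraphs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Round-trip: ATF reading -> Unicode style -> upper case -> back to ATF.\n",
"# The sample readings below are illustrative, not taken from the corpus.\n",
"for r in ('szu', 's,i2', 't,a3'):\n",
"    u = makeUni(r)\n",
"    print(r, '->', u, '->', makeAscii(u.upper()))"
]
},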
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Read the sign list\n",
"\n",
"We read the json file with generated signs.\n",
"\n",
"For each sign, we find a list of *values*.\n",
"\n",
"These values correspond to possible readings.\n",
"They are in Unicode transliteration style.\n",
"\n",
"In the mapping we create, we convert them to plain ATF,\n",
"which makes it easier to look them up from our Old Babylonian corpus."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1768 signs in the json file\n",
"8765 distinct values in table\n"
]
}
],
"source": [
"with open(SIGN_PATH) as fh:\n",
" signs = json.load(fh)['signs']\n",
"\n",
"print(f'{len(signs)} signs in the json file')\n",
"\n",
"mapping = collections.defaultdict(set)\n",
"\n",
"for (sign, signData) in signs.items():\n",
" uniStr = signData['signCunei']\n",
" values = signData['values']\n",
" for value in values:\n",
" valueAscii = makeAscii(value)\n",
" mapping[valueAscii].add(uniStr)\n",
"\n",
"print(f'{len(mapping)} distinct values in table')"
]
},
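{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why the mapping is a `defaultdict(set)`, here is a miniature, made-up example (our addition; the sign names and placeholder glyphs are invented): two signs that share a value end up as a set of two targets for that value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Two invented signs whose value lists overlap on 'A'\n",
"demoSigns = {\n",
"    'S1': {'signCunei': '(glyph-1)', 'values': ['A', 'E4']},\n",
"    'S2': {'signCunei': '(glyph-2)', 'values': ['A']},\n",
"}\n",
"\n",
"demoMapping = collections.defaultdict(set)\n",
"for signData in demoSigns.values():\n",
"    for value in signData['values']:\n",
"        demoMapping[makeAscii(value)].add(signData['signCunei'])\n",
"\n",
"# 'a' now has two targets, 'e4' has one\n",
"print(dict(demoMapping))"
]
},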
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading lookup\n",
"\n",
"We look up each Old Babylonian reading in the mapping just constructed.\n",
"\n",
"Depending on whether we find 0, 1, or multiple targets, we put the reading\n",
"in the set `unmapped` or in the dictionaries `unique` and `multiple`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 80 unmapped readings\n",
" 41 ambiguously mapped readings\n",
"624 uniquely mapped readings\n"
]
}
],
"source": [
"unmapped = set()\n",
"unique = {}\n",
"multiple = {}\n",
"\n",
"for r in readings:\n",
" if r not in mapping:\n",
" unmapped.add(r)\n",
" continue\n",
" targets = mapping[r]\n",
" if len(targets) == 1:\n",
" unique[r] = list(targets)[0]\n",
" else:\n",
" multiple[r] = targets\n",
" \n",
"print(f'{len(unmapped):>3} unmapped readings')\n",
"print(f'{len(multiple):>3} ambiguously mapped readings')\n",
"print(f'{len(unique):>3} uniquely mapped readings')"
]
},
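{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same partition logic on toy data (our addition; the readings and glyph placeholders are invented) makes the three outcomes concrete: absent from the table, exactly one target, or several targets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy mapping: 'a' has one target, 'ba4' has three, 'zzz' is absent\n",
"toyMapping = {'a': {'(glyph-a)'}, 'ba4': {'(g1)', '(g2)', '(g3)'}}\n",
"toyReadings = {'a', 'ba4', 'zzz'}\n",
"\n",
"toyUnmapped = set()\n",
"toyUnique = {}\n",
"toyMultiple = {}\n",
"for r in toyReadings:\n",
"    targets = toyMapping.get(r)\n",
"    if targets is None:\n",
"        toyUnmapped.add(r)\n",
"    elif len(targets) == 1:\n",
"        toyUnique[r] = list(targets)[0]\n",
"    else:\n",
"        toyMultiple[r] = targets\n",
"\n",
"print(toyUnmapped, toyUnique, toyMultiple)"
]
},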
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unmapped readings"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 80 unmapped readings\n"
]
},
{
"data": {
"text/plain": [
"['...szu',\n",
" '...x',\n",
" 'ah',\n",
" 'alamusz',\n",
" 'asal2',\n",
" 'babila2',\n",
" 'barig',\n",
" \"bur'u\",\n",
" 'd',\n",
" 'dah',\n",
" 'di...',\n",
" 'duh',\n",
" 'e2ni',\n",
" 'eh',\n",
" 'eri11',\n",
" 'gazx',\n",
" \"gesz'u\",\n",
" 'geszimmar',\n",
" 'geszu',\n",
" 'gudu4',\n",
" 'ha',\n",
" 'ha:a',\n",
" 'had2',\n",
" 'hal',\n",
" 'har',\n",
" 'he',\n",
" 'he2',\n",
" 'hi',\n",
" 'hu',\n",
" 'hub2',\n",
" 'hun',\n",
" 'hur',\n",
" 'ih',\n",
" 'inana',\n",
" 'isx',\n",
" 'isztar',\n",
" 'itu',\n",
" 'kislah',\n",
" 'kux',\n",
" 'lah',\n",
" 'lah4',\n",
" 'lah5',\n",
" 'lah6',\n",
" 'lal3',\n",
" 'm',\n",
" \"ma'\",\n",
" 'mah',\n",
" 'muhaldim',\n",
" 'n',\n",
" 'nigar',\n",
" 'nirah',\n",
" 'p',\n",
" 'pesz2',\n",
" 'sa10',\n",
" 'sahar',\n",
" 'siskur2',\n",
" 'sza3}',\n",
" 'szagina',\n",
" 'szah',\n",
" 'szah2',\n",
" 'szandana',\n",
" 'sze9',\n",
" 'szii',\n",
" 'szunigin',\n",
" 'tah',\n",
" 'tap',\n",
" 'udru',\n",
" 'uh',\n",
" 'uh2',\n",
" 'uh3',\n",
" 'ukken',\n",
" 'unu',\n",
" 'ura',\n",
" 'ururdu',\n",
" 'utumu',\n",
" 'x...',\n",
" 'xxxx',\n",
" '{a',\n",
" '{diszszu',\n",
" '{ki']"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(f'{len(unmapped):>3} unmapped readings')\n",
"sorted(unmapped)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ambiguously mapped readings"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 41 ambiguously mapped readings\n",
"ba4 => (3) => ๐๐ญ๐ท - ๐ท - ๐๐ท๐ท\n",
"ba6 => (2) => ๐๐ - ๐\n",
"bara2 => (2) => ๐ - ๐\n",
"buru14 => (2) => ๐ - ๐\n",
"dabin => (2) => ๐ ๐บ - ๐ฅ๐บ\n",
"dilmun => (3) => ๐๐ - ๐ฉ๐ธ - ๐ฉ๐\n",
"eri => (2) => ๐
- ๐ท\n",
"erisz => (2) => ๐ฉ๐ - ๐ฉ๐\n",
"gala => (3) => ๐ฒ - ๐๐ช - ๐\n",
"gin7 => (2) => ๐ถ - ๐\n",
"gurusz => (2) => ๐จ - ๐\n",
"ia => (2) => ๐
- ๐ฟ\n",
"idim => (2) => ๐ - ๐
\n",
"ii => (2) => ๐
- ๐ฟ\n",
"il => (2) => ๐ง - ๐
\n",
"iri => (2) => ๐
- ๐ท\n",
"isz8 => (2) => ๐น - ๐\n",
"iu => (2) => ๐
- ๐ฟ\n",
"kam => (2) => ๐ญ๐ - ๐ฐ\n",
"kesz2 => (2) => ๐ก - ๐\n",
"kesz3 => (2) => ๐๐ญ๐ฒ - ๐๐ญ๐ฒ๐ \n",
"lum => (2) => ๐ - ๐\n",
"munu4 => (2) => ๐ฝ๐ฝ - ๐ฝ๐บ๐ฝ\n",
"ne3 => (2) => ๐ - ๐\n",
"nergal => (2) => ๐๐๐ฒ - ๐๐๐ฒ\n",
"pa2 => (2) => ๐ - ๐๐\n",
"pirig => (2) => ๐ - ๐\n",
"puzur4 => (2) => ๐
ค๐ญ - ๐๐ญ\n",
"sig17 => (2) => ๐ - ๐ฌ\n",
"sze20 => (2) => ๐ - ๐
\n",
"t,a2 => (2) => ๐ซ - ๐ฌ\n",
"til => (2) => ๐ - ๐\n",
"us => (2) => ๐ป - ๐\n",
"usa => (2) => ๐ - ๐\n",
"usz => (2) => ๐ - ๐\n",
"usz2 => (2) => ๐ - ๐\n",
"uz => (2) => ๐ป - ๐\n",
"wa => (2) => ๐ - ๐ฟ\n",
"wa2 => (2) => ๐ - ๐\n",
"zi3 => (2) => ๐ - ๐ฅ\n",
"ziz2 => (2) => ๐พ - ๐ฉ\n"
]
}
],
"source": [
"print(f'{len(multiple):>3} ambiguously mapped readings')\n",
"for r in sorted(multiple):\n",
" unis = multiple[r]\n",
" uniStr = ' - '.join(sorted(unis))\n",
" print(f'{r} => ({len(unis)}) => {uniStr}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Uniquely mapped readings"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"624 uniquely mapped readings\n",
" a => ๐\n",
" a2 => ๐\n",
" ab => ๐\n",
" ab2 => ๐\n",
" abul => ๐๐ฒ\n",
" abzu => ๐ช๐\n",
" ad => ๐\n",
" ag => ๐\n",
" ag2 => ๐\n",
" aga => ๐\n",
" agrig => ๐
๐พ\n",
" ak => ๐\n",
" akszak => ๐\n",
" al => ๐ \n",
" alan => ๐ฉ\n",
" am => ๐ \n",
" am3 => ๐๐ญ\n",
" ama => ๐ผ\n",
" amar => ๐ซ\n",
" an => ๐ญ\n",
" ansze => ๐\n",
" ap => ๐\n",
" apin => ๐ณ\n",
" aq => ๐\n",
" ar => ๐
\n",
" ar3 => ๐ฏ\n",
" as => ๐\n",
" as, => ๐\n",
" as2 => ๐พ\n",
" asal => ๐\n",
" asar => ๐\n",
" asz => ๐ธ\n",
" asz2 => ๐พ\n",
" asza5 => ๐ท\n",
" aszgab => ๐ฟ\n",
" asznan => ๐บ๐\n",
" at => ๐\n",
" at, => ๐\n",
" az => ๐\n",
" az2 => ๐พ\n",
" az3 => ๐ธ\n",
" azlag2 => ๐\n",
" ba => ๐\n",
" babbar => ๐\n",
" bad3 => ๐ฆ\n",
" bal => ๐\n",
" bala => ๐\n",
" ban => ๐ผ\n",
" ban2 => ๐\n",
" ban3 => ๐\n",
" banda3 => ๐\n",
" banesz => ๐\n",
" banszur => ๐\n",
" bappir => ๐\n",
" bar => ๐\n",
" bat => ๐\n",
" be => ๐\n",
" be2 => ๐\n",
" bi => ๐\n",
" bi2 => ๐\n",
" bil => ๐\n",
" bil2 => ๐\n",
" bir2 => ๐\n",
" bir4 => ๐\n",
" bisz => ๐ซ\n",
" bu => ๐\n",
" bun2 => ๐
ฎ\n",
" bur => ๐\n",
" bur3 => ๐\n",
" buranun => ๐๐๐ฃ\n",
" da => ๐\n",
" dab => ๐ณ\n",
" dab5 => ๐ช\n",
" dag => ๐\n",
" dagal => ๐ผ\n",
" dam => ๐ฎ\n",
" dan => ๐จ\n",
" dar => ๐ฏ\n",
" de => ๐ฒ\n",
" de3 => ๐\n",
" de4 => ๐ผ\n",
" di => ๐ฒ\n",
" di2 => ๐น\n",
" di3 => ๐พ\n",
" dib => ๐ณ\n",
" dida => ๐๐๐\n",
" didli => ๐\n",
" dil => ๐ธ\n",
" dim => ๐ด\n",
" dim2 => ๐ถ\n",
" dim4 => ๐ฝ๐ฝ\n",
" din => ๐ท\n",
" dingir => ๐ญ\n",
" diri => ๐๐\n",
" dirig => ๐๐\n",
" disz => ๐น\n",
" du => ๐บ\n",
" du10 => ๐ญ\n",
" du11 => ๐
\n",
" du3 => ๐\n",
" du5 => ๐
\n",
" du6 => ๐ฏ\n",
" du7 => ๐\n",
" du8 => ๐\n",
" dub => ๐พ\n",
" dug => ๐\n",
" dug3 => ๐ญ\n",
" dul3 => ๐จ\n",
" dumu => ๐\n",
" duru5 => ๐\n",
" dusu => ๐
\n",
" dusu2 => ๐ฒ๐
\n",
" e => ๐\n",
" e2 => ๐\n",
" e3 => ๐๐บ\n",
" ea => ๐\n",
" eb => ๐
\n",
" ed => ๐\n",
" edin => ๐\n",
" eg => ๐
\n",
" egir => ๐\n",
" ek => ๐
\n",
" el => ๐\n",
" el2 => ๐
\n",
" el3 => ๐ญ\n",
" elam => ๐\n",
" em => ๐
\n",
" eme => ๐
ด\n",
" en => ๐\n",
" en6 => ๐
\n",
" engar => ๐ณ\n",
" enku => ๐ ๐ฉ\n",
" ensi2 => ๐๐ผ๐\n",
" ep => ๐
\n",
" eq => ๐
\n",
" er => ๐
\n",
" er2 => ๐๐
\n",
" er3 => ๐ด\n",
" eren2 => ๐\n",
" eresz2 => ๐\n",
" erim => ๐\n",
" erin => ๐\n",
" erin2 => ๐\n",
" es => ๐\n",
" es, => ๐\n",
" esir => ๐๐\n",
" esz => ๐\n",
" esz15 => ๐
\n",
" esz18 => ๐น\n",
" esz2 => ๐ \n",
" esz3 => ๐\n",
" esza => ๐๐\n",
" esze3 => ๐\n",
" et => ๐\n",
" et, => ๐\n",
" ez => ๐\n",
" ezem => ๐ก\n",
" ga => ๐ต\n",
" ga2 => ๐ท\n",
" gab => ๐ฎ\n",
" gaba => ๐ฎ\n",
" gada => ๐ฐ\n",
" gag => ๐\n",
" gal => ๐ฒ\n",
" gal2 => ๐
\n",
" gan => ๐ถ\n",
" gan2 => ๐ท\n",
" ganba => ๐ ๐ด\n",
" gar => ๐ป\n",
" gar3 => ๐ผ\n",
" gaz => ๐ค\n",
" ge => ๐\n",
" ge6 => ๐ช\n",
" geme => ๐ฉ\n",
" geme2 => ๐ฉ๐ณ\n",
" gesz => ๐\n",
" gesz2 => ๐น\n",
" gesztin => ๐พ\n",
" gesztu2 => ๐๐๐ฟ\n",
" gi => ๐\n",
" gi2 => ๐ค\n",
" gi4 => ๐\n",
" gi7 => ๐ \n",
" gibil => ๐\n",
" gid2 => ๐\n",
" gidri => ๐บ\n",
" gigir => ๐\n",
" gim => ๐ถ\n",
" gin => ๐บ\n",
" gin2 => ๐
\n",
" gir => ๐ซ\n",
" gir14 => ๐ฉ\n",
" gir2 => ๐\n",
" gir3 => ๐\n",
" gir8 => ๐ธ\n",
" giri17 => ๐
\n",
" giri3 => ๐\n",
" gissu => ๐๐ช\n",
" gisz => ๐\n",
" gu => ๐\n",
" gu2 => ๐\n",
" gu4 => ๐\n",
" gu7 => ๐
ฅ\n",
" gub => ๐บ\n",
" gud => ๐\n",
" gul => ๐ข\n",
" gum2 => ๐\n",
" gur => ๐ฅ\n",
" gur10 => ๐ฅ\n",
" gur11 => ๐ต\n",
" gur8 => ๐ฝ\n",
" guru7 => ๐ฆ\n",
" i => ๐ฟ\n",
" i3 => ๐\n",
" i7 => ๐๐\n",
" ia2 => ๐\n",
" ia3 => ๐\n",
" ib => ๐
\n",
" ib2 => ๐\n",
" ibila => ๐๐\n",
" id => ๐\n",
" id2 => ๐๐\n",
" idigna => ๐ฆ๐๐ผ\n",
" ig => ๐
\n",
" igi => ๐
\n",
" ik => ๐
\n",
" iku => ๐ท\n",
" il2 => ๐
\n",
" il3 => ๐ญ\n",
" il5 => ๐\n",
" illat => ๐๐ณ\n",
" im => ๐
\n",
" imin => ๐\n",
" imma3 => ๐
\n",
" in => ๐
\n",
" ina => ๐ธ\n",
" inanna => ๐น\n",
" inim => ๐
\n",
" ip => ๐
\n",
" iq => ๐
\n",
" ir => ๐
\n",
" ir3 => ๐ด\n",
" is => ๐\n",
" is, => ๐\n",
" is2 => ๐
\n",
" is3 => ๐\n",
" is4 => ๐พ\n",
" isz => ๐
\n",
" isz3 => ๐\n",
" isz7 => ๐\n",
" iszkur => ๐
\n",
" isztaran => ๐
๐ฒ\n",
" it => ๐\n",
" it, => ๐\n",
" iti => ๐\n",
" iz => ๐\n",
" ka => ๐
\n",
" ka2 => ๐\n",
" ka3 => ๐ต\n",
" ka9 => ๐\n",
" kab => ๐\n",
" kak => ๐\n",
" kal => ๐จ\n",
" kal2 => ๐ฒ\n",
" kalag => ๐จ\n",
" kalam => ๐ฆ\n",
" kap => ๐\n",
" kar => ๐ผ๐\n",
" kar2 => ๐ธ\n",
" kar3 => ๐ผ\n",
" kas4 => ๐ฝ\n",
" kaskal => ๐\n",
" kasz => ๐\n",
" ke => ๐ \n",
" ki => ๐ \n",
" ki2 => ๐\n",
" kib => ๐\n",
" kid => ๐ค\n",
" kikken2 => ๐ฏ๐ฏ\n",
" kilib => ๐ธ\n",
" kin => ๐ฅ\n",
" kir => ๐ซ\n",
" kiri6 => ๐ฌ\n",
" kisz => ๐ง\n",
" kiszib => ๐ฉ\n",
" kiszib3 => ๐พ\n",
" ku => ๐ช\n",
" ku13 => ๐ฃ\n",
" ku3 => ๐ฌ\n",
" ku4 => ๐ฎ\n",
" ku5 => ๐ป\n",
" ku6 => ๐ฉ\n",
" kud => ๐ป\n",
" kum => ๐ฃ\n",
" kun => ๐ฒ\n",
" kup4 => ๐ค\n",
" kur => ๐ณ\n",
" kur2 => ๐ฝ\n",
" kurun2 => ๐ท\n",
" kuruszda => ๐ฏ\n",
" kusz => ๐ข\n",
" kusz3 => ๐\n",
" la => ๐ท\n",
" la2 => ๐ฒ\n",
" lagasz => ๐๐๐ท\n",
" lam => ๐ด\n",
" lamma => ๐จ\n",
" larsa => ๐๐\n",
" le => ๐ท\n",
" lem => ๐
\n",
" li => ๐ท\n",
" li2 => ๐\n",
" li3 => ๐
\n",
" libir => ๐
๐ \n",
" lik => ๐จ\n",
" lil2 => ๐ค\n",
" lim => ๐
\n",
" lu => ๐ป\n",
" lu2 => ๐ฝ\n",
" lu4 => ๐\n",
" lugal => ๐\n",
" lukur => ๐ฉ๐จ\n",
" ma => ๐ \n",
" ma2 => ๐ฃ\n",
" mal => ๐ท\n",
" man => ๐๐\n",
" mar => ๐ฅ\n",
" mar2 => ๐ซ\n",
" marduk => ๐ซ๐\n",
" masz => ๐ฆ\n",
" masz2 => ๐ง\n",
" maszkim => ๐๐ฝ\n",
" me => ๐จ\n",
" me2 => ๐ช\n",
" mesz => ๐จ๐\n",
" mi => ๐ช\n",
" mi2 => ๐ฉ\n",
" mi3 => ๐จ\n",
" mil => ๐
\n",
" mu => ๐ฌ\n",
" mug => ๐ฎ\n",
" mun => ๐ต\n",
" munus => ๐ฉ\n",
" mur => ๐ฏ\n",
" musz => ๐ฒ\n",
" musz5 => ๐\n",
" muszen => ๐ท\n",
" na => ๐พ\n",
" na4 => ๐๐\n",
" nag2 => ๐\n",
" nagar => ๐\n",
" nagga => ๐ญ๐พ\n",
" nam => ๐\n",
" nanna => ๐ถ๐ \n",
" nansze => ๐\n",
" nar => ๐\n",
" ne => ๐\n",
" ne2 => ๐\n",
" ni => ๐\n",
" nibru => ๐๐ค\n",
" nidba2 => ๐ป๐น\n",
" nig2 => ๐ป\n",
" nigin6 => ๐\n",
" nim => ๐\n",
" nimgir => ๐\n",
" nin => ๐ฉ๐\n",
" nina => ๐\n",
" ninda => ๐ป\n",
" nir => ๐ช\n",
" nita => ๐\n",
" nita2 => ๐ด\n",
" nu => ๐ก\n",
" nu2 => ๐ฟ\n",
" num => ๐\n",
" numun => ๐ฐ\n",
" nun => ๐ฃ\n",
" pa => ๐บ\n",
" pa12 => ๐ฟ\n",
" pa4 => ๐ฝ\n",
" pa5 => ๐ฝ๐\n",
" pal => ๐\n",
" par2 => ๐\n",
" pe => ๐ฟ\n",
" pe2 => ๐\n",
" pesz => ๐ซ\n",
" pi => ๐ฟ\n",
" pi2 => ๐\n",
" pi4 => ๐
\n",
" pil => ๐\n",
" pil2 => ๐\n",
" pir => ๐\n",
" pisan => ๐ท\n",
" pisz => ๐ซ\n",
" pu => ๐\n",
" pur => ๐\n",
" qa => ๐ก\n",
" qa2 => ๐ต\n",
" qa3 => ๐
\n",
" qal4 => ๐จ\n",
" qar => ๐ผ\n",
" qar3 => ๐ป\n",
" qe => ๐ฅ\n",
" qe2 => ๐ \n",
" qe3 => ๐\n",
" qi2 => ๐ \n",
" qi3 => ๐\n",
" qi4 => ๐\n",
" qir => ๐ซ\n",
" qu => ๐ฃ\n",
" qu2 => ๐ช\n",
" qu3 => ๐\n",
" qum => ๐ฃ\n",
" qur2 => ๐ณ\n",
" ra => ๐\n",
" ra2 => ๐บ\n",
" rasz => ๐\n",
" re => ๐\n",
" ri => ๐\n",
" ri2 => ๐ท\n",
" rim5 => ๐ธ\n",
" ru => ๐\n",
" ru3 => ๐ธ\n",
" rum => ๐ธ\n",
" s,a => ๐\n",
" s,ar => ๐ก\n",
" s,e => ๐ข\n",
" s,e2 => ๐ฃ\n",
" s,i => ๐ข\n",
" s,i2 => ๐ฃ\n",
" s,il2 => ๐ช\n",
" s,ir => ๐ฒ\n",
" s,u => ๐ฎ\n",
" s,u2 => ๐ช\n",
" s,um => ๐ฎ\n",
" s,ur => ๐ซ\n",
" sa => ๐\n",
" sa12 => ๐\n",
" sa2 => ๐ฒ\n",
" sa3 => ๐\n",
" sa6 => ๐ท\n",
" sag => ๐\n",
" sag11 => ๐ฅ\n",
" saga => ๐\n",
" sak => ๐\n",
" sal => ๐ฉ\n",
" sanga => ๐\n",
" sar => ๐ฌ\n",
" se => ๐\n",
" se2 => ๐ฃ\n",
" se3 => ๐ง\n",
" si => ๐\n",
" si2 => ๐ฃ\n",
" si4 => ๐\n",
" sig => ๐\n",
" sig2 => ๐ \n",
" sig4 => ๐\n",
" siki => ๐ \n",
" sikil => ๐\n",
" sila => ๐ป\n",
" sila3 => ๐ก\n",
" sila4 => ๐ข\n",
" silig => ๐\n",
" silim => ๐ฒ\n",
" sim => ๐\n",
" simug => ๐ฃ\n",
" sin => ๐\n",
" sin2 => ๐\n",
" sipa => ๐บ๐ป\n",
" sipad => ๐บ๐ป\n",
" sir2 => ๐\n",
" su => ๐ข\n",
" su2 => ๐ช\n",
" su3 => ๐ค\n",
" su7 => ๐ญ\n",
" suen => ๐๐ช\n",
" sukkal => ๐\n",
" sum => ๐ง\n",
" sum2 => ๐ฎ\n",
" sun2 => ๐ข\n",
" sur => ๐ฉ\n",
" sza => ๐ญ\n",
" sza13 => ๐น\n",
" sza3 => ๐ฎ\n",
" szabra => ๐๐ \n",
" szakkan2 => ๐\n",
" szam => ๐\n",
" szar => ๐ฌ\n",
" szar2 => ๐น\n",
" szara2 => ๐\n",
" sze => ๐บ\n",
" sze3 => ๐ \n",
" szen => ๐ฟ\n",
" szesz => ๐\n",
" szi => ๐
\n",
" szi2 => ๐\n",
" szim => ๐\n",
" szinig => ๐\n",
" szitim => ๐ถ\n",
" szu => ๐\n",
" szub => ๐\n",
" szubur => ๐\n",
" szuku => ๐ป\n",
" szul => ๐\n",
" szum => ๐ณ\n",
" szum2 => ๐ง\n",
" szur => ๐ฉ\n",
" szur4 => ๐ณ๐ฌ\n",
" szusz3 => ๐
\n",
" t,a => ๐\n",
" t,a3 => ๐ญ\n",
" t,am => ๐ฎ\n",
" t,ar => ๐ป\n",
" t,e => ๐ฒ\n",
" t,e4 => ๐ผ\n",
" t,e6 => ๐พ\n",
" t,i => ๐ฒ\n",
" t,i3 => ๐พ\n",
" t,u => ๐
\n",
" t,u2 => ๐
\n",
" t,u3 => ๐บ\n",
" t,ul => ๐ฅ\n",
" t,um => ๐\n",
" t,up => ๐พ\n",
" ta => ๐ซ\n",
" ta2 => ๐\n",
" tab => ๐\n",
" tak => ๐ณ\n",
" tak2 => ๐\n",
" tak4 => ๐บ\n",
" tal => ๐\n",
" tam => ๐\n",
" tam2 => ๐ฎ\n",
" tar => ๐ป\n",
" tar2 => ๐ฏ\n",
" taskarin => ๐\n",
" tasz => ๐จ\n",
" te => ๐ผ\n",
" te4 => ๐\n",
" te9 => ๐พ\n",
" tel => ๐\n",
" ter => ๐\n",
" ti => ๐พ\n",
" ti7 => ๐ผ\n",
" tibira => ๐๐\n",
" tim => ๐ด\n",
" tir => ๐\n",
" tiszpak => ๐ฝ\n",
" tu => ๐
\n",
" tu2 => ๐\n",
" tu3 => ๐บ\n",
" tug2 => ๐\n",
" tukul => ๐ช\n",
" tul => ๐๐\n",
" tul2 => ๐ฅ\n",
" tum => ๐\n",
" tun3 => ๐
\n",
" tup => ๐พ\n",
" tur => ๐\n",
" tur2 => ๐\n",
" u => ๐\n",
" u2 => ๐\n",
" u3 => ๐
\n",
" u4 => ๐\n",
" u8 => ๐\n",
" ub => ๐\n",
" ud => ๐\n",
" ud5 => ๐\n",
" udu => ๐ป\n",
" ug => ๐\n",
" ug3 => ๐ฆ\n",
" ugula => ๐บ\n",
" uk => ๐\n",
" ul => ๐\n",
" um => ๐\n",
" umbin => ๐ข\n",
" umma => ๐๐ต\n",
" un => ๐ฆ\n",
" unken => ๐บ\n",
" unug => ๐\n",
" up => ๐\n",
" uq => ๐\n",
" ur => ๐จ\n",
" ur2 => ๐ซ\n",
" ur3 => ๐ก\n",
" ur5 => ๐ฏ\n",
" urasz => ๐
\n",
" uri2 => ๐๐\n",
" urta => ๐
\n",
" uru => ๐ท\n",
" uru4 => ๐ณ\n",
" uruda => ๐\n",
" urudu => ๐\n",
" us, => ๐ป\n",
" us,2 => ๐\n",
" us,4 => ๐\n",
" us2 => ๐\n",
" usan3 => ๐ฎ\n",
" ut => ๐\n",
" ut, => ๐\n",
" utu => ๐\n",
" uz2 => ๐\n",
" uzu => ๐\n",
" we => ๐ฟ\n",
" wi => ๐ฟ\n",
" wu => ๐ฟ\n",
" yi => ๐ฟ\n",
" za => ๐\n",
" za3 => ๐ \n",
" zabala4 => ๐น๐๐\n",
" zabar => ๐๐
๐ฆ\n",
" zadim => ๐ฏ\n",
" zal => ๐\n",
" zalag2 => ๐\n",
" zar => ๐ก\n",
" ze => ๐ฃ\n",
" ze2 => ๐ข\n",
" zi => ๐ฃ\n",
" zi2 => ๐ข\n",
" zid2 => ๐ \n",
" zimbir => ๐๐๐ฃ\n",
" zir3 => ๐ฒ\n",
" zu => ๐ช\n",
" zu2 => ๐
\n",
" zum => ๐ฎ\n"
]
}
],
"source": [
"print(f'{len(unique):>3} uniquely mapped readings')\n",
"for r in sorted(unique):\n",
" print(f'{r:>10} => {unique[r]}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
