santakku.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Santakku fonts and sign list\n",
    "\n",
    "## Provenance\n",
    "\n",
    "On the advice of Martin Worthington I visited the\n",
    "[Cambridge Cuneify+ page](http://www.hethport.uni-wuerzburg.de/cuneifont/).\n",
    "\n",
    "On that page there is a link to a \n",
    "[Würzburg page on Cuneiform fonts](http://www.hethport.uni-wuerzburg.de/cuneifont/)\n",
    "with a download link to\n",
    "[Old Babylonian Fonts](http://www.hethport.uni-wuerzburg.de/cuneifont/download/Santakku.zip),\n",
    "containing the Santakku(M) fonts and a sign list in PDF.\n",
    "\n",
    "I extracted the text from that PDF and sanitized it to one table cell per line with the text editor Vim.\n",
    "That file is the source of this notebook, which tries to restore the original table in a tab-separated format.\n",
    "\n",
    "The PDF is in the *docs* directory of this repo.\n",
    "\n",
    "The sanitized text file is the file *sources/writing/Santakku.txt* in this repository."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problems\n",
    "\n",
    "While the text extraction and sanitizing went reasonably well, there are problems with empty cells.\n",
    "\n",
    "The table is seven columns wide, but because of the missing cells the text file does not always have seven lines per row.\n",
    "\n",
    "Yet we can realign the rows by means of the typical values in the cells (Unicode code points, characters, small numbers).\n",
    "\n",
    "Sometimes the values are also missing.\n",
    "\n",
    "We ignore the Santakku columns and the value column, so this problem will not hurt us much.\n",
    "\n",
    "## Results\n",
    "\n",
    "We extract only these columns:\n",
    "\n",
    "* `Unicode`, i.e. the Unicode code point,\n",
    "* `signe`, i.e. the grapheme,\n",
    "* `Autotext`, i.e. the reading."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "BASE = os.path.expanduser('~/github')\n",
    "ORG = 'Nino-cunei'\n",
    "REPO = 'oldbabylonian'\n",
    "\n",
    "REPO_DIR = f'{BASE}/{ORG}/{REPO}'\n",
    "\n",
    "SRC = f'{REPO_DIR}/sources/writing/Santakku.txt'\n",
    "\n",
    "CUNEI_START = int('12000', 16)\n",
    "CUNEI_END = int('13000', 16)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [],
   "source": [
    "uniCandRe = re.compile(r'''^\\s*([0-9A-Fa-f]{5}[ +]*)+$''')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ERROR at line 366: out of sync \"141 12100 GI\"\n",
      "['141 12100 GI', '140 12363', 'ZI', 'mud', 'JM', 'UY', 'MUD (ḪU-ḪI)']\n",
      "ERROR at line 367: missing Borger number \"141 12100 GI\"\n",
      "['141 12100 GI', '140 12363', 'ZI', 'mud', 'JM', 'UY', 'MUD (ḪU-ḪI)']\n",
      "Seen 365 lines\n",
      "ERROR detected\n"
     ]
    }
   ],
   "source": [
    "# The code below does not work yet, because it does not yet parse all Unicode code point strings\n",
    "# correctly, e.g. \"140 12363\"\n",
    "\n",
    "def makeMapping():\n",
    "  mapping = {}\n",
    "  \n",
    "  def finishUni():\n",
    "    if curGrapheme is None:\n",
    "      print(f'ERROR at line {i + 1}: missing grapheme for uni \"{curUni}\"')\n",
    "      print(list(reversed(prevLines)))\n",
    "      return False\n",
    "    \n",
    "    curReading = None\n",
    "    curBorger = None\n",
    "    # the last line before the next code point line should be a Borger number,\n",
    "    # the line before that one is taken as the reading\n",
    "    for (p, pLine) in enumerate(reversed(prevLines)):\n",
    "      if p == 0:\n",
    "        if not (pLine.isdigit() and 0 < int(pLine) < 1000):\n",
    "          print(f'ERROR at line {i + 1}: missing Borger number \"{pLine}\"')\n",
    "          print(list(reversed(prevLines)))\n",
    "          return False\n",
    "        curBorger = pLine\n",
    "      else:\n",
    "        curReading = pLine\n",
    "        break\n",
    "      \n",
    "    if curReading is None:\n",
    "      print(f'ERROR at line {i + 1}: missing reading for uni \"{curUni}\"')\n",
    "      print(list(reversed(prevLines)))\n",
    "      return False\n",
    "    \n",
    "    # the code point cell may hold several 5-digit hex code points,\n",
    "    # separated by spaces or '+' (see uniCandRe); keys are integers, as chr(uni) needs\n",
    "    unis = [int(uniStr, 16) for uniStr in re.findall(r'[0-9A-Fa-f]{5}', curUni)]\n",
    "    if len(set(unis)) != len(unis):\n",
    "      print(f'ERROR at line {i + 1}: identical unis in \"{curUni}\"')\n",
    "      print(list(reversed(prevLines)))\n",
    "      return False\n",
    "\n",
    "    for uni in unis:\n",
    "      if uni in mapping:\n",
    "        print(f'ERROR at line {i + 1}: duplicate uni {uni:X} in \"{curUni}\"')\n",
    "        print(list(reversed(prevLines)))\n",
    "        return False\n",
    "\n",
    "      mapping[uni] = (curGrapheme, curReading)\n",
    "    return True\n",
    "            \n",
    "  with open(SRC) as fh:\n",
    "    curUni = None\n",
    "    curGrapheme = None\n",
    "    prevLines = []\n",
    "\n",
    "    i = 0\n",
    "    for line in fh:\n",
    "      i += 1\n",
    "      line = line.strip()\n",
    "      if uniCandRe.match(line):\n",
    "        if curUni:\n",
    "          if not finishUni():\n",
    "            break\n",
    "        curUni = line\n",
    "        curGrapheme = None\n",
    "        prevLines = []\n",
    "        continue\n",
    "\n",
    "      if len(prevLines) == 0:\n",
    "        curGrapheme = line\n",
    "        prevLines.append(line)\n",
    "        continue\n",
    "\n",
    "      prevLines.append(line)\n",
    "      if len(prevLines) > 6:\n",
    "        print(f'ERROR at line {i + 1}: out of sync \"{line}\"')\n",
    "        print(list(reversed(prevLines)))\n",
    "        break\n",
    "        \n",
    "    i += 1\n",
    "    good = finishUni()\n",
    "    print(f'Seen {i - 1} lines')\n",
    "\n",
    "  if good:\n",
    "    print(f'{len(mapping)} unicode characters mapped')\n",
    "  else:\n",
    "    print(f'ERROR detected')\n",
    "  return mapping\n",
    "    \n",
    "mapping = makeMapping()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "for (uni, (grapheme, reading)) in sorted(mapping.items()):\n",
    "  print(f'\"{chr(uni)}\" = {uni} = \"{grapheme}\" = \"{reading}\"')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
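The error output shows lines such as `140 12363` and `141 12100 GI`. These appear to combine a Borger number with a code point, and sometimes with the start of the next cell. The sketch below splits such lines. It assumes a fused line is always laid out as: an optional Borger number below 1000, then one or more 5-digit hex code points joined by spaces or `+`, then optionally the rest. The function name `split_fused_line` is hypothetical. The range check reuses the notebook's `CUNEI_START`/`CUNEI_END` bounds, so a word that merely looks like hex is not mistaken for a code point.

```python
import re

# Bounds of the cuneiform block, as defined in the notebook
CUNEI_START = int('12000', 16)
CUNEI_END = int('13000', 16)

HEX5_RE = re.compile(r'^[0-9A-Fa-f]{5}$')


def split_fused_line(line):
    """Split a line into (borger, code_points, rest).

    Hypothetical helper; assumes the layout
    [Borger number] code point [(' ' | '+') code point ...] [rest].
    """
    tokens = line.split()
    borger = None
    # a leading small number (same bounds as the notebook's check) is the Borger number
    if tokens and tokens[0].isdigit() and 0 < int(tokens[0]) < 1000:
        borger = int(tokens.pop(0))
    code_points = []
    while tokens:
        # a token may join code points with '+'; a lone '+' yields no parts
        parts = [part for part in tokens[0].split('+') if part]
        if not all(
            HEX5_RE.match(part) and CUNEI_START <= int(part, 16) < CUNEI_END
            for part in parts
        ):
            break  # the first token that is not a code point starts the rest
        code_points.extend(int(part, 16) for part in parts)
        tokens.pop(0)
    rest = ' '.join(tokens) or None
    return borger, code_points, rest
```

For example, `split_fused_line('141 12100 GI')` returns the Borger number `141`, the single code point `0x12100` and the rest `'GI'`. A plain code point line such as `12363` returns no Borger number. The rest could be fed back into the line stream as a cell of its own, although that depends on how the extraction actually fused the cells.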

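The notebook aims to restore the sign list as a tab-separated table, keeping only the `Unicode`, `signe` and `Autotext` columns. Below is a minimal sketch of that final step using the standard `csv` module with a tab delimiter. It assumes a mapping from integer code points to `(grapheme, reading)` pairs, which is the shape the printing cell's `chr(uni)` expects. The following are illustrative choices, not part of the notebook: the function name `mapping_to_tsv`, the extra `character` column, and hex formatting of code points.

```python
import csv
import io


def mapping_to_tsv(mapping):
    """Render {code point: (grapheme, reading)} as tab-separated text.

    Hypothetical helper: the header names follow the sign list columns,
    plus an illustrative 'character' column with the cuneiform glyph.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(['Unicode', 'character', 'signe', 'Autotext'])
    # sort by code point so the rows follow the order of the Unicode block
    for uni, (grapheme, reading) in sorted(mapping.items()):
        # a missing reading becomes an empty cell
        writer.writerow([f'{uni:X}', chr(uni), grapheme, reading or ''])
    return out.getvalue()
```

To save the result, write the returned string to a file opened with `encoding='utf-8'`, so that the cuneiform characters survive on every platform. The output file name is up to you.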