https://github.com/nino-cunei/oldbabylonian
10 April 2019, 10:45:51 UTC
File: programs/santakku.ipynb
SWHIDs (SoftWare Hash IDentifiers) of the archived objects:

  • content: swh:1:cnt:501abf9a9aa3326cae4a8595244f6bedb55c9108
  • directory: swh:1:dir:93fcf99f59894f5f5ee4145dcf9225e7a4789f9f
  • revision: swh:1:rev:160e4a3237e0978fe5fb2eb0bb3697ad037eabf6
  • snapshot: swh:1:snp:9091ca8d749e2b01a10b40227cf5a226e41c8da5

This interface enables to generate software citations, provided that the root directory of browsed objects contains a citation.cff or codemeta.json file.
Select below a type of object currently browsed in order to generate citations for them.

  • content
  • directory
  • revision
  • snapshot
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Tip revision: 160e4a3237e0978fe5fb2eb0bb3697ad037eabf6 authored by Dirk Roorda on 15 March 2019, 09:32:56 UTC
metadata added; better unicode mapping
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Santakku fonts and sign list\n",
    "\n",
    "## Provenance\n",
    "\n",
    "On the advice of Martin Worthington I visited the\n",
    "[Cambridge Cuneify+ page](http://www.hethport.uni-wuerzburg.de/cuneifont/).\n",
    "\n",
    "On that page there is a link to a\n",
    "[Würzburg page on Cuneiform fonts](http://www.hethport.uni-wuerzburg.de/cuneifont/)\n",
    "with a download link to\n",
    "[Old Babylonian Fonts](http://www.hethport.uni-wuerzburg.de/cuneifont/download/Santakku.zip),\n",
    "containing the Santakku(M) fonts and a sign list in PDF.\n",
    "\n",
    "I extracted the text from that PDF and sanitized it to one table cell per line by means of the text editor Vim.\n",
    "The resulting file is the source of this notebook, which tries to restore the original table in a tab-separated format.\n",
    "\n",
    "The PDF is in the *docs* directory of this repo.\n",
    "\n",
    "The sanitized text file is the file *sources/writing/Santakku.txt* in this repository."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problems\n",
    "\n",
    "While the text extraction and sanitizing went reasonably well, there are problems with empty cells.\n",
    "\n",
    "The table is seven columns wide, but because of missing cells a row does not always occupy seven lines in the text file.\n",
    "\n",
    "Yet we can align the rows by means of the typical values in the cells (unicode code points, characters, small numbers).\n",
    "\n",
    "Sometimes the values are also missing.\n",
    "\n",
    "We ignore the Santakku columns and also the value column, so this problem will not affect us much.\n",
    "\n",
    "## Results\n",
    "\n",
    "We just extract these columns:\n",
    "\n",
    "* `Unicode` i.e. unicode code point,\n",
    "* `signe` i.e. grapheme,\n",
    "* `Autotext` i.e. reading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "BASE = os.path.expanduser('~/github')\n",
    "ORG = 'Nino-cunei'\n",
    "REPO = 'oldbabylonian'\n",
    "\n",
    "REPO_DIR = f'{BASE}/{ORG}/{REPO}'\n",
    "\n",
    "SRC = f'{REPO_DIR}/sources/writing/Santakku.txt'\n",
    "\n",
    "CUNEI_START = int('12000', 16)\n",
    "CUNEI_END = int('13000', 16)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [],
   "source": [
    "uniCandRe = re.compile(r'''^\\s*([0-9A-Fa-f]{5}[ +]*)+$''')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ERROR at line 366: out of sync \"141 12100 GI\"\n",
      "['141 12100 GI', '140 12363', 'ZI', 'mud', 'JM', 'UY', 'MUD (ḪU-ḪI)']\n",
      "ERROR at line 367: missing Borger number \"141 12100 GI\"\n",
      "['141 12100 GI', '140 12363', 'ZI', 'mud', 'JM', 'UY', 'MUD (ḪU-ḪI)']\n",
      "Seen 365 lines\n",
      "ERROR detected\n"
     ]
    }
   ],
   "source": [
    "# The code below does not work yet: it does not recognize all unicode code point strings\n",
    "# correctly, e.g. \"140 12363\"\n",
    "\n",
    "def makeMapping():\n",
    "  mapping = {}\n",
    "  \n",
    "  def finishUni():\n",
    "    if curGrapheme is None:\n",
    "      print(f'ERROR at line {i + 1}: missing grapheme for uni \"{curUni}\"')\n",
    "      print(list(reversed(prevLines)))\n",
    "      return False\n",
    "    \n",
    "    curReading = None\n",
    "    curBorger = None  # checked below, but not stored in the mapping\n",
    "    for (p, pLine) in enumerate(reversed(prevLines)):\n",
    "      if p == 0:\n",
    "        # the last collected line is expected to be the Borger number\n",
    "        if not (pLine.isdigit() and 0 < int(pLine) < 1000):\n",
    "          print(f'ERROR at line {i + 1}: missing Borger number \"{pLine}\"')\n",
    "          print(list(reversed(prevLines)))\n",
    "          return False\n",
    "          \n",
    "        curBorger = pLine\n",
    "      else:\n",
    "        # the line before the Borger number is taken as the reading\n",
    "        curReading = pLine\n",
    "        break\n",
    "      \n",
    "    if curReading is None:\n",
    "      print(f'ERROR at line {i + 1}: missing reading for uni \"{curUni}\"')\n",
    "      print(list(reversed(prevLines)))\n",
    "      return False\n",
    "    \n",
    "    uniStrs = curUni.strip().split()\n",
    "    for uniStr in uniStrs:\n",
    "      uniGood = True\n",
    "      try:\n",
    "        uni = int(uniStr, 16)\n",
    "      except Exception:\n",
    "        uniGood = False\n",
    "        break\n",
    "    if not uniGood:\n",
    "      print(f'ERROR at line {i + 1}: malformed unicode number \"{curUni}\"')\n",
    "      print(list(reversed(prevLines)))\n",
    "      return False\n",
    "    # code points are hexadecimal, so parse them with base 16\n",
    "    unis = {int(uniStr, 16) for uniStr in uniStrs}\n",
    "    if len(unis) != len(uniStrs):\n",
    "      print(f'ERROR at line {i + 1}: identical unis in \"{curUni}\"')\n",
    "      print(list(reversed(prevLines)))\n",
    "      return False\n",
    "        \n",
    "    for uniStr in uniStrs:\n",
    "      uniStr = uniStr.upper()\n",
    "      if uniStr in mapping:\n",
    "        print(f'ERROR at line {i + 1}: duplicate uni {uniStr} in \"{curUni}\"')\n",
    "        print(list(reversed(prevLines)))\n",
    "        return False\n",
    "\n",
    "      mapping[uniStr] = (curGrapheme, curReading)\n",
    "    return True\n",
    "            \n",
    "  with open(SRC) as fh:\n",
    "    curUni = None\n",
    "    curGrapheme = None\n",
    "    prevLines = []\n",
    "\n",
    "    i = 0\n",
    "    for line in fh:\n",
    "      i += 1\n",
    "      line = line.strip()\n",
    "      if uniCandRe.match(line):\n",
    "        if curUni:\n",
    "          if not finishUni():\n",
    "            break\n",
    "        curUni = line\n",
    "        curGrapheme = None\n",
    "        prevLines = []\n",
    "        continue\n",
    "\n",
    "      if len(prevLines) == 0:\n",
    "        curGrapheme = line\n",
    "        prevLines.append(line)\n",
    "        continue\n",
    "\n",
    "      prevLines.append(line)\n",
    "      if len(prevLines) > 6:\n",
    "        print(f'ERROR at line {i + 1}: out of sync \"{line}\"')\n",
    "        print(list(reversed(prevLines)))\n",
    "        break\n",
    "        \n",
    "    i += 1\n",
    "    good = finishUni()\n",
    "    print(f'Seen {i - 1} lines')\n",
    "\n",
    "  if good:\n",
    "    print(f'{len(mapping)} unicode characters mapped')\n",
    "  else:\n",
    "    print(f'ERROR detected')\n",
    "  return mapping\n",
    "    \n",
    "mapping = makeMapping()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the keys of mapping are hexadecimal code point strings, so convert them before calling chr()\n",
    "for (uni, (grapheme, reading)) in sorted(mapping.items()):\n",
    "  print(f'\"{chr(int(uni, 16))}\" = {uni} = \"{grapheme}\" = \"{reading}\"')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
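
The alignment idea from the notebook's Problems cell (use the lines that look like unicode code points as row anchors, since rows are not always seven lines long) can be illustrated in isolation. The following is a minimal sketch and not the author's method: the function name `group_rows` and the sample lines are hypothetical, and only the regular expression is taken from the notebook's `uniCandRe`.

```python
import re

# Same pattern as the notebook's uniCandRe: one or more 5-digit hex code points,
# optionally separated by spaces or plus signs.
UNI_RE = re.compile(r'^\s*([0-9A-Fa-f]{5}[ +]*)+$')


def group_rows(lines):
    """Group one-cell-per-line text into rows that each start at a code point line.

    Hypothetical helper: rows may have different lengths because of missing cells.
    """
    rows = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if UNI_RE.match(line):
            # a code point line opens a new row
            current = [line]
            rows.append(current)
        elif current is not None:
            current.append(line)
        # lines before the first code point line are dropped
    return rows


# Made-up sample, not taken from Santakku.txt; the second row lacks one cell.
sample = ['12000', 'sign-1', 'reading-1', '1', '12001', 'sign-2', '2']
for row in group_rows(sample):
    print(row)
```

Note that any cell consisting of exactly five hexadecimal characters would also be treated as a row anchor, so a real run over the sign list would need extra checks such as those in `makeMapping`.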

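The notebook states that its goal is a tab-separated version of the `Unicode`, `signe` and `Autotext` columns, but it stops before writing one. Assuming a mapping shaped like the one `makeMapping` builds (hex code point string to a `(grapheme, reading)` pair), the output step might look like the sketch below. The function name `mapping_to_tsv` and the sample entries are hypothetical.

```python
import csv
import io


def mapping_to_tsv(mapping):
    """Render {hex code point string: (grapheme, reading)} as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    # column names as listed in the notebook's Results cell
    writer.writerow(['Unicode', 'signe', 'Autotext'])
    # sort numerically by code point rather than alphabetically by string
    for uni_str in sorted(mapping, key=lambda s: int(s, 16)):
        grapheme, reading = mapping[uni_str]
        writer.writerow([uni_str, grapheme, reading])
    return buffer.getvalue()


# Made-up entries, not taken from the sign list.
sample_mapping = {'12001': ('g2', 'r2'), '12000': ('g1', 'r1')}
print(mapping_to_tsv(sample_mapping), end='')
```

Building the text in memory keeps the sketch free of file paths; writing the returned string to a file in the repository would be a separate step.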