checks.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Checks\n",
"Various checks on the correctness of the transformation from ascii transcriptions to a text-fabric data set.\n",
"\n",
"The\n",
"[diagnostics](https://github.com/Dans-labs/Nino-cunei/blob/master/reports/diagnostics.tsv)\n",
"of the transformation contains valueable issues that may be used to correct mistakes in the sources.\n",
"Or, equally likely, they correspond to misunderstandings on my (Dirk's) part of the model\n",
"that underlies the transcriptions.\n",
"\n",
"We will perform *grep* commands on the source files, and we will traverse node in Text-Fabric and collect information.\n",
"\n",
"Then we compare these sets of information.\n",
"\n",
"# Docs\n",
"\n",
"There is some documentation about the checking software itself:\n",
"\n",
"[Utils API](https://github.com/Nino-cunei/uruk/blob/master/docs/utils.md)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-11T09:54:49.437192Z",
"start_time": "2018-05-11T09:54:49.411738Z"
}
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-11T09:54:50.313805Z",
"start_time": "2018-05-11T09:54:50.245114Z"
}
},
"outputs": [],
"source": [
"import sys, os, collections, re\n",
"\n",
"from tf.app import use"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-11T09:54:52.709767Z",
"start_time": "2018-05-11T09:54:51.545920Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using TF app oldbabylonian in /Users/dirk/github/annotation/app-oldbabylonian/code\n",
"Using Nino-cunei/oldbabylonian/tf - 1.0.4 in /Users/dirk/github\n"
]
},
{
"data": {
"text/html": [
"<b>Documentation:</b> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs/\" title=\"provenance of Old Babylonian Letters 1900-1600: Cuneiform tablets \">OLDBABYLONIAN</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs/transcription.md\" title=\"How TF features represent ATF\">Character table</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"OLDBABYLONIAN feature documentation\">Feature docs</a> <a target=\"_blank\" href=\"https://github.com/annotation/app-oldbabylonian\" title=\"oldbabylonian API documentation\">oldbabylonian API</a> <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/\" title=\"text-fabric-api\">Text-Fabric API 7.5.1</a> <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Use/Search/\" title=\"Search Templates Introduction and Reference\">Search Reference</a><details open><summary><b>Loaded features</b>:</summary>\n",
"<p><b>Old Babylonian Letters 1900-1600: Cuneiform tablets </b>: <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/ARK.tf\">ARK</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/after.tf\">after</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/afterr.tf\">afterr</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/afteru.tf\">afteru</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/atf.tf\">atf</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/atfpost.tf\">atfpost</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/atfpre.tf\">atfpre</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/author.tf\">author</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/col.tf\">col</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/collated.tf\">collated</a> <a 
target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/collection.tf\">collection</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/comment.tf\">comment</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/damage.tf\">damage</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/det.tf\">det</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/docnote.tf\">docnote</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/docnumber.tf\">docnumber</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/excavation.tf\">excavation</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/excised.tf\">excised</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/face.tf\">face</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/flags.tf\">flags</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/fraction.tf\">fraction</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/genre.tf\">genre</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/grapheme.tf\">grapheme</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/graphemer.tf\">graphemer</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/graphemeu.tf\">graphemeu</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/lang.tf\">lang</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/langalt.tf\">langalt</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/ln.tf\">ln</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/lnc.tf\">lnc</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/lnno.tf\">lnno</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/material.tf\">material</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/missing.tf\">missing</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/museumcode.tf\">museumcode</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/museumname.tf\">museumname</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/object.tf\">object</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/operator.tf\">operator</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/operatorr.tf\">operatorr</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/operatoru.tf\">operatoru</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/otype.tf\">otype</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/period.tf\">period</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/pnumber.tf\">pnumber</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/primecol.tf\">primecol</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/primeln.tf\">primeln</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/pubdate.tf\">pubdate</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/question.tf\">question</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/reading.tf\">reading</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/readingr.tf\">readingr</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/readingu.tf\">readingu</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/remarkable.tf\">remarkable</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/remarks.tf\">remarks</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/repeat.tf\">repeat</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/srcLn.tf\">srcLn</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/srcLnNum.tf\">srcLnNum</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/srcfile.tf\">srcfile</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/subgenre.tf\">subgenre</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/supplied.tf\">supplied</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/sym.tf\">sym</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/symr.tf\">symr</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/symu.tf\">symu</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/trans.tf\">trans</a> <a target=\"_blank\" 
href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/transcriber.tf\">transcriber</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/translation@en.tf\">translation@ll</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/type.tf\">type</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/uncertain.tf\">uncertain</a> <a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/volume.tf\">volume</a> <b><i><a target=\"_blank\" href=\"https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md\" title=\"/Users/dirk/github/Nino-cunei/oldbabylonian/tf/1.0.4/oslots.tf\">oslots</a></i></b> </p></details>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<style>\n",
"@font-face {\n",
" font-family: \"Santakku\";\n",
" src:\n",
" local(\"Santakku.ttf\"),\n",
" url(\"https://github.com/annotation/text-fabric/blob/master/tf/server/static/fonts/Santakku.woff?raw=true\");\n",
"}\n",
".txtn,.txtn a:visited,.txtn a:link {\n",
" font-family: sans-serif;\n",
" font-size: normal;\n",
" text-decoration: none;\n",
"}\n",
".txtp,.txtp a:visited,.txtp a:link {\n",
" font-family: monospace;\n",
" font-size: normal;\n",
" text-decoration: none;\n",
"}\n",
".txtr,.txtr a:visited,.txtr a:link {\n",
" font-family: serif;\n",
" font-size: large;\n",
" text-decoration: none;\n",
"}\n",
".txtu,.txtu a:visited,.txtu a:link {\n",
" font-family: Santakku;\n",
" font-size: x-large;\n",
" text-decoration: none;\n",
"}\n",
".features {\n",
" font-family: monospace;\n",
" font-size: medium;\n",
" font-weight: bold;\n",
" color: #0a6611;\n",
" display: flex;\n",
" flex-flow: column nowrap;\n",
" padding: 0.1em;\n",
" margin: 0.1em;\n",
" direction: ltr;\n",
"}\n",
".features div,.features span {\n",
" padding: 0;\n",
" margin: -0.1rem 0;\n",
"}\n",
".features .f {\n",
" font-family: sans-serif;\n",
" font-size: x-small;\n",
" font-weight: normal;\n",
" color: #5555bb;\n",
"}\n",
".features .xft {\n",
" color: #000000;\n",
" background-color: #eeeeee;\n",
" font-size: medium;\n",
" margin: 0.1em 0em;\n",
"}\n",
".features .xft .f {\n",
" color: #000000;\n",
" background-color: #eeeeee;\n",
" font-style: italic;\n",
" font-size: small;\n",
" font-weight: normal;\n",
"}\n",
".pnum {\n",
" font-family: sans-serif;\n",
" font-size: small;\n",
" font-weight: bold;\n",
" color: #444444;\n",
"}\n",
".nd {\n",
" font-family: monospace;\n",
" font-size: x-small;\n",
" color: #999999;\n",
"}\n",
".meta {\n",
" display: flex;\n",
" justify-content: flex-start;\n",
" align-items: flex-start;\n",
" align-content: flex-start;\n",
" flex-flow: row nowrap;\n",
"}\n",
".features,.comments {\n",
" display: flex;\n",
" justify-content: flex-start;\n",
" align-items: flex-start;\n",
" align-content: flex-start;\n",
" flex-flow: column nowrap;\n",
"}\n",
".children {\n",
" display: flex;\n",
" justify-content: flex-start;\n",
" align-items: flex-start;\n",
" align-content: flex-start;\n",
" border: 0;\n",
" background-color: #ffffff;\n",
"}\n",
".children.document {\n",
" flex-flow: column nowrap;\n",
"}\n",
".children.face {\n",
" flex-flow: column nowrap;\n",
"}\n",
".children.line {\n",
" align-items: stretch;\n",
" flex-flow: row nowrap;\n",
"}\n",
".children.cluster {\n",
" flex-flow: row wrap;\n",
"}\n",
".children.line {\n",
" align-items: stretch;\n",
" flex-flow: row nowrap;\n",
"}\n",
".children.sign {\n",
" flex-flow: column nowrap;\n",
"}\n",
".contnr {\n",
" width: fit-content;\n",
"}\n",
".contnr.document,.contnr.face,\n",
".contnr.line,\n",
".contnr.cluster,\n",
".contnr.word,\n",
".contnr.sign {\n",
" display: flex;\n",
" justify-content: flex-start;\n",
" align-items: flex-start;\n",
" align-content: flex-start;\n",
" flex-flow: column nowrap;\n",
" background: #ffffff none repeat scroll 0 0;\n",
" padding: 0.5em 0.1em 0.1em 0.1em;\n",
" margin: 0.8em 0.1em 0.1em 0.1em;\n",
" border-radius: 0.2em;\n",
" border-style: solid;\n",
" border-width: 0.2em;\n",
" font-size: small;\n",
"}\n",
".contnr.document,.contnr.face {\n",
" border-color: #bb8800;\n",
"}\n",
".contnr.line {\n",
" border-color: #0088bb;\n",
"}\n",
".contnr.cluster {\n",
" flex-flow: row wrap;\n",
" border: 0;\n",
"}\n",
".contnr.word {\n",
" border-color: #44bbff;\n",
"}\n",
".contnr.sign {\n",
" border-color: #bbbbbb;\n",
"}\n",
".contnr.hl {\n",
" background-color: #ffee66;\n",
"}\n",
".lbl.document,.lbl.face,\n",
".lbl.line,\n",
".lbl.cluster,\n",
".lbl.sign,.lbl.word {\n",
" margin-top: -1.2em;\n",
" margin-left: 1em;\n",
" background: #ffffff none repeat scroll 0 0;\n",
" padding: 0 0.3em;\n",
" border-style: solid;\n",
" font-size: small;\n",
" display: block;\n",
"}\n",
".lbl.document,.lbl.face {\n",
" border-color: #bb8800;\n",
" border-width: 0.3em;\n",
" border-radius: 0.3em;\n",
" color: #bb8800;\n",
"}\n",
".lbl.line {\n",
" border-color: #0088bb;\n",
" border-width: 0.3em;\n",
" border-radius: 0.3em;\n",
" color: #0088bb;\n",
"}\n",
".lbl.cluster {\n",
" border-color: #dddddd;\n",
" border-width: 0.2em;\n",
" border-radius: 0.2em;\n",
" color: #0000cc;\n",
"}\n",
".lbl.word {\n",
" border-color: #44bbff;\n",
" border-width: 0.2em;\n",
" border-radius: 0.2em;\n",
" font-size: medium;\n",
" color: #000000;\n",
"}\n",
".lbl.sign {\n",
" border-color: #bbbbbb;\n",
" border-width: 0.1em;\n",
" border-radius: 0.1em;\n",
" font-size: small;\n",
" color: #000000;\n",
"}\n",
".op {\n",
" padding: 0.5em 0.1em 0.1em 0.1em;\n",
" margin: 0.8em 0.1em 0.1em 0.1em;\n",
" font-family: monospace;\n",
" font-size: x-large;\n",
" font-weight: bold;\n",
"}\n",
".name {\n",
" font-family: monospace;\n",
" font-size: medium;\n",
" color: #0000bb;\n",
"}\n",
".period {\n",
" font-family: monospace;\n",
" font-size: medium;\n",
" font-weight: bold;\n",
" color: #0000bb;\n",
"}\n",
".text {\n",
" font-family: sans-serif;\n",
" font-size: x-small;\n",
" color: #000000;\n",
"}\n",
".srcln {\n",
" font-family: monospace;\n",
" font-size: medium;\n",
" color: #000000;\n",
"}\n",
".srclnnum {\n",
" font-family: monospace;\n",
" font-size: x-small;\n",
" color: #0000bb;\n",
"}\n",
".comment {\n",
" color: #7777dd;\n",
" font-family: monospace;\n",
" font-size: small;\n",
"}\n",
".operator {\n",
" color: #ff77ff;\n",
" font-size: large;\n",
"}\n",
"/* LANGUAGE: superscript and subscript */\n",
"\n",
"/* cluster */\n",
".det {\n",
" vertical-align: super;\n",
"}\n",
"/* cluster */\n",
".langalt {\n",
" vertical-align: sub;\n",
"}\n",
"/* REDACTIONAL: line over or under */\n",
"\n",
"/* flag */\n",
".collated {\n",
" font-weight: bold;\n",
" text-decoration: underline;\n",
"}\n",
"/* cluster */\n",
".excised {\n",
" color: #dd0000;\n",
" text-decoration: line-through;\n",
"}\n",
"/* cluster */\n",
".supplied {\n",
" color: #0000ff;\n",
" text-decoration: overline;\n",
"}\n",
"/* flag */\n",
".remarkable {\n",
" font-weight: bold;\n",
" text-decoration: overline;\n",
"}\n",
"\n",
"/* UNSURE: italic*/\n",
"\n",
"/* cluster */\n",
".uncertain {\n",
" font-style: italic\n",
"}\n",
"/* flag */\n",
".question {\n",
" font-weight: bold;\n",
" font-style: italic\n",
"}\n",
"\n",
"/* BROKEN: text-shadow */\n",
"\n",
"/* cluster */\n",
".missing {\n",
" color: #999999;\n",
" text-shadow: #bbbbbb 1px 1px;\n",
"}\n",
"/* flag */\n",
".damage {\n",
" font-weight: bold;\n",
" color: #999999;\n",
" text-shadow: #bbbbbb 1px 1px;\n",
"}\n",
".empty {\n",
" color: #ff0000;\n",
"}\n",
"\n",
"\n",
"tr.tf, td.tf, th.tf {\n",
" text-align: left;\n",
"}\n",
"\n",
"span.hldot {\n",
"\tbackground-color: var(--hl-strong);\n",
"\tborder: 0.2rem solid var(--hl-rim);\n",
"\tborder-radius: 0.4rem;\n",
"\t/*\n",
"\tdisplay: inline-block;\n",
"\twidth: 0.8rem;\n",
"\theight: 0.8rem;\n",
"\t*/\n",
"}\n",
"span.hl {\n",
"\tbackground-color: var(--hl-strong);\n",
"\tborder-width: 0;\n",
"\tborder-radius: 0.1rem;\n",
"\tborder-style: solid;\n",
"}\n",
"\n",
"span.hlup {\n",
"\tborder-color: var(--hl-dark);\n",
"\tborder-width: 0.1rem;\n",
"\tborder-style: solid;\n",
"\tborder-radius: 0.2rem;\n",
" padding: 0.2rem;\n",
"}\n",
"\n",
":root {\n",
"\t--hl-strong: hsla( 60, 100%, 70%, 0.9 );\n",
"\t--hl-rim: hsla( 55, 100%, 60%, 0.9 );\n",
"\t--hl-dark: hsla( 55, 100%, 40%, 0.9 );\n",
"}\n",
"</style>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<details open><summary><b>API members</b>:</summary>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Computed/#computed-data\" title=\"doc\">C Computed</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Computed/#computed-data\" title=\"doc\">Call AllComputeds</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Computed/#computed-data\" title=\"doc\">Cs ComputedString</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#edge-features\" title=\"doc\">E Edge</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#edge-features\" title=\"doc\">Eall AllEdges</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#edge-features\" title=\"doc\">Es EdgeString</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">ensureLoaded</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">TF</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">ignored</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Fabric/#loading\" title=\"doc\">loadLog</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Locality/#locality\" title=\"doc\">L Locality</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">cache</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">error</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">indent</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">info</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Misc/#messaging\" title=\"doc\">reset</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">N Nodes</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">sortKey</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">sortKeyTuple</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">otypeRank</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Nodes/#navigating-nodes\" title=\"doc\">sortNodes</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#node-features\" title=\"doc\">F Feature</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#node-features\" title=\"doc\">Fall AllFeatures</a>, <a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Features/#node-features\" title=\"doc\">Fs FeatureString</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Search/#search\" title=\"doc\">S Search</a><br/>\n",
"<a target=\"_blank\" href=\"https://annotation.github.io/text-fabric/Api/Text/#text\" title=\"doc\">T Text</a></details>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A = use('oldbabylonian', hoist=globals(), lgc=True)\n",
"\n",
"BASE = os.path.expanduser('~/github')\n",
"\n",
"SOURCE_VERSION = '0.3'\n",
"SOURCE_DIR = (\n",
" f'{BASE}/{A.org}/{A.repo}/sources/cdli/transcriptions/{SOURCE_VERSION}'\n",
")\n",
"SOURCE_FILES = '''\n",
" AbB-primary\n",
" AbB-secondary\n",
"'''.strip().split()\n",
"\n",
"TEMP_DIR = f'{BASE}/_temp'"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:57.821096Z",
"start_time": "2018-03-06T06:42:57.788963Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"FACES:\n",
"\t\n",
"\tbottom\n",
"\tcase\n",
"\tcase - lower edge\n",
"\tcase - obverse\n",
"\tcase - reverse\n",
"\tcase - seal\n",
"\tenvelope\n",
"\tenvelope - obverse\n",
"\tenvelope - reverse\n",
"\tenvelope - seal 1\n",
"\teyestone - surface a\n",
"\tleft\n",
"\tleft edge\n",
"\tleft side\n",
"\tlower edge\n",
"\tobverse\n",
"\treverse\n",
"\tseal\n",
"\tseal 1\n",
"\tseal 2\n",
"\tupper edge\n",
"EMPTY TABLETS (0):\n"
]
}
],
"source": [
"from utils import Compare\n",
"COMP = Compare(TF.api, SOURCE_DIR, SOURCE_FILES, TEMP_DIR)"
]
},
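{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A small illustration of the comparison strategy, assuming the cells above\n",
"# have run: count damage flags (#) grep-style in the raw transcription lines,\n",
"# and count sign nodes carrying the `damage` feature in the TF data set.\n",
"# This is a sketch for orientation, not one of the official checks, and the\n",
"# two numbers need not agree exactly (flags may occur outside sign material).\n",
"srcCount = sum(line.count('#') for (*_, line) in COMP.readCorpora())\n",
"tfCount = sum(1 for s in F.otype.s('sign') if F.damage.v(s))\n",
"print(srcCount, tfCount)"
]
},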
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Character usage\n",
"\n",
"We make an inventory of all characters that occur on an atf line in transcribed material."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"transRe = re.compile(r'''^([0-9a-zA-Z']*)\\.\\s+(.+)$''')\n",
"trimRe = re.compile(r'''\\s\\s+''')\n",
"\n",
"prime = \"'\"\n",
"times = '×'\n",
"div = '÷'\n",
"quad = '|'\n",
"\n",
"clusterChars = (\n",
" ('┌', '┐', '_', '_', 'langalt'),\n",
" ('◀', '▶', '{', '}', 'det'),\n",
" ('∈', '∋', '(', ')', 'uncertain'),\n",
" ('〖', '〗', '[', ']', 'missing'),\n",
" ('«', '»', '<<', '>>', 'excised'),\n",
" ('⊂', '⊃', '<', '>', 'supplied'),\n",
")\n",
"\n",
"clusterType = {x[0]: x[4] for x in clusterChars}\n",
"clusterTypeInfo = {x[4]: x[0:-1] for x in clusterChars}\n",
"\n",
"clusterB = {c[0] for c in clusterChars}\n",
"clusterE = {c[1] for c in clusterChars}\n",
"clusterA = clusterB | clusterE\n",
"clusterOB = {c[2] for c in clusterChars}\n",
"clusterOE = {c[3] for c in clusterChars}\n",
"clusterOA = clusterOB | clusterOE\n",
"clusterBstr = ''.join(sorted(clusterB))\n",
"clusterEstr = ''.join(sorted(clusterE))\n",
"clusterAstr = ''.join(sorted(clusterA))\n",
"\n",
"flaggingStr = '!?*#'\n",
"flagging = set(flaggingStr)\n",
"\n",
"separatorStr = '-'\n",
"separator = set(separatorStr)\n",
"\n",
"ellips = '…'\n",
"unknownStr = 'xXnN'\n",
"unknown = set(unknownStr) | {ellips}\n",
"\n",
"lowerLetterStr = 'abcdefghijklmnopqrstuvwyz'\n",
"upperLetterStr = lowerLetterStr.upper()\n",
"lowerLetter = set(lowerLetterStr)\n",
"upperLetter = set(upperLetterStr)\n",
"\n",
"digitStr = '0123456789'\n",
"digit = set(digitStr) | {div}\n",
"\n",
"emph_s = 'ş'\n",
"emph_S = 'Ş'\n",
"emph_t = 'ţ'\n",
"emph_T = 'Ţ'\n",
"\n",
"emphatic = {emph_s, emph_S, emph_t, emph_T}\n",
"\n",
"def emphRepl(x):\n",
" return x.replace('s,', emph_s).replace('S,', emph_S).replace('t,', emph_t).replace('T,', emph_T)\n",
"\n",
"inlineCommentRe = re.compile(r'''\\(\\$.*?\\$\\)''')\n",
"\n",
"operatorStr = f'.+/:{times}'\n",
"operator = set(operatorStr)\n",
"\n",
"divRe = re.compile(r'''([0-9])/([0-9])''')\n",
"\n",
"def divRepl(match):\n",
" return f'{match.group(1)}{div}{match.group(2)}'"
]
},
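{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Quick illustration (assuming the cell above has run) of the replacement\n",
"# helpers: the ASCII digraphs for emphatic consonants become single\n",
"# characters, and digit/digit fractions get the ÷ sign, so that / and ,\n",
"# remain unambiguous in later processing. The sample strings are made up.\n",
"print(emphRepl('s,a-bu-t,um'))\n",
"print(divRe.sub(divRepl, '1/2 ma-na'))"
]
},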
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"seen = collections.Counter()\n",
"\n",
"for (srcfile, document, face, column, ln, line) in COMP.readCorpora():\n",
" match = transRe.match(line)\n",
" if not match:\n",
" continue\n",
" trans = match.group(2)\n",
" \n",
" trans = inlineCommentRe.sub('', trans)\n",
" trans = trans.replace('...', ellips)\n",
" trans = trans.replace('x(', times)\n",
" trans = emphRepl(trans)\n",
" trans = divRe.sub(divRepl, trans)\n",
" \n",
" words = trans.split()\n",
" for word in words:\n",
" for (i, c) in enumerate(word):\n",
" seen[c] += 1"
]
},
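{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Peek at the raw inventory before classifying it, assuming the cell above\n",
"# has run: the total number of characters counted, the number of distinct\n",
"# characters, and the five most frequent ones.\n",
"print(sum(seen.values()), len(seen))\n",
"print(seen.most_common(5))"
]
},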
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cluster:\n",
"\t_ 15200\n",
"\t[ 7572\n",
"\t] 7572\n",
"\t{ 6794\n",
"\t} 6794\n",
"\t) 3489\n",
"\t( 3484\n",
"\t< 369\n",
"\t> 369\n",
"digit:\n",
"\t2 15362\n",
"\t3 5858\n",
"\t4 1412\n",
"\t1 1190\n",
"\t5 424\n",
"\t8 264\n",
"\t6 263\n",
"\t7 146\n",
"\t÷ 121\n",
"\t0 43\n",
"\t9 36\n",
"emphatic:\n",
"\tţ 2212\n",
"\tş 1748\n",
"\tŞ 5\n",
"flagging:\n",
"\t# 9974\n",
"\t? 560\n",
"\t! 216\n",
"\t* 13\n",
"lower:\n",
"\ta 83892\n",
"\ti 56380\n",
"\tu 45188\n",
"\tm 34283\n",
"\ts 26373\n",
"\tz 24237\n",
"\tn 21059\n",
"\tl 16466\n",
"\td 14416\n",
"\tr 14193\n",
"\tt 14124\n",
"\tk 13164\n",
"\tb 12681\n",
"\te 11430\n",
"\tp 5266\n",
"\tg 4486\n",
"\th 4243\n",
"\tq 3666\n",
"\tw 1176\n",
"\ty 1\n",
"operator:\n",
"\t/ 15\n",
"\t. 11\n",
"\t× 5\n",
"\t+ 2\n",
"\t: 1\n",
"prime:\n",
"\t' 38\n",
"quad:\n",
"\t| 8\n",
"separator:\n",
"\t- 118903\n",
"unknown:\n",
"\tx 8729\n",
"\t… 1617\n",
"\tN 192\n",
"upper:\n",
"\tA 808\n",
"\tI 448\n",
"\tD 337\n",
"\tU 270\n",
"\tR 222\n",
"\tZ 186\n",
"\tG 184\n",
"\tB 153\n",
"\tK 143\n",
"\tS 102\n",
"\tL 61\n",
"\tH 60\n",
"\tM 58\n",
"\tE 54\n",
"\tT 48\n",
"\tP 42\n",
"\tW 9\n"
]
}
],
"source": [
"allChars = collections.defaultdict(dict)\n",
"\n",
"for (c, amount) in seen.items():\n",
" if c in lowerLetter:\n",
" allChars['lower'][c] = amount\n",
" elif c in unknown:\n",
" allChars['unknown'][c] = amount\n",
" elif c in upperLetter:\n",
" allChars['upper'][c] = amount\n",
" elif c in digit:\n",
" allChars['digit'][c] = amount\n",
" elif c in emphatic:\n",
" allChars['emphatic'][c] = amount\n",
" elif c == prime:\n",
" allChars['prime'][c] = amount\n",
" elif c == quad:\n",
" allChars['quad'][c] = amount\n",
" elif c in flagging:\n",
" allChars['flagging'][c] = amount\n",
" elif c in separator:\n",
" allChars['separator'][c] = amount\n",
" elif c in operator:\n",
" allChars['operator'][c] = amount\n",
" elif c in clusterOA:\n",
" allChars['cluster'][c] = amount\n",
" else:\n",
" allChars['rest'][c] = amount\n",
" \n",
"for (kind, data) in sorted(allChars.items()):\n",
" print(f'{kind}:')\n",
" for (c, amount) in sorted(\n",
" data.items(),\n",
" key=lambda x: (-x[1], x[0]),\n",
" ):\n",
" print(f'\\t{c:<1} {amount:>6}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Documents\n",
"\n",
"## Document language\n",
"\n",
"Here are the document languages, according to the `#atf:lang` meta tags:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"akk 1283 x\n",
"sux 2 x\n"
]
}
],
"source": [
"for (c, amount) in F.lang.freqList():\n",
" print(f'{c} {amount:>6} x')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Document collection/volume/number/note\n",
"\n",
"In the ATF source, after the line with the P-number (`&P...`) there is additional identification,\n",
"usually in the form *collection* *volume*, *number* *note*.\n",
"\n",
"We give an overview of the collections in which the documents of this corpus are found,\n",
"and we list the *note*s, which are really the irregular parts of the identification.\n",
"\n",
"We will not check the TF values with the GREP values for this part of the document identification."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AbB 492 x\n",
"CT 241 x\n",
"VS 218 x\n",
"YOS 108 x\n",
"TCL 105 x\n",
"LIH 77 x\n",
"YNER 16 x\n",
"TLB 10 x\n",
"BIN 7 x\n",
"OECT 3 x\n",
"AJSL 1 x\n",
"CT43, 1 x\n",
"JCS 1 x\n",
"LFBD 1 x\n",
"RA 1 x\n",
"RIME 1 x\n",
"abb 1 x\n"
]
}
],
"source": [
"for (c, amount) in F.collection.freqList():\n",
" print(f'{c:<8} {amount:>6} x')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"37 BM 097815 1 x\n",
"AO 21105 1 x\n",
"BM 012819 1 x\n",
"BM 023357 1 x\n",
"BM 023823 1 x\n",
"BM 025693 1 x\n",
"BM 027780 1 x\n",
"BM 028435 1 x\n",
"BM 028436 1 x\n",
"BM 028444 1 x\n",
"BM 028447 1 x\n",
"BM 028457 1 x\n",
"BM 028473 1 x\n",
"BM 028474 1 x\n",
"BM 028475 1 x\n",
"BM 028476 1 x\n",
"BM 028508 1 x\n",
"BM 028510 1 x\n",
"BM 028531 1 x\n",
"BM 028558 1 x\n",
"BM 028559 1 x\n",
"BM 028588 1 x\n",
"BM 028840 1 x\n",
"BM 029655 1 x\n",
"BM 040037 1 x\n",
"BM 078214 1 x\n",
"BM 080186 1 x\n",
"BM 080329 1 x\n",
"BM 080340 1 x\n",
"BM 080354 1 x\n",
"BM 080410 1 x\n",
"BM 080484 1 x\n",
"BM 080558 1 x\n",
"BM 080594 1 x\n",
"BM 080612 1 x\n",
"BM 080616 1 x\n",
"BM 080685 1 x\n",
"BM 080723 1 x\n",
"BM 080797 1 x\n",
"BM 080802 1 x\n",
"BM 080816 1 x\n",
"BM 080840 1 x\n",
"BM 080878 1 x\n",
"BM 080885 1 x\n",
"BM 080897 1 x\n",
"BM 080947 1 x\n",
"BM 081095 1 x\n",
"BM 087395 1 x\n",
"BM 096604 1 x\n",
"BM 096608 1 x\n",
"BM 096629 1 x\n",
"BM 097031 1 x\n",
"BM 097040 1 x\n",
"BM 097050 1 x\n",
"BM 097098 1 x\n",
"BM 097115 1 x\n",
"BM 097130 1 x\n",
"BM 097274 1 x\n",
"BM 097325 1 x\n",
"BM 097347 1 x\n",
"BM 097405 1 x\n",
"BM 097675 1 x\n",
"BM 097686 1 x\n",
"BM 097693 1 x\n",
"BM 097816 1 x\n",
"BM 100117 1 x\n",
"BM 103848 1 x\n",
"Bu 1888-05-12, 0184 1 x\n",
"Bu 1888-05-12, 0278 1 x\n",
"Bu 1888-05-12, 0323 1 x\n",
"Bu 1888-05-12, 0329 1 x\n",
"Bu 1888-05-12, 0333 1 x\n",
"Bu 1888-05-12, 0342 1 x\n",
"Bu 1888-05-12, 0505 1 x\n",
"Bu 1888-05-12, 0568 1 x\n",
"Bu 1888-05-12, 0581 1 x\n",
"Bu 1888-05-12, 0598 1 x\n",
"Bu 1888-05-12, 0602 1 x\n",
"Bu 1888-05-12, 0607 1 x\n",
"Bu 1888-05-12, 0621 1 x\n",
"Bu 1888-05-12, 0638 1 x\n",
"Bu 1888-05-12, 200 1 x\n",
"Bu 1888-05-12, 207 1 x\n",
"Bu 1888-05-12, 212 1 x\n",
"Bu 1891-05-09, 0279 1 x\n",
"Bu 1891-05-09, 0315 1 x\n",
"Bu 1891-05-09, 0370 1 x\n",
"Bu 1891-05-09, 0383 1 x\n",
"Bu 1891-05-09, 0413 1 x\n",
"Bu 1891-05-09, 0418 1 x\n",
"Bu 1891-05-09, 0468 1 x\n",
"Bu 1891-05-09, 0534 1 x\n",
"Bu 1891-05-09, 0579a 1 x\n",
"Bu 1891-05-09, 0585 1 x\n",
"Bu 1891-05-09, 0587 1 x\n",
"Bu 1891-05-09, 0790 1 x\n",
"Bu 1891-05-09, 1154 1 x\n",
"Bu 1891-05-09, 2185 1 x\n",
"Bu 1891-05-09, 2187 1 x\n",
"Bu 1891-05-09, 2194 1 x\n",
"Bu 1891-05-09, 290 1 x\n",
"Bu 1891-05-09, 294 1 x\n",
"Bu 1891-05-09, 325 1 x\n",
"Bu 1891-05-09, 354 1 x\n",
"Fs Landsberger 235 1 x\n",
"ex. 01 1 x\n",
"no. 2 1 x\n",
"pp. 980191 no. 1 1 x\n"
]
}
],
"source": [
"for (c, amount) in F.docnote.freqList():\n",
" print(f'{c:<8} {amount:>6} x')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We check whether we have the same sequence of document numbers.\n",
"In TF, the document number is stored in the feature `pnumber`.\n",
"\n",
"Note that we also check on the order of the documents."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:35.754992Z",
"start_time": "2018-03-06T06:42:35.709222Z"
}
},
"outputs": [],
"source": [
"def tfDocuments():\n",
" documents = []\n",
" for t in F.otype.s('document'):\n",
" (document,) = T.sectionFromNode(t)\n",
" documents.append((F.srcfile.v(t), document, F.srcLnNum.v(t), F.pnumber.v(t)))\n",
" return documents\n",
"\n",
"def grepDocuments(gen):\n",
" documents = []\n",
" prevTablet = None\n",
" for (srcFile, document, face, column, srcLnNum, srcLn) in gen:\n",
" if document != prevTablet:\n",
" documents.append((srcFile, document, srcLnNum, document))\n",
" prevTablet = document\n",
" return documents"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:38.307249Z",
"start_time": "2018-03-06T06:42:37.990755Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ tablet\n",
"IDENTICAL: all 1285 items\n",
"= : AbB-primary ◆ P509373 ◆ 27 ◆ P509373\n",
"= : AbB-primary ◆ P509374 ◆ 96 ◆ P509374\n",
"= : AbB-primary ◆ P509375 ◆ 147 ◆ P509375\n",
"= : AbB-primary ◆ P509376 ◆ 196 ◆ P509376\n",
"= : AbB-primary ◆ P509377 ◆ 250 ◆ P509377\n",
"= : AbB-primary ◆ P507628 ◆ 309 ◆ P507628\n",
"= : AbB-primary ◆ P481190 ◆ 349 ◆ P481190\n",
"= : AbB-primary ◆ P481191 ◆ 392 ◆ P481191\n",
"= : AbB-primary ◆ P481192 ◆ 443 ◆ P481192\n",
"= : AbB-primary ◆ P389958 ◆ 508 ◆ P389958\n",
"= : AbB-primary ◆ P389256 ◆ 552 ◆ P389256\n",
"= : AbB-primary ◆ P510526 ◆ 7593 ◆ P510526\n",
"= : AbB-primary ◆ P510527 ◆ 7643 ◆ P510527\n",
"= : AbB-primary ◆ P510528 ◆ 7708 ◆ P510528\n",
"= : AbB-primary ◆ P510529 ◆ 7753 ◆ P510529\n",
"= : AbB-primary ◆ P510530 ◆ 7805 ◆ P510530\n",
"= : AbB-primary ◆ P510531 ◆ 7879 ◆ P510531\n",
"= : AbB-primary ◆ P510532 ◆ 7931 ◆ P510532\n",
"= : AbB-primary ◆ P510533 ◆ 7984 ◆ P510533\n",
"= : AbB-primary ◆ P510534 ◆ 8032 ◆ P510534\n",
"= and 1265 more\n",
"Number of results: TF 1285; GREP 1285\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('tablet',),\n",
" grepDocuments,\n",
" tfDocuments,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Faces\n",
"\n",
"## Objects\n",
"\n",
"First we count on which kind of objects the faces occur."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tablet 2778 x\n",
"envelope 43 x\n",
"case 12 x\n",
"eyestone 1 x\n"
]
}
],
"source": [
"for (obj, amount) in F.object.freqList():\n",
" print(f'{obj:<10} {amount:>5} x')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We check whether we see the same faces with GREP and TF."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:56.827129Z",
"start_time": "2018-03-06T06:42:56.796606Z"
}
},
"outputs": [],
"source": [
"def tfFaces():\n",
" faces = []\n",
" for document in F.otype.s('document'):\n",
" documentName = F.pnumber.v(document)\n",
" srcfile = F.srcfile.v(document)\n",
" for face in L.d(document, otype='face'):\n",
" typ = F.face.v(face)\n",
" firstLine = L.d(face, otype='line')[0]\n",
" ln = F.srcLnNum.v(firstLine)\n",
" faces.append((srcfile, documentName, ln, typ))\n",
" return faces"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:57.821096Z",
"start_time": "2018-03-06T06:42:57.788963Z"
}
},
"outputs": [],
"source": [
"def grepFaces(gen):\n",
" faces = []\n",
" prevDocument = None\n",
" prevFace = None\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" if face is None or (prevDocument == document and prevFace == face):\n",
" continue\n",
" faces.append((srcfile, document, srcLnNum, face))\n",
" prevDocument = document\n",
" prevFace = face\n",
" return faces"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:43:00.018724Z",
"start_time": "2018-03-06T06:42:59.646177Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ face\n",
"IDENTICAL: all 2834 items\n",
"= : AbB-primary ◆ P509373 ◆ 31 ◆ obverse\n",
"= : AbB-primary ◆ P509373 ◆ 48 ◆ reverse\n",
"= : AbB-primary ◆ P509374 ◆ 100 ◆ obverse\n",
"= : AbB-primary ◆ P509374 ◆ 117 ◆ reverse\n",
"= : AbB-primary ◆ P509375 ◆ 151 ◆ obverse\n",
"= : AbB-primary ◆ P509375 ◆ 156 ◆ reverse\n",
"= : AbB-primary ◆ P509376 ◆ 200 ◆ obverse\n",
"= : AbB-primary ◆ P509376 ◆ 212 ◆ reverse\n",
"= : AbB-primary ◆ P509377 ◆ 254 ◆ obverse\n",
"= : AbB-primary ◆ P509377 ◆ 268 ◆ reverse\n",
"= : AbB-primary ◆ P507628 ◆ 313 ◆ obverse\n",
"= : AbB-primary ◆ P507628 ◆ 321 ◆ reverse\n",
"= : AbB-primary ◆ P481190 ◆ 353 ◆ obverse\n",
"= : AbB-primary ◆ P481190 ◆ 361 ◆ reverse\n",
"= : AbB-primary ◆ P481191 ◆ 396 ◆ obverse\n",
"= : AbB-primary ◆ P481191 ◆ 406 ◆ reverse\n",
"= : AbB-primary ◆ P481191 ◆ 413 ◆ seal 1\n",
"= : AbB-primary ◆ P481192 ◆ 447 ◆ obverse\n",
"= : AbB-primary ◆ P481192 ◆ 464 ◆ reverse\n",
"= : AbB-primary ◆ P389958 ◆ 512 ◆ obverse\n",
"= and 2814 more\n",
"Number of results: TF 2834; GREP 2834\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('face',),\n",
" grepFaces,\n",
" tfFaces,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Columns and lines\n",
"\n",
"We check whether we see the same column and line numbers with GREP and TF."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:56.827129Z",
"start_time": "2018-03-06T06:42:56.796606Z"
}
},
"outputs": [],
"source": [
"def tfLines():\n",
" lines = []\n",
" for document in F.otype.s('document'):\n",
" documentName = F.pnumber.v(document)\n",
" srcfile = F.srcfile.v(document)\n",
" for face in L.d(document, otype='face'):\n",
" typ = F.face.v(face)\n",
" for line in L.d(face, otype='line'):\n",
" srcLn = F.srcLnNum.v(line)\n",
" ln = str(F.ln.v(line) or F.lnc.v(line))\n",
" if F.primeln.v(line):\n",
" ln += \"'\"\n",
" col = str(F.col.v(line) or '')\n",
" if F.primecol.v(line):\n",
" col += \"'\"\n",
" lines.append((srcfile, documentName, srcLn, typ, col, ln))\n",
" return lines"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:57.821096Z",
"start_time": "2018-03-06T06:42:57.788963Z"
}
},
"outputs": [],
"source": [
"def grepLines(gen):\n",
" lines = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" if face is None or column is None:\n",
" continue\n",
" isComment = srcLn.startswith('$')\n",
" if isComment:\n",
" ln = srcLn[0]\n",
" else:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" ln = match.group(1)\n",
" lines.append((srcfile, document, srcLnNum, face, column, ln))\n",
" return lines"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:43:07.490571Z",
"start_time": "2018-03-06T06:43:06.931814Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ face ◆ column ◆ atf lineno\n",
"IDENTICAL: all 27375 items\n",
"= : AbB-primary ◆ P509373 ◆ 31 ◆ obverse ◆ ◆ 1\n",
"= : AbB-primary ◆ P509373 ◆ 32 ◆ obverse ◆ ◆ 2\n",
"= : AbB-primary ◆ P509373 ◆ 33 ◆ obverse ◆ ◆ 3\n",
"= : AbB-primary ◆ P509373 ◆ 34 ◆ obverse ◆ ◆ 4\n",
"= : AbB-primary ◆ P509373 ◆ 35 ◆ obverse ◆ ◆ 5\n",
"= : AbB-primary ◆ P509373 ◆ 36 ◆ obverse ◆ ◆ 6\n",
"= : AbB-primary ◆ P509373 ◆ 37 ◆ obverse ◆ ◆ 7\n",
"= : AbB-primary ◆ P509373 ◆ 38 ◆ obverse ◆ ◆ 8\n",
"= : AbB-primary ◆ P509373 ◆ 39 ◆ obverse ◆ ◆ 9\n",
"= : AbB-primary ◆ P509373 ◆ 40 ◆ obverse ◆ ◆ 10\n",
"= : AbB-primary ◆ P509373 ◆ 41 ◆ obverse ◆ ◆ 11\n",
"= : AbB-primary ◆ P509373 ◆ 42 ◆ obverse ◆ ◆ 12\n",
"= : AbB-primary ◆ P509373 ◆ 43 ◆ obverse ◆ ◆ 13\n",
"= : AbB-primary ◆ P509373 ◆ 44 ◆ obverse ◆ ◆ 14\n",
"= : AbB-primary ◆ P509373 ◆ 45 ◆ obverse ◆ ◆ 15\n",
"= : AbB-primary ◆ P509373 ◆ 46 ◆ obverse ◆ ◆ $\n",
"= : AbB-primary ◆ P509373 ◆ 48 ◆ reverse ◆ ◆ $\n",
"= : AbB-primary ◆ P509373 ◆ 49 ◆ reverse ◆ ◆ 1'\n",
"= : AbB-primary ◆ P509373 ◆ 50 ◆ reverse ◆ ◆ 2'\n",
"= : AbB-primary ◆ P509373 ◆ 51 ◆ reverse ◆ ◆ 3'\n",
"= and 27355 more\n",
"Number of results: TF 27375; GREP 27375\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('face', 'column', 'atf lineno'),\n",
" grepLines,\n",
" tfLines,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Remarks\n",
"\n",
"Remarks are marked by the `#` character in lines that are not\n",
"metadata following the document header. The criterion for a line starting with `#` to be\n",
"a comment is that it has a space after the `#`.\n",
"\n",
"There are also translation lines, starting with `#tr.en`, but we do not deal withg those here."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"def tfRemarks():\n",
" remarks = []\n",
" for l in F.otype.s('line'):\n",
" rmks = F.remarks.v(l)\n",
" if rmks:\n",
" for (i, rmk) in enumerate(rmks.split('\\n')):\n",
" remarks.append((F.srcfile.v(l), F.srcLnNum.v(l) + i + 1, rmk))\n",
" return remarks"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:07.443492Z",
"start_time": "2018-03-06T06:46:07.414650Z"
}
},
"outputs": [],
"source": [
"def grepRemarks(gen):\n",
" remarks = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" isRemark = srcLn.startswith('#') and len(srcLn) > 1 and srcLn[1] == ' '\n",
" if isRemark:\n",
" remarks.append((srcfile, srcLnNum, srcLn[1:].strip()))\n",
" return remarks"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:08.782374Z",
"start_time": "2018-03-06T06:46:08.037694Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ remark\n",
"IDENTICAL: all 12 items\n",
"= : AbB-secondary ◆ 11849 ◆ word (li-ba-al-li-t,u2-ka) divided over two lines\n",
"= : AbB-secondary ◆ 12535 ◆ word (i-li-ka-am) divided over two lines\n",
"= : AbB-secondary ◆ 15552 ◆ reading i-ba-al-lu-ut, proposed by Von Soden BiOr 23 55\n",
"= : AbB-secondary ◆ 15555 ◆ reading szi-'i-it-sa3 proposed by Von Soden BiOr 23 55\n",
"= : AbB-secondary ◆ 15559 ◆ reading tu-ut-t,i-bi-ma following Von Soden BiOr 23 55\n",
"= : AbB-secondary ◆ 15573 ◆ reading ma-s,a-ra-am proposed by Von Soden BiOr 23 55\n",
"= : AbB-secondary ◆ 15575 ◆ reading a-hu-ki propsed by Von Soden BiOr 23 55\n",
"= : AbB-secondary ◆ 15577 ◆ reading ki-i ne-em-szi-ma propsed by Von Soden BiOr 23 55\n",
"= : AbB-secondary ◆ 15582 ◆ reconstruction of this line propsed by Von Soden BiOr 23 55\n",
"= : AbB-secondary ◆ 16226 ◆ reading szu-ku-si propsed by Von Soden BiOr 23 55\n",
"= : AbB-secondary ◆ 68946 ◆ reading la-mi! proposed by Von Soden BiOr 39 590\n",
"= : AbB-secondary ◆ 69030 ◆ reading la us2-su2-ka-tim proposed by Von Soden BiOr 39 590\n",
"= no more items\n",
"Number of results: TF 12; GREP 12\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('remark',),\n",
" grepRemarks,\n",
" tfRemarks,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Translations\n",
"\n",
"Translations are marked by the `#` character in lines that are not\n",
"metadata following the document header.\n",
"The `#` must be immediately followed by `tr.`*language code*`:` and the translation comes after that (with white space in between)."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"languages: en\n"
]
}
],
"source": [
"languages = [t[12:] for t in Fall() if t.startswith('translation@')]\n",
"print(f'languages: {\", \".join(languages)}')\n",
"\n",
"def tfTrans():\n",
" trans = []\n",
" for l in F.otype.s('line'):\n",
" for lc in languages:\n",
" trs = Fs(f'translation@{lc}').v(l)\n",
" if trs:\n",
" trans.append((F.srcfile.v(l), F.srcLnNum.v(l) + 1, lc, trs))\n",
" return trans"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:07.443492Z",
"start_time": "2018-03-06T06:46:07.414650Z"
}
},
"outputs": [],
"source": [
"def grepTrans(gen):\n",
" trans = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" isTrans = srcLn.startswith('#tr.')\n",
" if isTrans:\n",
" parts = srcLn[4:].split(':', 1)\n",
" if len(parts) > 1:\n",
" lc = parts[0].strip()\n",
" trs = parts[1].strip()\n",
" trans.append((srcfile, srcLnNum, lc, trs))\n",
" return trans"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:08.782374Z",
"start_time": "2018-03-06T06:46:08.037694Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ language ◆ translation\n",
"IDENTICAL: all 134 items\n",
"= : AbB-secondary ◆ 27139 ◆ en ◆ To Šamaš-ḫazir\n",
"= : AbB-secondary ◆ 27141 ◆ en ◆ speak,\n",
"= : AbB-secondary ◆ 27143 ◆ en ◆ thus Hammurapi:\n",
"= : AbB-secondary ◆ 27145 ◆ en ◆ Ilī-ippalsam, the shepherd,\n",
"= : AbB-secondary ◆ 27147 ◆ en ◆ thus informed me, as follows that one:\n",
"= : AbB-secondary ◆ 27149 ◆ en ◆ A field of 3 bur3, which through a sealed document of my lord\n",
"= : AbB-secondary ◆ 27151 ◆ en ◆ was given (lit. sealed) to me—\n",
"= : AbB-secondary ◆ 27153 ◆ en ◆ 4 years ago Etel-pî-Marduk took it away from me, and\n",
"= : AbB-secondary ◆ 27155 ◆ en ◆ its barley regularly takes.\n",
"= : AbB-secondary ◆ 27157 ◆ en ◆ Further, Sîn-iddinam I informed,\n",
"= : AbB-secondary ◆ 27159 ◆ en ◆ but it was not returned to me;\n",
"= : AbB-secondary ◆ 27161 ◆ en ◆ Thus he (Ilī-ippalsam) informed me.\n",
"= : AbB-secondary ◆ 27163 ◆ en ◆ To Sîn-iddinam I (now) have written.\n",
"= : AbB-secondary ◆ 27165 ◆ en ◆ If, as that Ilī-ippalsam\n",
"= : AbB-secondary ◆ 27167 ◆ en ◆ said,\n",
"= : AbB-secondary ◆ 27169 ◆ en ◆ a field of 3 bur3, which in the palace\n",
"= : AbB-secondary ◆ 27171 ◆ en ◆ was given (lit. sealed) to him,\n",
"= : AbB-secondary ◆ 27174 ◆ en ◆ Etel-pî-Marduk 4 years ago took away, and\n",
"= : AbB-secondary ◆ 27176 ◆ en ◆ is ‘eating,′\n",
"= : AbB-secondary ◆ 27178 ◆ en ◆ then a more sickening case\n",
"= and 114 more\n",
"Number of results: TF 134; GREP 134\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('language', 'translation',),\n",
" grepTrans,\n",
" tfTrans,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Line comments\n",
"\n",
"Comments are marked by the `$` character at the start of a line.\n",
"\n",
"We have also inline comments, shaped as `($ $)` but we do not deal with them here.\n",
"Inline comments are treated under signs."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"def tfComments():\n",
" comments = []\n",
" for l in F.otype.s('line'):\n",
" if not F.lnc.v(l):\n",
" continue\n",
" comment = F.comment.v(L.d(l, otype='sign')[0])\n",
" if comment:\n",
" ln = F.lnc.v(l)\n",
" comments.append((F.srcfile.v(l), F.srcLnNum.v(l), ln, comment))\n",
" return comments"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:07.443492Z",
"start_time": "2018-03-06T06:46:07.414650Z"
}
},
"outputs": [],
"source": [
"def grepComments(gen):\n",
" comments = []\n",
" for (srcfile, document, face, column, ln, line) in gen:\n",
" isComment = line.startswith('$')\n",
" if isComment:\n",
" comments.append((srcfile, ln, line[0], line[1:].strip()))\n",
" return comments"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:08.782374Z",
"start_time": "2018-03-06T06:46:08.037694Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ comment\n",
"IDENTICAL: all 969 items\n",
"= : AbB-primary ◆ 46 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ 48 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ 154 ◆ $ ◆ rest missing\n",
"= : AbB-primary ◆ 319 ◆ $ ◆ blank space\n",
"= : AbB-primary ◆ 321 ◆ $ ◆ blank space\n",
"= : AbB-primary ◆ 447 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ 462 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ 464 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ 480 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ 512 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ 556 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ 562 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ 564 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ 568 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ 8049 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ 8051 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ 8180 ◆ $ ◆ single ruling\n",
"= : AbB-primary ◆ 8222 ◆ $ ◆ blank space\n",
"= : AbB-primary ◆ 8441 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ 8554 ◆ $ ◆ single ruling\n",
"= and 949 more\n",
"Number of results: TF 969; GREP 969\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('comment',),\n",
" grepComments,\n",
" tfComments,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Metadata\n",
"\n",
"Metadata comes from lines starting with a `#` without a space following the `#`.\n",
"\n",
"We have found metadata for language, translation (English) and comments to the contents of lines.\n",
"\n",
"The language is specified for documents, the translation for lines."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"def tfMetas():\n",
" metas = []\n",
" for d in F.otype.s('document'):\n",
" lang = F.lang.v(d)\n",
" documentName = F.pnumber.v(d)\n",
" srcfile = F.srcfile.v(d)\n",
" if lang:\n",
" srcLn = F.srcLnNum.v(d)\n",
" metas.append((srcfile, documentName, srcLn + 1, f'atf: lang = {lang}'))\n",
" for l in L.d(d, otype='line'):\n",
" trans = Fs('translation@en').v(l)\n",
" if trans:\n",
" srcLn = F.srcLnNum.v(l)\n",
" metas.append((srcfile, documentName, srcLn + 1, f'tr.en: = {trans}'))\n",
" return metas"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:07.443492Z",
"start_time": "2018-03-06T06:46:07.414650Z"
}
},
"outputs": [],
"source": [
"def grepMetas(gen):\n",
" metas = []\n",
" for (srcfile, document, face, column, ln, line) in gen:\n",
" if line.startswith('#') and len(line) > 1 and line[1] != ' ':\n",
" if line.startswith('#atf:l'):\n",
" line = '#atf: l' + line[6:]\n",
" fields = line[1:].split(maxsplit=1)\n",
" nFields = len(fields)\n",
" if nFields == 1:\n",
" key = fields[0]\n",
" feat = ''\n",
" val = ''\n",
" else:\n",
" (key, val) = fields\n",
" feat = ''\n",
" if key == 'atf:':\n",
" fields = val.split(maxsplit=1)\n",
" nFields = len(fields)\n",
" if nFields == 2:\n",
" (feat, val) = fields\n",
" if val.startswith('='):\n",
" val = val[1:].strip()\n",
" metas.append((srcfile, document, ln, f'{key} {feat} = {val}'))\n",
" return metas"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:08.782374Z",
"start_time": "2018-03-06T06:46:08.037694Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ comment\n",
"IDENTICAL: all 1419 items\n",
"= : AbB-primary ◆ P509373 ◆ 28 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P509374 ◆ 97 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P509375 ◆ 148 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P509376 ◆ 197 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P509377 ◆ 251 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P507628 ◆ 310 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P481190 ◆ 350 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P481191 ◆ 393 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P481192 ◆ 444 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P389958 ◆ 509 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P389256 ◆ 553 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P510526 ◆ 7594 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P510527 ◆ 7644 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P510528 ◆ 7709 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P510529 ◆ 7754 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P510530 ◆ 7806 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P510531 ◆ 7880 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P510532 ◆ 7932 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P510533 ◆ 7985 ◆ atf: lang = akk\n",
"= : AbB-primary ◆ P510534 ◆ 8033 ◆ atf: lang = akk\n",
"= and 1399 more\n",
"Number of results: TF 1419; GREP 1419\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('comment',),\n",
" grepMetas,\n",
" tfMetas,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Line contents\n",
"\n",
"We check whether the contents of lines after the number can be reproduced by means of TF features\n",
"\n",
"There are two ways to do that: \n",
"\n",
"1. using the feature `scrLn`\n",
"2. using `T.text()`\n",
"\n",
"## By the feature `srcLn`\n",
"\n",
"This way is rather trivial.\n",
"But it is applicable to all lines, also comment lines.\n",
"We also pick up remarks, but not translations."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:56.827129Z",
"start_time": "2018-03-06T06:42:56.796606Z"
}
},
"outputs": [],
"source": [
"def tfLineContents():\n",
" lines = []\n",
" for document in F.otype.s('document'):\n",
" documentName = F.pnumber.v(document)\n",
" srcfile = F.srcfile.v(document)\n",
" for line in L.d(document, otype='line'):\n",
" srcLnNum = F.srcLnNum.v(line)\n",
" srcLn = F.srcLn.v(line)\n",
" lines.append((srcfile, documentName, srcLnNum, srcLn))\n",
" remarks = F.remarks.v(line)\n",
" if remarks:\n",
" for (i, remark) in enumerate(remarks.split('\\n')):\n",
" lines.append((srcfile, documentName, srcLnNum + i + 1, f'# {remark}'))\n",
" return lines"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:57.821096Z",
"start_time": "2018-03-06T06:42:57.788963Z"
}
},
"outputs": [],
"source": [
"structureChars = set('&@')\n",
"\n",
"def grepLineContents(gen):\n",
" lines = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" if not srcLn or srcLn[0] in structureChars or (srcLn[0] == '#' and len(srcLn) > 1 and srcLn[1] != ' '):\n",
" continue\n",
" lines.append((srcfile, document, srcLnNum, srcLn))\n",
" return lines"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:43:07.490571Z",
"start_time": "2018-03-06T06:43:06.931814Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ contents\n",
"IDENTICAL: all 27387 items\n",
"= : AbB-primary ◆ P509373 ◆ 31 ◆ 1. [a-na] _{d}suen_-i-[din-nam]\n",
"= : AbB-primary ◆ P509373 ◆ 32 ◆ 2. qi2-bi2-[ma]\n",
"= : AbB-primary ◆ P509373 ◆ 33 ◆ 3. um-ma _{d}en-lil2_-sza-du-u2-ni-ma\n",
"= : AbB-primary ◆ P509373 ◆ 34 ◆ 4. _{d}utu_ u3 _{d}[marduk]_ a-na da-ri-a-[tim]\n",
"= : AbB-primary ◆ P509373 ◆ 35 ◆ 5. li-ba-al-li-t,u2-u2-ka\n",
"= : AbB-primary ◆ P509373 ◆ 36 ◆ 6. {disz}sze-ep-_{d}suen a2-gal2 [dumu] um-mi-a-mesz_\n",
"= : AbB-primary ◆ P509373 ◆ 37 ◆ 7. ki-a-am u2-lam-mi-da-an-ni um-[ma] szu-u2-[ma]\n",
"= : AbB-primary ◆ P509373 ◆ 38 ◆ 8. {disz}sa-am-su-ba-ah-li sza-pi2-ir ma-[tim]\n",
"= : AbB-primary ◆ P509373 ◆ 39 ◆ 9. 2(esze3) _a-sza3_ s,i-[bi]-it {disz}[ku]-un-zu-lum _sza3-gud_\n",
"= : AbB-primary ◆ P509373 ◆ 40 ◆ 10. _a-sza3 a-gar3_ na-ag-[ma-lum] _uru_ x x x{ki}\n",
"= : AbB-primary ◆ P509373 ◆ 41 ◆ 11. sza _{d}utu_-ha-zi-[ir] isz-tu _mu 7(disz) kam_ id-di-nu-szum\n",
"= : AbB-primary ◆ P509373 ◆ 42 ◆ 12. u3 i-na _uru_ x-szum{ki} sza-ak-nu id-di-a-am-ma\n",
"= : AbB-primary ◆ P509373 ◆ 43 ◆ 13. 2(esze3) _a-sza3 szuku_ i-li-ib-bu s,i-bi-it _nagar-mesz_\n",
"= : AbB-primary ◆ P509373 ◆ 44 ◆ 14. _a-sza3 a-gar3 uru_ ra-bu-um x [...]\n",
"= : AbB-primary ◆ P509373 ◆ 45 ◆ 15. x x x x x x [...]\n",
"= : AbB-primary ◆ P509373 ◆ 46 ◆ $ rest broken\n",
"= : AbB-primary ◆ P509373 ◆ 48 ◆ $ beginning broken\n",
"= : AbB-primary ◆ P509373 ◆ 49 ◆ 1'. [x x] x x [...]\n",
"= : AbB-primary ◆ P509373 ◆ 50 ◆ 2'. [x x] x [...]\n",
"= : AbB-primary ◆ P509373 ◆ 51 ◆ 3'. [x x] x s,i-bi-it _gir3-se3#-ga#_\n",
"= and 27367 more\n",
"Number of results: TF 27387; GREP 27387\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('contents',),\n",
" grepLineContents,\n",
" tfLineContents,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## By the method `T.text()`\n",
"\n",
"We apply the `T.text()` method on each line, using the default text format `text-orig-full`.\n",
"The method will walk over all signs on the line, and represent each sign by means of the feature `atf` plus\n",
"some auxiliary features such as\n",
"\n",
"* `atfpre` and `atfpost` (for cluster characters preceding and following the sign reading),\n",
"* `after` (for separator characters after the sign: `-`, `:`, `/`, ` `, or the empty string)\n",
"\n",
"We only compare lines containing transcribed material: numbered lines in the source.\n",
"\n",
"### Workarounds\n",
"\n",
"In rare cases some clusters start or end with a space or a hyphen, where the input had rather been encoded with that\n",
"space of hyphen just outside the cluster.\n",
"\n",
"We work around them, and we check whether we have encountered all listed work-arounds."
]
},
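{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a toy illustration of this per-sign assembly (made-up feature values, plain Python, not the actual TF implementation):\n",
"\n",
"```python\n",
"# Simplified sketch of how a line is serialized from per-sign features.\n",
"# Each dict mimics the TF features atfpre, atf, atfpost and after.\n",
"signs = [\n",
"    dict(atfpre='[', atf='a', atfpost='', after='-'),\n",
"    dict(atfpre='', atf='na', atfpost=']', after=' '),\n",
"    dict(atfpre='', atf='qi2', atfpost='', after='-'),\n",
"    dict(atfpre='', atf='bi2', atfpost='', after=''),\n",
"]\n",
"\n",
"def lineText(signs):\n",
"    # concatenate cluster openers, the reading, cluster closers,\n",
"    # and the separator after each sign\n",
"    return ''.join(\n",
"        (s['atfpre'] or '') + s['atf'] + (s['atfpost'] or '') + (s['after'] or '')\n",
"        for s in signs\n",
"    )\n",
"\n",
"print(lineText(signs))  # [a-na] qi2-bi2\n",
"```"
]
},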
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:56.827129Z",
"start_time": "2018-03-06T06:42:56.796606Z"
}
},
"outputs": [],
"source": [
"def tfLineText():\n",
" lines = []\n",
" for document in F.otype.s('document'):\n",
" documentName = F.pnumber.v(document)\n",
" srcfile = F.srcfile.v(document)\n",
" for line in L.d(document, otype='line'):\n",
" if F.lnc.v(line):\n",
" continue\n",
" face = F.face.v(L.u(line, otype='face')[0])\n",
" srcLnNum = F.srcLnNum.v(line)\n",
" srcLn = F.srcLn.v(line)\n",
" primeLn = prime if F.primeln.v(line) else ''\n",
" ln = F.ln.v(line)\n",
" text = T.text(line)\n",
" lines.append((srcfile, documentName, face, srcLnNum, f'{ln}{primeLn}. {text}'))\n",
" return lines"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:57.821096Z",
"start_time": "2018-03-06T06:42:57.788963Z"
}
},
"outputs": [],
"source": [
"def methodB1(x):\n",
" return x.replace('_-', '-_')\n",
"\n",
"def methodB2(x):\n",
" return x.replace('[-', '-[')\n",
"\n",
"def methodE1(x):\n",
" return x.replace('-]', ']-')\n",
"\n",
"workarounds = {\n",
" ('P313391', 'reverse', '5'): methodB1,\n",
" ('P312032', 'reverse', '12'): methodB2,\n",
" ('P345563', 'obverse', '4'): methodE1,\n",
" ('P305773', 'reverse', '1'): methodE1,\n",
"}\n",
"\n",
"workaroundsApplied = set()\n",
"\n",
"def initWorkarounds():\n",
" workaroundsApplied.clear()\n",
"\n",
"def checkWorkarounds(document, face, ln, srcLn):\n",
" if (document, face, ln) in workarounds:\n",
" workaroundsApplied.add((document, face, ln))\n",
" method = workarounds[(document, face, ln)]\n",
" srcLn = method(srcLn)\n",
" print(f'workaround applied: \"{srcLn}\"')\n",
" return srcLn\n",
"\n",
"def finishWorkarounds():\n",
" if set(workarounds) == workaroundsApplied:\n",
" print(f'ALL {len(workarounds)} WORKAROUNDS APPLIED')\n",
" else:\n",
" print('UNAPPLIED WORKAROUNDS:')\n",
" for (document, face, ln) in sorted(set(workarounds) - workaroundsApplied):\n",
" print(f'\\t{document} {face}:{ln}')"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:42:57.821096Z",
"start_time": "2018-03-06T06:42:57.788963Z"
}
},
"outputs": [],
"source": [
"def grepLineText(gen):\n",
" lines = []\n",
" initWorkarounds()\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" ln = match.group(1)\n",
" srcLn = trimRe.sub(' ', srcLn)\n",
" srcLn = checkWorkarounds(document, face, ln, srcLn)\n",
" lines.append((srcfile, document, face, srcLnNum, srcLn))\n",
" finishWorkarounds()\n",
" return lines"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:43:07.490571Z",
"start_time": "2018-03-06T06:43:06.931814Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"workaround applied: \"5. 1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_\"\n",
"workaround applied: \"4. ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na\"\n",
"workaround applied: \"1. [1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]\"\n",
"workaround applied: \"12. _iti gu4-si#-[sa2_ ...]\"\n",
"ALL 4 WORKAROUNDS APPLIED\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ contents\n",
"IDENTICAL: all 26406 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ 1. [a-na] _{d}suen_-i-[din-nam]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ 2. qi2-bi2-[ma]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ 3. um-ma _{d}en-lil2_-sza-du-u2-ni-ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ 4. _{d}utu_ u3 _{d}[marduk]_ a-na da-ri-a-[tim]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 35 ◆ 5. li-ba-al-li-t,u2-u2-ka\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ 6. {disz}sze-ep-_{d}suen a2-gal2 [dumu] um-mi-a-mesz_\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ 7. ki-a-am u2-lam-mi-da-an-ni um-[ma] szu-u2-[ma]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ 8. {disz}sa-am-su-ba-ah-li sza-pi2-ir ma-[tim]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ 9. 2(esze3) _a-sza3_ s,i-[bi]-it {disz}[ku]-un-zu-lum _sza3-gud_\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ 10. _a-sza3 a-gar3_ na-ag-[ma-lum] _uru_ x x x{ki}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ 11. sza _{d}utu_-ha-zi-[ir] isz-tu _mu 7(disz) kam_ id-di-nu-szum\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ 12. u3 i-na _uru_ x-szum{ki} sza-ak-nu id-di-a-am-ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ 13. 2(esze3) _a-sza3 szuku_ i-li-ib-bu s,i-bi-it _nagar-mesz_\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ 14. _a-sza3 a-gar3 uru_ ra-bu-um x [...]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ 15. x x x x x x [...]\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ 1'. [x x] x x [...]\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ 2'. [x x] x [...]\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ 3'. [x x] x s,i-bi-it _gir3-se3#-ga#_\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 52 ◆ 4'. [x x] x x x-ir ub-lam\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 53 ◆ 5'. in-na-me-er-ma\n",
"= and 26386 more\n",
"Number of results: TF 26406; GREP 26406\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('contents',),\n",
" grepLineText,\n",
" tfLineText,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Clusters\n",
"\n",
"Clusters are groupings of signs.\n",
"The transcription uses a variety of brackets for several kinds of clustering.\n",
"Clusters may be nested.\n",
"Clusters of different types need not be properly nested.\n",
"\n",
"Usually, clusters do not start with an inter-word space or an inter-sign hyphen.\n",
"But if they do, we work around them by pushing the offending space or hyphen out of the cluster.\n",
"\n",
"See **Workarounds** above.\n",
"\n",
"See the\n",
"[ORACC ATF docs](http://oracc.museum.upenn.edu/doc/help/editinginatf/quickreference/index.html)\n",
"\n",
"Most clusters are trivial: `[...]`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cluster types\n",
"\n",
"We count how much clusters we have of each type."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"langalt 7600 x\n",
"missing 7572 x\n",
"det 6794 x\n",
"uncertain 1183 x\n",
"supplied 231 x\n",
"excised 69 x\n"
]
}
],
"source": [
"for (typ, amount) in F.type.freqList('cluster'):\n",
" print(f'{typ:<15} {amount:>5} x')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Alternate language\n",
"\n",
"We count how much material is in the alternate language (*Sumerian*) and how much in the main language\n",
"(*Akkadian*)."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"akk 173823 signs\n",
"sux 19016 signs\n"
]
}
],
"source": [
"lang = collections.Counter()\n",
"\n",
"altLang = dict(\n",
" sux='akk',\n",
" akk='sux',\n",
")\n",
"\n",
"skipTypes = {'empty', 'comment', 'ellipsis', 'unknown'}\n",
"for d in F.otype.s('document'):\n",
" docLang = F.lang.v(d)\n",
" for s in L.d(d, otype='sign'):\n",
" typ = F.type.v(s)\n",
" if typ in skipTypes:\n",
" continue\n",
" signLang = altLang[docLang] if F.langalt.v(s) else docLang \n",
" lang[signLang] += 1\n",
" \n",
"for (l, amount) in sorted(\n",
" lang.items(),\n",
" key=lambda x: (-x[1], x[0]),\n",
"):\n",
" print(f'{l} {amount:>6} signs')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Check by ATF\n",
"\n",
"Now we check for each cluster whether the ATF of its material as delivered by TF is equal to the material that\n",
"we get \"directly\" by GREPping.\n",
"\n",
"Note however, that in order to GREP the clusters correctly, we have to do similar manipulations as what we did\n",
"to generate the TF.\n",
"\n",
"Clusters are not directly greppable, because:\n",
"\n",
"* the cluster characters may coincide with other usages of the same character: `( )` occurs in non-cluster constructs like `rrr!(YYY)`, `rrrx(YYY)`, `333(rrr)`\n",
"* the begin and end boundaries may be coded by the same character: `_ _`\n",
"* clusters of one type may use boundary characters that also are used by clusters of another type: `< >` and `<< >>`.\n",
"\n",
"So we proceed by escaping all cluster characters first to fresh characters that have none of these problems."
]
},
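{
"cell_type": "markdown",
"metadata": {},
"source": [
"The escaping idea can be shown on a toy scale: rewrite an ambiguous same-character pair such as `_ _` to fresh, unambiguous begin/end markers, match, then map back. (The markers `«` and `»` are chosen here purely for illustration; the notebook uses its own `clusterChars` table.)\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# _ _ uses the same character for begin and end, so a plain search\n",
"# cannot tell them apart; rewrite each pair non-greedily first.\n",
"langaltRe = re.compile('_(.*?)_')\n",
"\n",
"def esc(text):\n",
"    return langaltRe.sub(lambda m: '«' + m.group(1) + '»', text)\n",
"\n",
"def unesc(text):\n",
"    return text.replace('«', '_').replace('»', '_')\n",
"\n",
"line = '13. _a-sza3 szuku_ i-li-ib-bu s,i-bi-it _nagar-mesz_'\n",
"clusters = re.findall('«.*?»', esc(line))\n",
"print([unesc(c) for c in clusters])  # ['_a-sza3 szuku_', '_nagar-mesz_']\n",
"```"
]
},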
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:48:56.701799Z",
"start_time": "2018-03-06T06:48:56.677824Z"
}
},
"outputs": [],
"source": [
"def tfClusters():\n",
" clusters = []\n",
" for l in F.otype.s('line'):\n",
" lineClusters = []\n",
" for c in L.d(l, 'cluster'):\n",
" lineClusters.append((F.type.v(c), T.text(c)))\n",
" if lineClusters:\n",
" (document, face, line) = T.sectionFromNode(l)\n",
" srcfile = F.srcfile.v(l)\n",
" srcLnNum = F.srcLnNum.v(l)\n",
" lineClusters = [(srcfile, document, face, srcLnNum, typ, f'\"{atf}\"') for (typ, atf) in sorted(lineClusters)]\n",
" clusters.extend(lineClusters)\n",
" return clusters"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:48:58.646981Z",
"start_time": "2018-03-06T06:48:58.496460Z"
}
},
"outputs": [],
"source": [
"inlineCommentRe = re.compile(r'''\\s*\\(\\$.*?\\$\\)\\s*''')\n",
"noClusterRe = re.compile(r'''([0-9nx!])\\(([A-Za-z0-9,/'#!?*+|.]+)\\)''')\n",
"\n",
"bChars = f'''[{clusterBstr}]*'''\n",
"eChars = f'''[{clusterEstr}#?!+*]*[ -]*'''\n",
"\n",
"def noClusterRepl(match):\n",
" return f'{match.group(1)}§§{match.group(2)}±±'\n",
"\n",
"def noClusterRemove(text):\n",
" return text.replace('§§', '(').replace('±±', ')')\n",
"\n",
"def makeClusterEscRepl(cab, cae):\n",
" def repl(match):\n",
" return f'{cab}{match.group(2)}{cae}'\n",
" return repl\n",
"\n",
"clusterEscRe = {}\n",
"clusterFindRe = {}\n",
"clusterEscRepl = {}\n",
"\n",
"for (cab, cae, cob, coe, ctp) in clusterChars:\n",
" if cob == coe:\n",
" clusterEscRe[cab] = re.compile(f'''({re.escape(cob)}(.*?){re.escape(coe)})''')\n",
" clusterEscRepl[cab] = makeClusterEscRepl(cab, cae)\n",
" clusterFindRe[cab] = re.compile(f'''{bChars}{re.escape(cab)}.+?{re.escape(cae)}{eChars}''')\n",
"\n",
"def clusterEsc(text):\n",
" text = noClusterRe.sub(noClusterRepl, text)\n",
" for (cab, cae, cob, coe, ctp) in clusterChars:\n",
" if cob == coe:\n",
" text = clusterEscRe[cab].sub(clusterEscRepl[cab], text)\n",
" else:\n",
" text = text.replace(cob, cab).replace(coe, cae)\n",
" return text\n",
"\n",
"def clusterUnesc(text):\n",
" for (cab, cae, cob, coe, ctp) in clusterChars:\n",
" text = text.replace(cab, cob).replace(cae, coe)\n",
" text = noClusterRemove(text)\n",
" return text\n",
"\n",
"def grepClusters(gen):\n",
" clusters = []\n",
" initWorkarounds()\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" ln = match.group(1)\n",
" srcLn = match.group(2)\n",
" srcLn = checkWorkarounds(document, face, ln, srcLn)\n",
" lineClusters = []\n",
" srcLn = inlineCommentRe.sub('', srcLn)\n",
" srcLn = trimRe.sub(' ', srcLn)\n",
" srcLn = clusterEsc(srcLn)\n",
" for (cab, cae, cob, coe, ctp) in clusterChars:\n",
" css = clusterFindRe[cab].findall(srcLn)\n",
" for cs in css:\n",
" lineClusters.append((ctp, clusterUnesc(cs)))\n",
" lineClusters = [(srcfile, document, face, srcLnNum, c, f'\"{cs}\"') for (c, cs) in sorted(lineClusters)]\n",
" clusters.extend(lineClusters)\n",
" finishWorkarounds()\n",
"\n",
" return clusters"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:49:02.886207Z",
"start_time": "2018-03-06T06:49:00.548629Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"workaround applied: \"1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_\"\n",
"workaround applied: \"ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na\"\n",
"workaround applied: \"[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]\"\n",
"workaround applied: \"_iti gu4-si#-[sa2_ ...]\"\n",
"ALL 4 WORKAROUNDS APPLIED\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ type ◆ cluster\n",
"IDENTICAL: all 23449 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ det ◆ \"_{d}\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ langalt ◆ \"_{d}suen_-\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ missing ◆ \"[a-na] \"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ missing ◆ \"[din-nam]\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ missing ◆ \"[ma]\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ det ◆ \"_{d}\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ langalt ◆ \"_{d}en-lil2_-\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ det ◆ \"_{d}\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ det ◆ \"_{d}\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ langalt ◆ \"_{d}[marduk]_ \"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ langalt ◆ \"_{d}utu_ \"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ missing ◆ \"[marduk]_ \"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ missing ◆ \"[tim]\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ det ◆ \"_{d}\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ det ◆ \"{disz}\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ langalt ◆ \"_{d}suen a2-gal2 [dumu] um-mi-a-mesz_\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ missing ◆ \"[dumu] \"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ missing ◆ \"[ma]\"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ missing ◆ \"[ma] \"\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ det ◆ \"{disz}\"\n",
"= and 23429 more\n",
"Number of results: TF 23449; GREP 23449\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('type', 'cluster',),\n",
" grepClusters,\n",
" tfClusters,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check by cluster type feature\n",
"\n",
"Every type of cluster corresponds to a sign feature of the same name that has value 1 for each sign\n",
"that occurs in a cluster of that type.\n",
"\n",
"Per cluster type, we check whether the list of signs inside a cluster corresponds with the signs\n",
"that have the cluster type feature set to 1."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"def clusterAtf(signs):\n",
" atf = ''\n",
" for (i, s) in enumerate(signs):\n",
" atf += F.atfpre.v(s) or ''\n",
" atf += F.atf.v(s)\n",
" atf += F.atfpost.v(s) or ''\n",
" atf += F.after.v(s) or ''\n",
" return atf\n",
"\n",
"def checkClusterType(cType, cB, cE):\n",
" excluded = {'empty', 'comment'}\n",
" \n",
" def getCluster(sign):\n",
" if sign is None:\n",
" return None\n",
" clusters = L.u(sign, otype='cluster')\n",
" cTarget = [cluster for cluster in clusters if F.type.v(cluster) == cType]\n",
" return cTarget[0] if cTarget else None\n",
"\n",
" def tfClustersType():\n",
" clusters = []\n",
" for l in F.otype.s('line'):\n",
" if F.comment.v(l):\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(l)\n",
" srcfile = F.srcfile.v(l)\n",
" srcLnNum = F.srcLnNum.v(l)\n",
" prevS = None\n",
" curCluster = []\n",
" for s in L.d(l, otype='sign'):\n",
" sType = F.type.v(s)\n",
" if sType in excluded:\n",
" continue\n",
" isIn = Fs(cType).v(s)\n",
" thisC = getCluster(s)\n",
" prevC = getCluster(prevS)\n",
" if thisC != prevC:\n",
" if curCluster:\n",
" clusters.append((srcfile, document, face, srcLnNum, clusterAtf(curCluster)))\n",
" curCluster = []\n",
" if isIn:\n",
" curCluster.append(s)\n",
" prevS = s\n",
" if curCluster:\n",
" clusters.append((srcfile, document, face, srcLnNum, clusterAtf(curCluster)))\n",
" curCluster = []\n",
" return clusters\n",
" \n",
" def grepClustersType(gen):\n",
" clusters = []\n",
" initWorkarounds()\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" ln = match.group(1)\n",
" srcLn = match.group(2)\n",
" srcLn = checkWorkarounds(document, face, ln, srcLn)\n",
" lineClusters = []\n",
" srcLn = inlineCommentRe.sub('', srcLn)\n",
" srcLn = trimRe.sub(' ', srcLn)\n",
" srcLn = clusterEsc(srcLn)\n",
" (cab, cae, cob, coe) = clusterTypeInfo[cType]\n",
" css = clusterFindRe[cab].findall(srcLn)\n",
" for cs in css:\n",
" clusters.append((srcfile, document, face, srcLnNum, clusterUnesc(cs)))\n",
" finishWorkarounds()\n",
" return clusters\n",
" \n",
" COMP.checkSanity(\n",
" ('cluster',),\n",
" grepClustersType,\n",
" tfClustersType,\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `langalt _ _`"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"workaround applied: \"1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_\"\n",
"workaround applied: \"ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na\"\n",
"workaround applied: \"[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]\"\n",
"workaround applied: \"_iti gu4-si#-[sa2_ ...]\"\n",
"ALL 4 WORKAROUNDS APPLIED\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ cluster\n",
"IDENTICAL: all 7600 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ _{d}suen_-\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ _{d}en-lil2_-\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}utu_ \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}[marduk]_ \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ _{d}suen a2-gal2 [dumu] um-mi-a-mesz_\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ _a-sza3_ \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ _sza3-gud_\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ _a-sza3 a-gar3_ \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ _uru_ \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ _{d}utu_-\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ _mu 7(disz) kam_ \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ _uru_ \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ _a-sza3 szuku_ \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ _nagar-mesz_\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ _a-sza3 a-gar3 uru_ \n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ _gir3-se3#-ga#_\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 54 ◆ _[a-sza3_ \n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ _a-[sza3 a-gar3_ \n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ _uru gan2_ \n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 57 ◆ _a-sza3_ \n",
"= and 7580 more\n",
"Number of results: TF 7600; GREP 7600\n"
]
}
],
"source": [
"checkClusterType('langalt', '_', '_')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `missing [ ]`"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"workaround applied: \"1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_\"\n",
"workaround applied: \"ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na\"\n",
"workaround applied: \"[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]\"\n",
"workaround applied: \"_iti gu4-si#-[sa2_ ...]\"\n",
"ALL 4 WORKAROUNDS APPLIED\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ cluster\n",
"IDENTICAL: all 7572 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ [a-na] \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ [din-nam]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ [ma]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ [marduk]_ \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ [tim]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ [dumu] \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ [ma] \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ [ma]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ [tim]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ [bi]-\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ [ku]-\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ [ma-lum] \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ [ir] \n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ [...]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ [...]\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ [x x] \n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ [...]\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ [x x] \n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ [...]\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ [x x] \n",
"= and 7552 more\n",
"Number of results: TF 7572; GREP 7572\n"
]
}
],
"source": [
"checkClusterType('missing', '[', ']')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `det { }`"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"workaround applied: \"1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_\"\n",
"workaround applied: \"ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na\"\n",
"workaround applied: \"[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]\"\n",
"workaround applied: \"_iti gu4-si#-[sa2_ ...]\"\n",
"ALL 4 WORKAROUNDS APPLIED\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ cluster\n",
"IDENTICAL: all 6794 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ _{d}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ _{d}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ {disz}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ _{d}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ {disz}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ {disz}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ {ki}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ _{d}\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ {ki} \n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ {ki}\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 60 ◆ _{d}\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 103 ◆ _{d}\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 103 ◆ _{d}\n",
"= : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ {disz}\n",
"= : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ {d}\n",
"= : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ {ki} \n",
"= : AbB-primary ◆ P509376 ◆ obverse ◆ 208 ◆ {d}\n",
"= : AbB-primary ◆ P509376 ◆ reverse ◆ 220 ◆ {disz}\n",
"= and 6774 more\n",
"Number of results: TF 6794; GREP 6794\n"
]
}
],
"source": [
"checkClusterType('det', '{', '}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `uncertain ( )`"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"workaround applied: \"1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_\"\n",
"workaround applied: \"ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na\"\n",
"workaround applied: \"[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]\"\n",
"workaround applied: \"_iti gu4-si#-[sa2_ ...]\"\n",
"ALL 4 WORKAROUNDS APPLIED\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ cluster\n",
"IDENTICAL: all 1183 items\n",
"= : AbB-primary ◆ P481192 ◆ obverse ◆ 460 ◆ (x)] \n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 466 ◆ [(x)]\n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 469 ◆ (x)] \n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 477 ◆ (x)] \n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 477 ◆ [(x) \n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 477 ◆ (x)]\n",
"= : AbB-primary ◆ P510529 ◆ reverse ◆ 7772 ◆ [(x)]\n",
"= : AbB-primary ◆ P510530 ◆ obverse ◆ 7821 ◆ [(x)]\n",
"= : AbB-primary ◆ P510530 ◆ reverse ◆ 7845 ◆ (x)] \n",
"= : AbB-primary ◆ P510531 ◆ obverse ◆ 7896 ◆ (x)] \n",
"= : AbB-primary ◆ P510531 ◆ reverse ◆ 7898 ◆ (x)]-\n",
"= : AbB-primary ◆ P510531 ◆ reverse ◆ 7901 ◆ (x)] \n",
"= : AbB-primary ◆ P510531 ◆ reverse ◆ 7902 ◆ (x)] \n",
"= : AbB-primary ◆ P510534 ◆ obverse ◆ 8046 ◆ [(x) \n",
"= : AbB-primary ◆ P510534 ◆ obverse ◆ 8046 ◆ (x)]\n",
"= : AbB-primary ◆ P510534 ◆ reverse ◆ 8055 ◆ [(x)]\n",
"= : AbB-primary ◆ P510534 ◆ reverse ◆ 8067 ◆ (x)]-\n",
"= : AbB-primary ◆ P510536 ◆ obverse ◆ 8165 ◆ [(x)] \n",
"= : AbB-primary ◆ P510537 ◆ obverse ◆ 8216 ◆ (x)] \n",
"= : AbB-primary ◆ P510537 ◆ obverse ◆ 8216 ◆ (x) \n",
"= and 1163 more\n",
"Number of results: TF 1183; GREP 1183\n"
]
}
],
"source": [
"checkClusterType('uncertain', '(', ')')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `supplied < >`"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"workaround applied: \"1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_\"\n",
"workaround applied: \"ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na\"\n",
"workaround applied: \"[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]\"\n",
"workaround applied: \"_iti gu4-si#-[sa2_ ...]\"\n",
"ALL 4 WORKAROUNDS APPLIED\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ cluster\n",
"IDENTICAL: all 231 items\n",
"= : AbB-primary ◆ P389958 ◆ reverse ◆ 523 ◆ <ru>-\n",
"= : AbB-primary ◆ P510526 ◆ obverse ◆ 7604 ◆ <li-ki-il>\n",
"= : AbB-primary ◆ P510551 ◆ obverse ◆ 8942 ◆ <ti>-\n",
"= : AbB-primary ◆ P510552 ◆ obverse ◆ 8992 ◆ <li>-\n",
"= : AbB-primary ◆ P510552 ◆ obverse ◆ 8993 ◆ <ma> \n",
"= : AbB-primary ◆ P510559 ◆ obverse ◆ 9402 ◆ <li>-\n",
"= : AbB-primary ◆ P510561 ◆ obverse ◆ 9503 ◆ <ra>-\n",
"= : AbB-primary ◆ P510562 ◆ obverse ◆ 9548 ◆ <ma?>-\n",
"= : AbB-primary ◆ P510571 ◆ reverse ◆ 10054 ◆ <ut>-\n",
"= : AbB-primary ◆ P510577 ◆ obverse ◆ 10396 ◆ <wi>-\n",
"= : AbB-primary ◆ P510583 ◆ obverse ◆ 10748 ◆ <isz> \n",
"= : AbB-primary ◆ P510588 ◆ obverse ◆ 11067 ◆ <wi>-\n",
"= : AbB-primary ◆ P510591 ◆ reverse ◆ 11292 ◆ <ta>-\n",
"= : AbB-primary ◆ P510592 ◆ reverse ◆ 11373 ◆ <ti> \n",
"= : AbB-primary ◆ P510599 ◆ obverse ◆ 11750 ◆ <li>-\n",
"= : AbB-primary ◆ P510606 ◆ obverse ◆ 12137 ◆ <t,u2>-\n",
"= : AbB-primary ◆ P510613 ◆ obverse ◆ 12534 ◆ <ma>\n",
"= : AbB-primary ◆ P510616 ◆ obverse ◆ 12719 ◆ <ta> \n",
"= : AbB-primary ◆ P510616 ◆ reverse ◆ 12743 ◆ <lu> \n",
"= : AbB-primary ◆ P510617 ◆ reverse ◆ 12799 ◆ <li>-\n",
"= and 211 more\n",
"Number of results: TF 231; GREP 231\n"
]
}
],
"source": [
"checkClusterType('supplied', '<', '>')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `excised << >>`"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"workaround applied: \"1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_\"\n",
"workaround applied: \"ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na\"\n",
"workaround applied: \"[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]\"\n",
"workaround applied: \"_iti gu4-si#-[sa2_ ...]\"\n",
"ALL 4 WORKAROUNDS APPLIED\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ cluster\n",
"IDENTICAL: all 69 items\n",
"= : AbB-primary ◆ P510530 ◆ reverse ◆ 7835 ◆ <<TE>>-\n",
"= : AbB-primary ◆ P510543 ◆ obverse ◆ 8537 ◆ <<li>>-\n",
"= : AbB-primary ◆ P510562 ◆ reverse ◆ 9563 ◆ <<KI>>\n",
"= : AbB-primary ◆ P510573 ◆ reverse ◆ 10149 ◆ <<an-na>> \n",
"= : AbB-primary ◆ P510576 ◆ reverse ◆ 10329 ◆ <<x>> \n",
"= : AbB-primary ◆ P510621 ◆ reverse ◆ 13006 ◆ <<ti>>\n",
"= : AbB-primary ◆ P497370 ◆ obverse ◆ 13101 ◆ <<mar>>-\n",
"= : AbB-primary ◆ P510634 ◆ obverse ◆ 13743 ◆ <<ma>>\n",
"= : AbB-primary ◆ P510660 ◆ reverse ◆ 15093 ◆ <<i-na>> \n",
"= : AbB-primary ◆ P510661 ◆ reverse ◆ 15147 ◆ <<kam iti>> \n",
"= : AbB-primary ◆ P510686 ◆ obverse ◆ 16380 ◆ <<qa2-be2-e>>\n",
"= : AbB-primary ◆ P510686 ◆ obverse ◆ 16383 ◆ <<bi>>-\n",
"= : AbB-primary ◆ P510688 ◆ obverse ◆ 16513 ◆ <<i>> \n",
"= : AbB-primary ◆ P510725 ◆ obverse ◆ 18485 ◆ <<um>>\n",
"= : AbB-primary ◆ P510775 ◆ obverse ◆ 21373 ◆ <<ti>>\n",
"= : AbB-primary ◆ P510798 ◆ obverse ◆ 22630 ◆ <<gur>>\n",
"= : AbB-primary ◆ P510807 ◆ obverse ◆ 23150 ◆ <<ID>>-\n",
"= : AbB-primary ◆ P510821 ◆ obverse ◆ 23894 ◆ <<u2-ul>> \n",
"= : AbB-primary ◆ P510861 ◆ reverse ◆ 26058 ◆ <<u2-sza#>>\n",
"= : AbB-primary ◆ P413589 ◆ reverse ◆ 27013 ◆ <<sza-li-im>>\n",
"= and 49 more\n",
"Number of results: TF 69; GREP 69\n"
]
}
],
"source": [
"checkClusterType('excised', '<<', '>>')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Primes\n",
"\n",
"Here is an overview of the occurrence of primes.\n",
"\n",
"There are primes within sign readings, they denote a numerical property.\n",
"\n",
"Primes on column and line numbers denote that the given number deviates from the physical number\n",
"because of damage.\n",
"\n",
"**N.B.:** This gathers primes on *signs*, *column* numbers and *case* numbers.\n",
"\n",
"First a bit of exploration."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:12.241409Z",
"start_time": "2018-03-06T06:46:12.221431Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"primecol: 4 x 1\n",
"primeln : 1825 x 1\n"
]
}
],
"source": [
"primeFt = ('primecol', 'primeln')\n",
"\n",
"for ft in primeFt:\n",
" for (value, frequency) in Fs(ft).freqList():\n",
" print(f'{ft:<8}: {frequency:>5} x {value}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also want so see the node types of primed entities."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:14.858089Z",
"start_time": "2018-03-06T06:46:14.823800Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"primecol: 4 x line\n",
"primeln : 1825 x line\n"
]
}
],
"source": [
"for ft in primeFt:\n",
" primed = collections.Counter()\n",
" for n in Fs(ft).s(1):\n",
" primed[F.otype.v(n)] += 1\n",
" for x in sorted(primed.items()):\n",
" print(f'{ft:<8}: {x[1]:>5} x {x[0]}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let us check the primes with grep, directly in the source files."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:16.591598Z",
"start_time": "2018-03-06T06:46:16.564247Z"
}
},
"outputs": [],
"source": [
"nonSignStuff = r'''()\\[\\]{}<>|.#!?+*'''\n",
"nonSignRe = re.compile(f'''[{nonSignStuff}]+''')\n",
"\n",
"def tfPrimes():\n",
" primes = []\n",
" for l in F.otype.s('line'):\n",
" (document, face, line) = T.sectionFromNode(l)\n",
" srcfile = F.srcfile.v(l)\n",
" srcln = F.srcLnNum.v(l)\n",
" primeln = F.primeln.v(l)\n",
" primecol = F.primecol.v(l)\n",
" if primecol and (not L.p(l, otype='line') or F.col.v(l) == F.col.v(L.p(l, otype='line')[0])):\n",
" primes.append((srcfile, document, face, srcln - 1, 'column', f'{F.col.v(l)}{prime}'))\n",
" if primeln:\n",
" primes.append((srcfile, document, face, srcln, 'line', f'{F.ln.v(l)}{prime}.'))\n",
" for s in L.d(l, otype='sign'):\n",
" reading = F.reading.v(s)\n",
" if reading:\n",
" if prime in reading:\n",
" rep = nonSignRe.sub('', F.atf.v(s))\n",
" primes.append((srcfile, document, face, srcln, 'sign', rep))\n",
" return primes"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:18.502639Z",
"start_time": "2018-03-06T06:46:18.457107Z"
}
},
"outputs": [],
"source": [
"material = f'''A-Za-z0-9,'/{nonSignStuff}'''\n",
"materialP = f'{material}{prime}'\n",
"primeRe = re.compile(f'''[{material}]*{prime}[{materialP}]*''')\n",
"\n",
"readingRe = re.compile(r'''!\\([^)]+\\)''')\n",
"\n",
"def grepPrimes(gen):\n",
" primes = []\n",
" prevColumn = None\n",
" for (src, document, face, column, srcln, line) in gen:\n",
" if column and column != prevColumn:\n",
" if \"'\" in column:\n",
" primes.append((src, document, face, srcln, 'column', column))\n",
" prevColumn = column\n",
" fields = line.split(maxsplit=1)\n",
" lineNum = fields[0]\n",
" if prime in lineNum:\n",
" primes.append((src, document, face, srcln, 'line', lineNum))\n",
" if len(fields) != 2:\n",
" continue\n",
" if lineNum.startswith('$') or lineNum.startswith('#'):\n",
" continue\n",
" trans = fields[1]\n",
" if prime in trans: \n",
" trans = readingRe.sub('', trans)\n",
" hits = primeRe.findall(trans)\n",
" for hit in hits:\n",
" hit = nonSignRe.sub('', hit)\n",
" primes.append((src, document, face, srcln, 'sign', hit))\n",
" return primes"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:19.572383Z",
"start_time": "2018-03-06T06:46:19.296832Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ kind ◆ prime\n",
"IDENTICAL: all 1865 items\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ line ◆ 1'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ line ◆ 2'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ line ◆ 3'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 52 ◆ line ◆ 4'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 53 ◆ line ◆ 5'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 54 ◆ line ◆ 6'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ line ◆ 7'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 56 ◆ line ◆ 8'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 57 ◆ line ◆ 9'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 58 ◆ line ◆ 10'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 59 ◆ line ◆ 11'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 60 ◆ line ◆ 12'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 61 ◆ line ◆ 13'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 62 ◆ line ◆ 14'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 63 ◆ line ◆ 15'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 64 ◆ line ◆ 16'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 65 ◆ line ◆ 17'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 66 ◆ line ◆ 18'.\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 67 ◆ line ◆ 19'.\n",
"= : AbB-primary ◆ P481192 ◆ obverse ◆ 448 ◆ line ◆ 1'.\n",
"= and 1845 more\n",
"Number of results: TF 1865; GREP 1865\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('kind', 'prime',),\n",
" grepPrimes,\n",
" tfPrimes,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Words\n",
"\n",
"Words are space separated parts of a transcription line (not counting inline comments).\n",
"\n",
"Words have very few features, currently only one: `atf`."
]
},
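{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch (the sample line below is made up, not taken from the corpus), stripping inline\n",
"comments and then splitting on whitespace yields the words of a line:\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# inline comments have the form ($ text $)\n",
"commentInlineRe = re.compile(r'\\(\\$ (.*?) \\$\\)')\n",
"\n",
"line = 'a-na ($ erasure $) be-li2-ia'\n",
"words = commentInlineRe.sub('', line).split()\n",
"# ['a-na', 'be-li2-ia']\n",
"```"
]
},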
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfWords():\n",
" words = []\n",
" for w in F.otype.s('word'):\n",
" (document, face, line) = T.sectionFromNode(w)\n",
" l = L.u(w, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" atf = F.atf.v(w)\n",
" if atf:\n",
" words.append((srcfile, document, face, srcln, atf))\n",
" return words"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"commentLineRe = re.compile(r'''^\\$\\s*(.*)''')\n",
"commentInlineRe = re.compile(r'''\\(\\$ (.*?) \\$\\)''')\n",
" \n",
"def grepWords(gen):\n",
" words = []\n",
" initWorkarounds()\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" ln = match.group(1)\n",
" srcLn = match.group(2)\n",
" srcLn = checkWorkarounds(document, face, ln, srcLn)\n",
" srcLn = commentInlineRe.sub('', srcLn)\n",
" for w in srcLn.split():\n",
" words.append((srcfile, document, face, srcLnNum, w))\n",
" finishWorkarounds()\n",
" return words"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"workaround applied: \"1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_\"\n",
"workaround applied: \"ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na\"\n",
"workaround applied: \"[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]\"\n",
"workaround applied: \"_iti gu4-si#-[sa2_ ...]\"\n",
"ALL 4 WORKAROUNDS APPLIED\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ sign\n",
"IDENTICAL: all 76503 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ [a-na]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ _{d}suen_-i-[din-nam]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2-bi2-[ma]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um-ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ _{d}en-lil2_-sza-du-u2-ni-ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}utu_\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ u3\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}[marduk]_\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ a-na\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ da-ri-a-[tim]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 35 ◆ li-ba-al-li-t,u2-u2-ka\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ {disz}sze-ep-_{d}suen\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ a2-gal2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ [dumu]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ um-mi-a-mesz_\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ ki-a-am\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ u2-lam-mi-da-an-ni\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ um-[ma]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ szu-u2-[ma]\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ {disz}sa-am-su-ba-ah-li\n",
"= and 76483 more\n",
"Number of results: TF 76503; GREP 76503\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('sign',),\n",
" grepWords,\n",
" tfWords,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Flags\n",
"\n",
"We have several features for flags: \n",
"\n",
"mark | feature | comments\n",
"---- | --- | ---\n",
"`*`|*collation*\n",
"`#`|*damage*\n",
"`?`|*question*\n",
"`!`|*remarkable*\n",
"\n",
"### A bit of research\n",
"We start by surveying the possible values, including on which node types they occur"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:47.435936Z",
"start_time": "2018-03-06T06:46:47.418432Z"
}
},
"outputs": [],
"source": [
"flagMap = {\n",
" '#': 'damage',\n",
" '?': 'question',\n",
" '!': 'remarkable',\n",
" '*': 'collated',\n",
"}\n",
"\n",
"flagChars = list(flagMap.keys())\n",
"flagFeatures = list(flagMap.values())"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:46:49.657635Z",
"start_time": "2018-03-06T06:46:48.320435Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 9974 x sign-damage-1\n",
" 560 x sign-question-1\n",
" 99 x sign-remarkable-1\n",
" 13 x sign-collated-1\n"
]
}
],
"source": [
"flagNodeOverview = collections.Counter()\n",
"flagNodeTypes = set()\n",
"\n",
"for n in N():\n",
" for ft in flagFeatures:\n",
" value = Fs(ft).v(n)\n",
" if not value: continue\n",
" nType = F.otype.v(n)\n",
" flagNodeTypes.add(nType)\n",
" flagNodeOverview[f'{nType}-{ft}-{value}'] += 1\n",
"for (combi, amount) in sorted(flagNodeOverview.items(), key=lambda x: (-x[1], x[0])):\n",
" print(f'{amount:>6} x {combi}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us see whether there are any cooccurrences of flags."
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:05.268840Z",
"start_time": "2018-03-06T06:47:03.891198Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"192721 x * - * - * - * \n",
" 9830 x damage - * - * - * \n",
" 421 x * - question - * - * \n",
" 138 x damage - question - * - * \n",
" 91 x * - * -remarkable- * \n",
" 9 x * - * - * - collated \n",
" 5 x damage - * -remarkable- * \n",
" 2 x * - * -remarkable- collated \n",
" 1 x * - question -remarkable- collated \n",
" 1 x damage - * - * - collated \n"
]
}
],
"source": [
"flagCombis = collections.Counter()\n",
"\n",
"for n in N():\n",
" if F.otype.v(n) not in flagNodeTypes:\n",
" continue\n",
" values = []\n",
" for ft in flagFeatures:\n",
" rawValue = Fs(ft).v(n)\n",
" value = f'{\"*\":^10}' if rawValue is None else f'{ft:^10}' if rawValue else f'{\"\":^10}'\n",
" values.append(value)\n",
"\n",
" combi = '-'.join(values)\n",
" flagCombis[combi] += 1\n",
"\n",
"for (combi, amount) in sorted(flagCombis.items(), key=lambda x: (-x[1], x[0])):\n",
" print(f'{amount:>6} x {combi}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need to address the question about order of flags.\n",
"\n",
"A quick inspection in the corpus yields:\n",
"\n",
"* damage-question (`#?`) is frequent, question-damage (`?#`) is rare \n",
"* damage-remarkable (`#!`) in all cases\n",
"* remarkable-collated (`!*`) in all cases\n",
"* damage-collated (`#*`) in all cases\n",
"* question-remarkable-collated (`?!`) in all cases\n",
"\n",
"Based on this observation, and assuming that the order between *damage* and *question* is not relevant,\n",
"we produce flags always in the order:\n",
"\n",
"* *damage* *question* *remarkable* *collated*\n",
"\n",
"When grepping, we have to normalize `?#` to `#?`."
]
},
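{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of such a normalization (`normalizeFlags` is an assumed helper, not part of the\n",
"conversion code): sort the flag characters by their position in the canonical order.\n",
"\n",
"```python\n",
"FLAG_ORDER = '#?!*'  # damage, question, remarkable, collated\n",
"\n",
"def normalizeFlags(flags):\n",
"    # reorder e.g. ?# to #? ; already canonical flags stay unchanged\n",
"    return ''.join(sorted(flags, key=FLAG_ORDER.index))\n",
"\n",
"normalizeFlags('?#')  # '#?'\n",
"```"
]
},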
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfFlags():\n",
" discrepancies = collections.Counter()\n",
" flags = []\n",
" for n in F.otype.s('sign'):\n",
" values = [Fs(ft).v(n) for ft in flagFeatures]\n",
" if all(value is None for value in values):\n",
" continue\n",
" fl = ''\n",
" for (i, val) in enumerate(values):\n",
" if val:\n",
" fl += flagChars[i]\n",
" checkFl = F.flags.v(n) or ''\n",
" if checkFl != fl:\n",
" msg = 'OK' if set(fl) == set(checkFl) else 'PROBLEM'\n",
" discrepancies[f'{fl} vs {checkFl} ({msg})'] += 1\n",
" \n",
" (document, face, line) = T.sectionFromNode(n)\n",
" l = L.u(n, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" opx = F.operator.v(n) == 'x'\n",
" num = F.type.v(n) == 'numeral'\n",
" reading = (\n",
" F.grapheme.v(n) or F.reading.v(n)\n",
" if opx else\n",
" F.reading.v(n) or F.grapheme.v(n)\n",
" )\n",
" br = ')' if num or opx else ''\n",
" flags.append((srcfile, document, face, srcln, f'{reading}{br}{checkFl}'))\n",
" \n",
" if not discrepancies:\n",
" print('NO DISCREPANCIES')\n",
" else:\n",
" for (d, amount) in sorted(\n",
" discrepancies.items(),\n",
" key=lambda x: (-x[1], x[0]),\n",
" ):\n",
" print(f'{d:<4} {amount:>3} x')\n",
" return flags"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"flagsRe = re.compile(r'''[A-Za-z0-9,'.]+\\)?[#*!?]+''')\n",
" \n",
"def grepFlags(gen):\n",
" flags = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" srcLn = match.group(2)\n",
" srcLn = trimRe.sub(' ', srcLn)\n",
" srcLn = srcLn.replace('!(', '§§')\n",
" fls = flagsRe.findall(srcLn)\n",
" if match:\n",
" for f in fls:\n",
" flags.append((srcfile, document, face, srcLnNum, f.replace('§§', '!(')))\n",
" return flags"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"#? vs ?# (OK) 7 x\n",
"HEAD : srcfile ◆ tablet ◆ ln ◆ sign\n",
"IDENTICAL: all 10498 items\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ se3#\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ ga#\n",
"= : AbB-primary ◆ P509375 ◆ obverse ◆ 151 ◆ na#\n",
"= : AbB-primary ◆ P509375 ◆ obverse ◆ 152 ◆ bi2#\n",
"= : AbB-primary ◆ P509375 ◆ reverse ◆ 166 ◆ il#\n",
"= : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ am#\n",
"= : AbB-primary ◆ P509377 ◆ obverse ◆ 257 ◆ ia#\n",
"= : AbB-primary ◆ P509377 ◆ obverse ◆ 258 ◆ ma#\n",
"= : AbB-primary ◆ P509377 ◆ obverse ◆ 260 ◆ ak#\n",
"= : AbB-primary ◆ P509377 ◆ obverse ◆ 260 ◆ kum#\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 269 ◆ ak#\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ ta#\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 272 ◆ na#\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 279 ◆ mu#\n",
"= : AbB-primary ◆ P481190 ◆ obverse ◆ 355 ◆ nu#\n",
"= : AbB-primary ◆ P481190 ◆ obverse ◆ 355 ◆ ur2#\n",
"= : AbB-primary ◆ P481190 ◆ obverse ◆ 357 ◆ din#\n",
"= : AbB-primary ◆ P481190 ◆ obverse ◆ 357 ◆ nam#\n",
"= : AbB-primary ◆ P481191 ◆ reverse ◆ 410 ◆ szu#\n",
"= : AbB-primary ◆ P481192 ◆ obverse ◆ 450 ◆ ka#\n",
"= and 10478 more\n",
"Number of results: TF 10498; GREP 10498\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('sign',),\n",
" grepFlags,\n",
" tfFlags,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Signs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have arrived at the level of signs.\n",
"\n",
"We will compare them, and all the structure we see in and around them, such as readings, graphemes, numerals,\n",
"operators and flags.\n",
"\n",
"First we have a glance at what happens between the signs, though."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## After the signs\n",
"\n",
"There might be material between a sign and the next one (if any).\n",
"\n",
"The most usual ones are the `-`, separating signs within words and ` ` separating words.\n",
"\n",
"Here is the complete overview."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"- 118903 x\n",
" 100198 x\n",
"/ 15 x\n",
". 5 x\n",
"+ 2 x\n",
": 1 x\n"
]
}
],
"source": [
"for (c, amount) in F.after.freqList():\n",
" print(f'{c} {amount:>6} x')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now an overview of the *types* of signs."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"reading 188292 x\n",
"unknown 8761 x\n",
"numeral 2184 x\n",
"ellipsis 1617 x\n",
"grapheme 1272 x\n",
"commentline 969 x\n",
"complex 122 x\n",
"comment 2 x\n"
]
}
],
"source": [
"signTypes = collections.Counter()\n",
"\n",
"for s in F.otype.s('sign'):\n",
" signTypes[F.type.v(s)] += 1\n",
" \n",
"for (t, amount) in sorted(\n",
" signTypes.items(),\n",
" key=lambda x: (-x[1], x[0]),\n",
"):\n",
" print(f'{t:<15} {amount:>6} x')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We check these types individually, from the least frequent to the most frequent.\n",
"\n",
"## Comment signs\n",
"\n",
"These are inline comments of the form `($` *text* `$)`."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfSignsComment():\n",
" signs = []\n",
" for s in F.otype.s('sign'):\n",
" typ = F.type.v(s)\n",
" if typ != 'comment':\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" comment = F.comment.v(s)\n",
" signs.append((srcfile, document, face, srcln, comment))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"def grepSignsComment(gen):\n",
" signs = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" srcLn = match.group(2)\n",
" cms = commentInlineRe.findall(srcLn)\n",
" for c in cms:\n",
" signs.append((srcfile, document, face, srcLnNum, c))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ sign\n",
"IDENTICAL: all 2 items\n",
"= : AbB-secondary ◆ P275088 ◆ reverse ◆ 23687 ◆ blank space\n",
"= : AbB-secondary ◆ P275104 ◆ reverse ◆ 24524 ◆ blank space\n",
"= no more items\n",
"Number of results: TF 2; GREP 2\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('sign',),\n",
" grepSignsComment,\n",
" tfSignsComment,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Complex Signs\n",
"\n",
"We check whether all complex signs have come through exactly right.\n",
"\n",
"These are the signs of the form `x(ZZZ)` and `!(ZZZ)`\n",
"\n",
"The characters `x` and `!` are called the *operators* in these complexes.\n",
"\n",
"Here is the distribution of operators."
]
},
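{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of pulling a complex sign apart into reading, operator, and grapheme\n",
"(`complexSignRe` is a hypothetical helper, deliberately simpler than the `complexRe` used below,\n",
"since it ignores flags):\n",
"\n",
"```python\n",
"import re\n",
"\n",
"complexSignRe = re.compile(r'([a-z][a-z0-9]*)([!x])\\(([^)]+)\\)')\n",
"\n",
"m = complexSignRe.match('ku!(LU)')\n",
"m.groups()  # ('ku', '!', 'LU')\n",
"```"
]
},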
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"! 117 x\n",
"x 5 x\n"
]
}
],
"source": [
"for (c, amount) in F.operator.freqList():\n",
" print(f'{c} {amount:>6} x')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do two checks: an easy check involving the `atf` feature of a sign and a more involved check using the\n",
"`operator`, `reading`, and `grapheme` features of a sign."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Based on atf"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfComplexes():\n",
" complexes = []\n",
" for s in F.otype.s('sign'):\n",
" if F.type.v(s) != 'complex':\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" atf = F.atf.v(s)\n",
" complexes.append((srcfile, document, face, srcln, atf))\n",
" return complexes"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"complexRe = re.compile(\n",
" f'''[a-z][a-z,0-9']*[#!?*]*'''\n",
" r'[!x]\\([^)]+\\)[#!?*]*'\n",
")\n",
" \n",
"def grepComplexes(gen):\n",
" complexes = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" srcLn = match.group(2)\n",
" srcLn = commentInlineRe.sub('', srcLn)\n",
" cls = complexRe.findall(srcLn)\n",
" for c in cls:\n",
" complexes.append((srcfile, document, face, srcLnNum, c))\n",
" return complexes"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ complex\n",
"IDENTICAL: all 122 items\n",
"= : AbB-primary ◆ P510533 ◆ obverse ◆ 7999 ◆ ku!(LU)\n",
"= : AbB-primary ◆ P510560 ◆ reverse ◆ 9461 ◆ im!(NIM)\n",
"= : AbB-primary ◆ P510560 ◆ reverse ◆ 9462 ◆ tam!(TUM)\n",
"= : AbB-primary ◆ P510562 ◆ obverse ◆ 9543 ◆ tum!(TIM)\n",
"= : AbB-primary ◆ P510562 ◆ reverse ◆ 9559 ◆ szi!(SZU)\n",
"= : AbB-primary ◆ P510564 ◆ reverse ◆ 9677 ◆ bu!(BI)\n",
"= : AbB-primary ◆ P510566 ◆ obverse ◆ 9762 ◆ tim!(IM)\n",
"= : AbB-primary ◆ P510566 ◆ reverse ◆ 9777 ◆ lam!(IB)\n",
"= : AbB-primary ◆ P510569 ◆ obverse ◆ 9920 ◆ ka!(KI)\n",
"= : AbB-primary ◆ P510572 ◆ obverse ◆ 10092 ◆ ba!(SZA)\n",
"= : AbB-primary ◆ P510578 ◆ obverse ◆ 10463 ◆ tum!(TAM)\n",
"= : AbB-primary ◆ P510583 ◆ obverse ◆ 10750 ◆ tim!(TUM)\n",
"= : AbB-primary ◆ P510588 ◆ obverse ◆ 11075 ◆ na!(HU)\n",
"= : AbB-primary ◆ P510616 ◆ obverse ◆ 12724 ◆ nam!(LAM)\n",
"= : AbB-primary ◆ P510616 ◆ obverse ◆ 12725 ◆ nam!(LAM)\n",
"= : AbB-primary ◆ P510616 ◆ reverse ◆ 12743 ◆ u2!(NA)\n",
"= : AbB-primary ◆ P510623 ◆ reverse ◆ 13178 ◆ ze2!(SZE)\n",
"= : AbB-primary ◆ P510626 ◆ obverse ◆ 13333 ◆ mi!(UL)\n",
"= : AbB-primary ◆ P510635 ◆ reverse ◆ 13797 ◆ ir!(AR)\n",
"= : AbB-primary ◆ P510635 ◆ reverse ◆ 13798 ◆ zimbir!(|UD.KIB.NU|)\n",
"= and 102 more\n",
"Number of results: TF 122; GREP 122\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('complex',),\n",
" grepComplexes,\n",
" tfComplexes,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Based on other features"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfComplexes2():\n",
" complexes = []\n",
" for s in F.otype.s('sign'):\n",
" if F.type.v(s) != 'complex':\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" values = [Fs(ft).v(s) for ft in flagFeatures]\n",
" fl = F.flags.v(s) or ''\n",
" op = F.operator.v(s)\n",
" atf = (\n",
" f'{F.reading.v(s)}{fl}{op}({F.grapheme.v(s)})'\n",
" if op == '!' else\n",
" f'{F.reading.v(s)}{op}({F.grapheme.v(s)}){fl}'\n",
" )\n",
" complexes.append((srcfile, document, face, srcln, atf))\n",
" return complexes"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ numeral\n",
"IDENTICAL: all 122 items\n",
"= : AbB-primary ◆ P510533 ◆ obverse ◆ 7999 ◆ ku!(LU)\n",
"= : AbB-primary ◆ P510560 ◆ reverse ◆ 9461 ◆ im!(NIM)\n",
"= : AbB-primary ◆ P510560 ◆ reverse ◆ 9462 ◆ tam!(TUM)\n",
"= : AbB-primary ◆ P510562 ◆ obverse ◆ 9543 ◆ tum!(TIM)\n",
"= : AbB-primary ◆ P510562 ◆ reverse ◆ 9559 ◆ szi!(SZU)\n",
"= : AbB-primary ◆ P510564 ◆ reverse ◆ 9677 ◆ bu!(BI)\n",
"= : AbB-primary ◆ P510566 ◆ obverse ◆ 9762 ◆ tim!(IM)\n",
"= : AbB-primary ◆ P510566 ◆ reverse ◆ 9777 ◆ lam!(IB)\n",
"= : AbB-primary ◆ P510569 ◆ obverse ◆ 9920 ◆ ka!(KI)\n",
"= : AbB-primary ◆ P510572 ◆ obverse ◆ 10092 ◆ ba!(SZA)\n",
"= : AbB-primary ◆ P510578 ◆ obverse ◆ 10463 ◆ tum!(TAM)\n",
"= : AbB-primary ◆ P510583 ◆ obverse ◆ 10750 ◆ tim!(TUM)\n",
"= : AbB-primary ◆ P510588 ◆ obverse ◆ 11075 ◆ na!(HU)\n",
"= : AbB-primary ◆ P510616 ◆ obverse ◆ 12724 ◆ nam!(LAM)\n",
"= : AbB-primary ◆ P510616 ◆ obverse ◆ 12725 ◆ nam!(LAM)\n",
"= : AbB-primary ◆ P510616 ◆ reverse ◆ 12743 ◆ u2!(NA)\n",
"= : AbB-primary ◆ P510623 ◆ reverse ◆ 13178 ◆ ze2!(SZE)\n",
"= : AbB-primary ◆ P510626 ◆ obverse ◆ 13333 ◆ mi!(UL)\n",
"= : AbB-primary ◆ P510635 ◆ reverse ◆ 13797 ◆ ir!(AR)\n",
"= : AbB-primary ◆ P510635 ◆ reverse ◆ 13798 ◆ zimbir!(|UD.KIB.NU|)\n",
"= and 102 more\n",
"Number of results: TF 122; GREP 122\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
"    ('complex',),\n",
" grepComplexes,\n",
" tfComplexes2,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Commentline signs\n",
"\n",
"Commentline signs are artificial signs introduced on comment lines.\n",
"Comment lines have no transcribed material, but they annotate the structure (`$`) or the\n",
"line contents (`#`) of other lines.\n",
"\n",
"In order to anchor these comments to the text sequence, we have made extra signs for these lines.\n",
"For each comment line, there is one such sign, and it has type `commentline`.\n",
"The comments of these lines are stored in the `comment` feature on those signs."
]
},
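{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, the `commentLineRe` pattern defined earlier captures the text after the `$` marker\n",
"(the sample line is typical of the corpus):\n",
"\n",
"```python\n",
"import re\n",
"\n",
"commentLineRe = re.compile(r'^\\$\\s*(.*)')\n",
"\n",
"commentLineRe.match('$ rest broken').group(1)  # 'rest broken'\n",
"```"
]
},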
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfSignsEmpty():\n",
" comments = []\n",
" for s in F.otype.s('sign'):\n",
" typ = F.type.v(s)\n",
" if typ != 'commentline':\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
"        ss = L.d(l, otype='sign')\n",
"        if not ss or F.type.v(ss[0]) != 'commentline':\n",
"            continue\n",
"        comment = F.comment.v(ss[0])\n",
" ln = F.lnc.v(l)\n",
" comments.append((srcfile, document, face, srcln, ln, comment))\n",
" return comments"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"def grepSignsEmpty(gen):\n",
" comments = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = commentLineRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" cms = match.group(1)\n",
" ln = srcLn[0]\n",
" comments.append((srcfile, document, face, srcLnNum, ln, cms))\n",
" return comments"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ kind ◆ comment\n",
"IDENTICAL: all 969 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 46 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 48 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ P509375 ◆ obverse ◆ 154 ◆ $ ◆ rest missing\n",
"= : AbB-primary ◆ P507628 ◆ obverse ◆ 319 ◆ $ ◆ blank space\n",
"= : AbB-primary ◆ P507628 ◆ reverse ◆ 321 ◆ $ ◆ blank space\n",
"= : AbB-primary ◆ P481192 ◆ obverse ◆ 447 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ P481192 ◆ obverse ◆ 462 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 464 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 480 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ P389958 ◆ obverse ◆ 512 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ P389256 ◆ obverse ◆ 556 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ P389256 ◆ obverse ◆ 562 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ P389256 ◆ reverse ◆ 564 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ P389256 ◆ reverse ◆ 568 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ P510534 ◆ obverse ◆ 8049 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ P510534 ◆ reverse ◆ 8051 ◆ $ ◆ beginning broken\n",
"= : AbB-primary ◆ P510536 ◆ reverse ◆ 8180 ◆ $ ◆ single ruling\n",
"= : AbB-primary ◆ P510537 ◆ reverse ◆ 8222 ◆ $ ◆ blank space\n",
"= : AbB-primary ◆ P510541 ◆ reverse ◆ 8441 ◆ $ ◆ rest broken\n",
"= : AbB-primary ◆ P510543 ◆ reverse ◆ 8554 ◆ $ ◆ single ruling\n",
"= and 949 more\n",
"Number of results: TF 969; GREP 969\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('kind', 'comment'),\n",
" grepSignsEmpty,\n",
" tfSignsEmpty,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Grapheme signs\n",
"\n",
"These are signs that do not contain a *reading* (lower case name of a transcribed unit)\n",
"but a *grapheme* (upper case name of a transcribed unit).\n",
"\n",
"Complex signs that have a grapheme in their `x(GGG)` or `!(GGG)` parts are not included."
]
},
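{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `X` is absent from the character class used below, presumably because a lone uppercase\n",
"`X` marks an unknown sign rather than a grapheme. A minimal illustration (the sample line is made up):\n",
"\n",
"```python\n",
"import re\n",
"\n",
"graphemeRe = re.compile(r'[A-WYZ][A-WYZ,0-9]*')\n",
"\n",
"graphemeRe.findall('a-na ARAD GAN2')  # ['ARAD', 'GAN2']\n",
"```"
]
},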
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfSignsGrapheme():\n",
" signs = []\n",
" for s in F.otype.s('sign'):\n",
" typ = F.type.v(s)\n",
" if typ != 'grapheme':\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" d = F.grapheme.v(s)\n",
" signs.append((srcfile, document, face, srcln, d))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"graphemeRe = re.compile(r'''[A-WYZ][A-WYZ,0-9]*''')\n",
"excludeRe = re.compile(r'''[x!]\\([^)]+\\)''')\n",
"\n",
"def grepSignsGrapheme(gen):\n",
" signs = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" srcLn = match.group(2)\n",
" srcLn = commentInlineRe.sub('', srcLn)\n",
" srcLn = excludeRe.sub('', srcLn)\n",
" data = graphemeRe.findall(srcLn)\n",
" for d in data:\n",
" signs.append((srcfile, document, face, srcLnNum, d))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ sign\n",
"IDENTICAL: all 1272 items\n",
"= : AbB-primary ◆ P481191 ◆ seal 1 ◆ 415 ◆ ARAD\n",
"= : AbB-primary ◆ P481192 ◆ obverse ◆ 455 ◆ AD\n",
"= : AbB-primary ◆ P481192 ◆ obverse ◆ 455 ◆ DA\n",
"= : AbB-primary ◆ P389958 ◆ obverse ◆ 518 ◆ DA\n",
"= : AbB-primary ◆ P510527 ◆ reverse ◆ 7673 ◆ SZESZ\n",
"= : AbB-primary ◆ P510530 ◆ reverse ◆ 7833 ◆ ARAD\n",
"= : AbB-primary ◆ P510530 ◆ reverse ◆ 7835 ◆ TE\n",
"= : AbB-primary ◆ P510530 ◆ reverse ◆ 7839 ◆ GAN2\n",
"= : AbB-primary ◆ P510530 ◆ reverse ◆ 7844 ◆ ARAD\n",
"= : AbB-primary ◆ P510530 ◆ reverse ◆ 7847 ◆ ARAD\n",
"= : AbB-primary ◆ P510530 ◆ reverse ◆ 7848 ◆ ARAD\n",
"= : AbB-primary ◆ P510534 ◆ reverse ◆ 8054 ◆ ARAD\n",
"= : AbB-primary ◆ P510536 ◆ obverse ◆ 8163 ◆ ARAD\n",
"= : AbB-primary ◆ P510536 ◆ obverse ◆ 8168 ◆ ARAD\n",
"= : AbB-primary ◆ P510537 ◆ obverse ◆ 8216 ◆ SU\n",
"= : AbB-primary ◆ P510537 ◆ obverse ◆ 8220 ◆ SU\n",
"= : AbB-primary ◆ P510541 ◆ obverse ◆ 8407 ◆ GAN2\n",
"= : AbB-primary ◆ P510541 ◆ obverse ◆ 8412 ◆ GAN2\n",
"= : AbB-primary ◆ P510541 ◆ obverse ◆ 8416 ◆ GAN2\n",
"= : AbB-primary ◆ P510541 ◆ reverse ◆ 8425 ◆ GAN2\n",
"= and 1252 more\n",
"Number of results: TF 1272; GREP 1272\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('sign',),\n",
" grepSignsGrapheme,\n",
" tfSignsGrapheme,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ellipsis signs\n",
"\n",
"These are signs that are represented as `...`."
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfSignsEllipsis():\n",
" signs = []\n",
" for s in F.otype.s('sign'):\n",
" typ = F.type.v(s)\n",
" if typ != 'ellipsis':\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" d = F.grapheme.v(s)\n",
" signs.append((srcfile, document, face, srcln, d))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"ellipsisRe = re.compile(r'''\\.\\.\\.''')\n",
"\n",
"def grepSignsEllipsis(gen):\n",
" signs = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" srcLn = match.group(2)\n",
" srcLn = commentInlineRe.sub('', srcLn)\n",
" data = ellipsisRe.findall(srcLn)\n",
" for d in data:\n",
" signs.append((srcfile, document, face, srcLnNum, d))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ sign\n",
"IDENTICAL: all 1617 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ ...\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ ...\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ ...\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ ...\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 63 ◆ ...\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 64 ◆ ...\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 65 ◆ ...\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 66 ◆ ...\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 67 ◆ ...\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 102 ◆ ...\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 105 ◆ ...\n",
"= : AbB-primary ◆ P509377 ◆ obverse ◆ 254 ◆ ...\n",
"= : AbB-primary ◆ P509377 ◆ obverse ◆ 255 ◆ ...\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 268 ◆ ...\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 270 ◆ ...\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ ...\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 272 ◆ ...\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 278 ◆ ...\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 280 ◆ ...\n",
"= : AbB-primary ◆ P481191 ◆ seal 1 ◆ 413 ◆ ...\n",
"= and 1597 more\n",
"Number of results: TF 1617; GREP 1617\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('sign',),\n",
" grepSignsEllipsis,\n",
" tfSignsEllipsis,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Numerals\n",
"\n",
"We check whether all numerals have come through exactly right.\n",
"\n",
"We do two checks: an easy check involving the `atf` feature of a sign and a more involved check using the\n",
"`repeat`, `fraction` and `reading` features of a sign."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Based on atf"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfNumerals():\n",
" numerals = []\n",
" for s in F.otype.s('sign'):\n",
" if F.type.v(s) != 'numeral':\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" atf = F.atf.v(s).rstrip(flaggingStr)\n",
" numerals.append((srcfile, document, face, srcln, atf))\n",
" return numerals"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"numeralRe = re.compile(\n",
" r'((?:n|(?:[0-9/]+))'\n",
" r'\\([^)]+\\))'\n",
")\n",
"\n",
"def grepNumerals(gen):\n",
" numerals = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" srcLn = match.group(2)\n",
" srcLn = commentInlineRe.sub('', srcLn)\n",
" nls = numeralRe.findall(srcLn)\n",
" for n in nls:\n",
" numerals.append((srcfile, document, face, srcLnNum, n))\n",
" return numerals"
]
},
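{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, a quick standalone sanity check of the numeral pattern itself, run on a hand-made fragment (illustrative strings, not real corpus lines):\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# same pattern as above: a repeat (digits or a fraction) or an unknown repeat `n`,\n",
"# followed by a reading between parentheses\n",
"numeralRe = re.compile(r'((?:n|(?:[0-9/]+))\\([^)]+\\))')\n",
"\n",
"numeralRe.findall('2(esze3) 5/6(disz) n(u)')\n",
"# ['2(esze3)', '5/6(disz)', 'n(u)']\n",
"```"
]
},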
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ numeral\n",
"IDENTICAL: all 2184 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ 2(esze3)\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ 7(disz)\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ 2(esze3)\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 108 ◆ 1(disz)\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 111 ◆ 1(disz)\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 113 ◆ 2(disz)\n",
"= : AbB-primary ◆ P509374 ◆ reverse ◆ 117 ◆ 2(disz)\n",
"= : AbB-primary ◆ P509376 ◆ obverse ◆ 203 ◆ 4(disz)\n",
"= : AbB-primary ◆ P509377 ◆ obverse ◆ 259 ◆ 3(u)\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 1(disz)\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 3(disz)\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 276 ◆ 3(u)\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 277 ◆ 6(disz)\n",
"= : AbB-primary ◆ P481191 ◆ obverse ◆ 396 ◆ 2(u)\n",
"= : AbB-primary ◆ P481191 ◆ reverse ◆ 406 ◆ 2(u)\n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 470 ◆ 1(asz)\n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 472 ◆ 1(asz)\n",
"= : AbB-primary ◆ P389958 ◆ obverse ◆ 519 ◆ 5(disz)\n",
"= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 1(disz)\n",
"= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 5/6(disz)\n",
"= and 2164 more\n",
"Number of results: TF 2184; GREP 2184\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('numeral',),\n",
" grepNumerals,\n",
" tfNumerals,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Based on other features"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfNumerals2():\n",
" numerals = []\n",
" for s in F.otype.s('sign'):\n",
" if F.type.v(s) != 'numeral':\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" repeat = F.repeat.v(s)\n",
" if repeat == -1:\n",
" repeat = 'n'\n",
" atf = f'{repeat or F.fraction.v(s)}({F.reading.v(s)})'\n",
" numerals.append((srcfile, document, face, srcln, atf))\n",
" return numerals"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ numeral\n",
"IDENTICAL: all 2184 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ 2(esze3)\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ 7(disz)\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ 2(esze3)\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 108 ◆ 1(disz)\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 111 ◆ 1(disz)\n",
"= : AbB-primary ◆ P509374 ◆ obverse ◆ 113 ◆ 2(disz)\n",
"= : AbB-primary ◆ P509374 ◆ reverse ◆ 117 ◆ 2(disz)\n",
"= : AbB-primary ◆ P509376 ◆ obverse ◆ 203 ◆ 4(disz)\n",
"= : AbB-primary ◆ P509377 ◆ obverse ◆ 259 ◆ 3(u)\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 1(disz)\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 3(disz)\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 276 ◆ 3(u)\n",
"= : AbB-primary ◆ P509377 ◆ reverse ◆ 277 ◆ 6(disz)\n",
"= : AbB-primary ◆ P481191 ◆ obverse ◆ 396 ◆ 2(u)\n",
"= : AbB-primary ◆ P481191 ◆ reverse ◆ 406 ◆ 2(u)\n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 470 ◆ 1(asz)\n",
"= : AbB-primary ◆ P481192 ◆ reverse ◆ 472 ◆ 1(asz)\n",
"= : AbB-primary ◆ P389958 ◆ obverse ◆ 519 ◆ 5(disz)\n",
"= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 1(disz)\n",
"= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 5/6(disz)\n",
"= and 2164 more\n",
"Number of results: TF 2184; GREP 2184\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('numeral',),\n",
" grepNumerals,\n",
" tfNumerals2,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Unknown signs\n",
"\n",
"These are not unknown signs but signs that represent unknown readings/graphemes.\n",
"\n",
"They are represented as `x` or `X`, `n` or `N`."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfSignsUnknown():\n",
" signs = []\n",
" for s in F.otype.s('sign'):\n",
" typ = F.type.v(s)\n",
" if typ != 'unknown':\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" d = F.reading.v(s) or F.grapheme.v(s)\n",
" signs.append((srcfile, document, face, srcln, d))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"unknownRe = re.compile(r'''([xX])|(?:(?:_|\\b)([nN])(?:_|\\b)(?!\\())''')\n",
"\n",
"def grepSignsUnknown(gen):\n",
" signs = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" srcLn = match.group(2)\n",
" srcLn = commentInlineRe.sub('', srcLn)\n",
" srcLn = excludeRe.sub('', srcLn)\n",
" data = unknownRe.findall(srcLn)\n",
" for result in data:\n",
" d = result[0] or result[1]\n",
" signs.append((srcfile, document, face, srcLnNum, d))\n",
" return signs"
]
},
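{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, a quick standalone check of the unknown-sign pattern on a hand-made fragment (illustrative string, not a real corpus line). An `x` always counts as unknown, while an `n` only counts when it is not the repeat of a numeral such as `n(u)`:\n",
"\n",
"```python\n",
"import re\n",
"\n",
"unknownRe = re.compile(r'''([xX])|(?:(?:_|\\b)([nN])(?:_|\\b)(?!\\())''')\n",
"\n",
"# findall yields (x-group, n-group) pairs; keep whichever group matched\n",
"[m[0] or m[1] for m in unknownRe.findall('x ra _n_ szu n(u)')]\n",
"# ['x', 'n']\n",
"```"
]
},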
{
"cell_type": "code",
"execution_count": 109,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ sign\n",
"IDENTICAL: all 8761 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ x\n",
"= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ x\n",
"= and 8741 more\n",
"Number of results: TF 8761; GREP 8761\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('sign',),\n",
" grepSignsUnknown,\n",
" tfSignsUnknown,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading signs\n",
"\n",
"These are signs that contain a *reading* (lower case name of a transcribed unit).\n",
"\n",
"We also include the readings of complex signs that also have a grapheme in their representations:\n",
"`rrrx(GGG)` or `rrr!(GGG)`"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfSignsReading():\n",
" signs = []\n",
" for s in F.otype.s('sign'):\n",
" typ = F.type.v(s)\n",
" if typ not in {'reading', 'complex', 'numeral'}:\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" d = F.reading.v(s)\n",
" signs.append((srcfile, document, face, srcln, d))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"readingRe = re.compile(r'''[a-wyz'][a-wyz,0-9']*''')\n",
"nExcludeRe = re.compile(r'''(?:_|\\b)n(?:_|\\b)''')\n",
"\n",
"def grepSignsReading(gen):\n",
" signs = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" srcLn = match.group(2)\n",
" srcLn = commentInlineRe.sub('', srcLn)\n",
" srcLn = nExcludeRe.sub('', srcLn)\n",
" data = readingRe.findall(srcLn)\n",
" for d in data:\n",
" signs.append((srcfile, document, face, srcLnNum, d))\n",
" return signs"
]
},
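{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, a quick standalone check of the reading pattern on a hand-made fragment (illustrative string, not a real corpus line). Readings start with a lowercase letter other than `x` and may continue with digits; a standalone `n` (an unknown numeral repeat) is stripped first:\n",
"\n",
"```python\n",
"import re\n",
"\n",
"readingRe = re.compile(r'''[a-wyz'][a-wyz,0-9']*''')\n",
"nExcludeRe = re.compile(r'''(?:_|\\b)n(?:_|\\b)''')\n",
"\n",
"line = 'qi2-bi2-ma _n_'\n",
"readingRe.findall(nExcludeRe.sub('', line))\n",
"# ['qi2', 'bi2', 'ma']\n",
"```"
]
},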
{
"cell_type": "code",
"execution_count": 116,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ sign\n",
"IDENTICAL: all 190598 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ a\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ na\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ d\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ suen\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ i\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ din\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ nam\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ bi2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ d\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ en\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ lil2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ sza\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ du\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ u2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ni\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma\n",
"= and 190578 more\n",
"Number of results: TF 190598; GREP 190598\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 116,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('sign',),\n",
" grepSignsReading,\n",
" tfSignsReading,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### All simple signs\n",
"\n",
"Just for redundancy, we do a comparison all simple, non-empty signs in the transcriptions and in TF.\n",
"So: no numerals, no `rrrx(GGG)`, no `rrr!(GGG)`.\n",
"\n",
"We do it based on the `atf` feature and based on the other features. "
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfSigns():\n",
" signs = []\n",
" for s in F.otype.s('sign'):\n",
" typ = F.type.v(s)\n",
" if typ in {'complex', 'numeral', 'commentline'}:\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" atf = F.atf.v(s).rstrip(flaggingStr)\n",
" signs.append((srcfile, document, face, srcln, atf))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:10.430746Z",
"start_time": "2018-03-06T06:47:10.388673Z"
}
},
"outputs": [],
"source": [
"signRe = re.compile(\n",
" r'''x|(?:\\.\\.\\.)|(?:[a-wyzA-WYZ'][a-wyzA-WYZ,0-9']*)|(?:\\(\\$.*?\\$\\))'''\n",
")\n",
"\n",
"def grepSigns(gen):\n",
" signs = []\n",
" for (srcfile, document, face, column, srcLnNum, srcLn) in gen:\n",
" match = transRe.match(srcLn)\n",
" if not match:\n",
" continue\n",
" srcLn = match.group(2)\n",
" srcLn = numeralRe.sub('', srcLn)\n",
" srcLn = complexRe.sub('', srcLn)\n",
" sns = signRe.findall(srcLn)\n",
" for s in sns:\n",
" signs.append((srcfile, document, face, srcLnNum, s))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ sign\n",
"IDENTICAL: all 199944 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ a\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ na\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ d\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ suen\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ i\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ din\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ nam\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ bi2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ d\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ en\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ lil2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ sza\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ du\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ u2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ni\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma\n",
"= and 199924 more\n",
"Number of results: TF 199944; GREP 199944\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('sign',),\n",
" grepSigns,\n",
" tfSigns,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:08.336589Z",
"start_time": "2018-03-06T06:47:08.301024Z"
}
},
"outputs": [],
"source": [
"def tfSigns2():\n",
" signs = []\n",
" for s in F.otype.s('sign'):\n",
" typ = F.type.v(s)\n",
" if typ in {'complex', 'numeral', 'commentline'}:\n",
" continue\n",
" (document, face, line) = T.sectionFromNode(s)\n",
" l = L.u(s, otype='line')[0]\n",
" d = T.documentNode(document)\n",
" srcfile = F.srcfile.v(d)\n",
" srcln = F.srcLnNum.v(l)\n",
" atf = (\n",
" F.reading.v(s) if typ == 'reading' else\n",
" f'($ {F.comment.v(s)} $)' if typ == 'comment' else\n",
" F.grapheme.v(s) if typ == 'grapheme' or typ == 'ellipsis' else\n",
" F.reading.v(s) or F.grapheme.v(s) if typ == 'unknown' else\n",
" '§§§'\n",
" )\n",
" signs.append((srcfile, document, face, srcln, atf))\n",
" return signs"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T06:47:13.574443Z",
"start_time": "2018-03-06T06:47:11.497391Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HEAD : srcfile ◆ tablet ◆ ln ◆ sign\n",
"IDENTICAL: all 199944 items\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ a\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ na\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ d\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ suen\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ i\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ din\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ nam\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ bi2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ d\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ en\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ lil2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ sza\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ du\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ u2\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ni\n",
"= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma\n",
"= and 199924 more\n",
"Number of results: TF 199944; GREP 199944\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"COMP.checkSanity(\n",
" ('sign',),\n",
" grepSigns,\n",
" tfSigns2,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"Here ends the checking.\n",
"\n",
"This notebook has tested all patterns and quantities found in the transcriptions.\n",
"\n",
"By a somewhat convoluted GREP we have extracted patterns from the sources.\n",
"\n",
"By somewhat contrived TF alchemy we have produced the same patterns from the Text-Fabric\n",
"representation of the sources.\n",
"\n",
"Then we have made a rigorous comparison: we have checked wether both methods found exactly\n",
"the same sequence of values.\n",
"\n",
"And that turned out to be so!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": "block",
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 2
}