{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from tf.app import use"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\tdownloading latest annotation/app-oldbabylonian\n",
"\tfrom https://api.github.com/repos/annotation/app-oldbabylonian/zipball ...\n",
"\tunzipping ...\n",
"\tsaving annotation/app-oldbabylonian commit 1e24c3de9f38234d7d351196e828c74fe06151c0\n",
"\tsaved annotation/app-oldbabylonian commit 1e24c3de9f38234d7d351196e828c74fe06151c0\n",
"Using annotation/app-oldbabylonian commit 1e24c3de9f38234d7d351196e828c74fe06151c0 (=latest)\n",
" in /Users/dirk/text-fabric-data/__apps__/oldbabylonian\n",
"No new data release available online.\n",
"Using Nino-cunei/oldbabylonian/tf - 1.0.4 rv1.4 (=latest) in /Users/dirk/text-fabric-data.\n"
]
},
{
"data": {
"text/html": [
"Documentation: OLDBABYLONIAN Character table Feature docs oldbabylonian API Text-Fabric API 7.5.3 Search Reference\n",
"Loaded features:\n",
"Old Babylonian Letters 1900-1600: Cuneiform tablets: ARK after afterr afteru atf atfpost atfpre author col collated collection comment damage det docnote docnumber excavation excised face flags fraction genre grapheme graphemer graphemeu lang langalt ln lnc lnno material missing museumcode museumname object operator operatorr operatoru otype period pnumber primecol primeln pubdate question reading readingr readingu remarkable remarks repeat srcLn srcLnNum srcfile subgenre supplied sym symr symu trans transcriber translation@ll type uncertain volume oslots"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"API members:\n",
"C Computed, Call AllComputeds, Cs ComputedString\n",
"E Edge, Eall AllEdges, Es EdgeString\n",
"ensureLoaded, TF, ignored, loadLog\n",
"L Locality\n",
"cache, error, indent, info, reset\n",
"N Nodes, sortKey, sortKeyTuple, otypeRank, sortNodes\n",
"F Feature, Fall AllFeatures, Fs FeatureString\n",
"S Search\n",
"T Text"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A = use('oldbabylonian', hoist=globals(), check=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Word searches\n",
"\n",
"Searching for particular morphology inside words can become complicated. Here are several ways to do it with search templates."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# i-na + ...-?im\n",
"\n",
"We look for pairs of adjacent words in a line, of which the first is `i-na` and the second ends in a sign whose reading ends in `im`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word\n",
"    =: sign reading=i\n",
"    <: sign reading=na\n",
"    :=\n",
"  <: word\n",
"    := sign reading~im$\n",
"'''"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explanation of the expression in the last line `reading~im$`.\n",
"\n",
"We do not say: \n",
"> `reading` equals `im` \n",
"\n",
"but\n",
"\n",
"> `reading` **matches** `im$`\n",
"\n",
"Matching means that the reading is matched against a pattern, also known as a *regular expression*.\n",
"\n",
"This regular expression means: it should contain the substring `im` at the end. The `$` matches the end of the string.\n",
"\n",
"You can use any legal regular expression that Python recognizes.\n",
"\n",
"For a reference, consult the\n",
"[Python documentation](https://docs.python.org/3/library/re.html#module-re)\n",
"of regular expressions."
]
},
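{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a feel for what `~` matching does, you can try a pattern on a few readings in plain Python. The sketch below uses the standard `re` module and assumes that Text-Fabric evaluates a condition like `reading~im$` with `re.search` semantics, i.e. the pattern may occur anywhere in the value; that is consistent with the explanation above, but it is an assumption here. The readings come from the results in this notebook, except `imma`, which is made up for contrast.

```python
import re

# Readings seen in the results of this notebook; 'imma' is made up for contrast.
readings = ['nim', 'tim', 'im', 'lim', 'na', 'imma']

# Assumption: reading~im$ behaves like re.search('im$', reading).
pattern = re.compile('im$')

matches = [r for r in readings if pattern.search(r)]
print(matches)
```

Without the `$`, the pattern `im` would also accept `imma`, because `re.search` finds it at the start of that string."
]
},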
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.71s 307 results\n"
]
}
],
"source": [
"results = A.search(query)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"| n | p | line | word | sign | sign | word | sign |\n",
"|---|---|---|---|---|---|---|---|\n",
"| 1 | P509375 reverse:9 | i-na la-hi-a-nim | i-na | i- | na | la-hi-a-nim | nim |\n",
"| 2 | P510527 obverse:6 | {disz}ip-qu2-i3-li2-szu _di-ku5_ i-na pu-uh2-ri-im | i-na | i- | na | pu-uh2-ri-im | im |\n",
"| 3 | P510527 obverse:15 | i-na pu-uh2-ri-im i-na da-ba-bi-im | i-na | i- | na | pu-uh2-ri-im | im |\n",
"| 4 | P510527 obverse:15 | i-na pu-uh2-ri-im i-na da-ba-bi-im | i-na | i- | na | da-ba-bi-im | im |\n",
"| 5 | P510538 obverse:10 | i-na tam-li-tim | i-na | i- | na | tam-li-tim | tim |\n",
"| 6 | P510562 obverse:7 | i-na# pa-ni-tim a-na a-<ma?>-az{ki} ta-al-li-ik-ma# | i-na# | i- | na# | pa-ni-tim | tim |\n",
"| 7 | P510567 reverse:7 | [i-na] e-bu-ri-im | [i-na] | [i- | na] | e-bu-ri-im | im |\n",
"| 8 | P510571 reverse:13 | i-na an-ni-tim at-hu-<ut>-ka# | i-na | i- | na | an-ni-tim | tim |\n",
"| 9 | P510574 obverse:8 | tup-pi2 i-na a-ma-ri-im | i-na | i- | na | a-ma-ri-im | im |\n",
"| 10 | P510575 obverse:11 | [i]-na# qa-tim ta-ki-il-tim | [i]-na# | [i]- | na# | qa-tim | tim |"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, end=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's vary this theme a bit. Suppose we want to tighten the criterion that the last sign of the second word\n",
"ends in `im`: we want that sign to be exactly `tim` or `nim`. We can express that as follows:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word\n",
"    =: sign reading=i\n",
"    <: sign reading=na\n",
"    :=\n",
"  <: word\n",
"    := sign reading~^[nt]im$\n",
"'''"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explanation: `^` matches the start of the reading and `$` its end, so the pattern `[nt]im` must cover the whole reading.\n",
"`[nt]` means: either `n` or `t`. In general, `[` *characters* `]` matches exactly one of the *characters*.\n",
"You can also use ranges, as in `[A-Z0-9]`, which matches any uppercase Latin letter or any digit."
]
},
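{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of the anchors is easiest to see side by side. A minimal sketch with Python's `re`, again assuming `re.search` semantics for `~`; `szitim` is one of the sign readings listed further down in this notebook, and the strings in the last check are made up.

```python
import re

readings = ['nim', 'tim', 'lim', 'im', 'szitim']

anchored = re.compile('^[nt]im$')   # the whole reading must be nim or tim
unanchored = re.compile('[nt]im$')  # the reading only has to end in nim or tim

exact = [r for r in readings if anchored.search(r)]
ending = [r for r in readings if unanchored.search(r)]
print(exact)
print(ending)

# A character class with ranges: exactly one uppercase Latin letter or digit.
upper_or_digit = re.compile('^[A-Z0-9]$')
single = [c for c in ['A', 'z', '7', 'AB'] if upper_or_digit.search(c)]
print(single)
```

Without the `^`, the longer reading `szitim` slips in, because it ends in `tim`."
]
},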
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.70s 120 results\n"
]
}
],
"source": [
"results = A.search(query)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"| n | p | line | word | sign | sign | word | sign |\n",
"|---|---|---|---|---|---|---|---|\n",
"| 1 | P509375 reverse:9 | i-na la-hi-a-nim | i-na | i- | na | la-hi-a-nim | nim |\n",
"| 2 | P510538 obverse:10 | i-na tam-li-tim | i-na | i- | na | tam-li-tim | tim |\n",
"| 3 | P510562 obverse:7 | i-na# pa-ni-tim a-na a-<ma?>-az{ki} ta-al-li-ik-ma# | i-na# | i- | na# | pa-ni-tim | tim |\n",
"| 4 | P510571 reverse:13 | i-na an-ni-tim at-hu-<ut>-ka# | i-na | i- | na | an-ni-tim | tim |\n",
"| 5 | P510575 obverse:11 | [i]-na# qa-tim ta-ki-il-tim | [i]-na# | [i]- | na# | qa-tim | tim |\n",
"| 6 | P510593 obverse:8 | i-na pa-ni-tim i-nu-ma a-na tam-li-tim a-na e2-duru5-bi2-sa3{ki#} | i-na | i- | na | pa-ni-tim | tim |\n",
"| 7 | P510643 reverse:6 | i-na an-ni-tim ta-ka-li ta-ma-ar | i-na | i- | na | an-ni-tim | tim |\n",
"| 8 | P510659 reverse:10' | i-na an-ni-tim at#-[hu-ut-ka] | i-na | i- | na | an-ni-tim | tim |\n",
"| 9 | P510698 obverse:11 | szum-ma i-na ki-tim a-bi | i-na | i- | na | ki-tim | tim |\n",
"| 10 | P510698 obverse:13 | i-na an-ni-tim et,-ra-an-ni-i-ma | i-na | i- | na | an-ni-tim | tim |"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, end=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What if we wanted a reading that is `tim`, `nim` or `im`? We can say that as follows:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word\n",
"    =: sign reading=i\n",
"    <: sign reading=na\n",
"    :=\n",
"  <: word\n",
"    := sign reading~^[nt]?im$\n",
"'''"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explanation: the `?` makes the preceding item *optional*: it may occur once or not at all. Here the preceding item is `[nt]`."
]
},
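{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the optional `?` in plain Python (`re.search` semantics assumed, as before). The string `ntim` is made up; it shows that `?` allows at most one of the listed characters, not several.

```python
import re

# [nt]? allows at most one n or t before im.
pattern = re.compile('^[nt]?im$')

readings = ['im', 'nim', 'tim', 'lim', 'ntim']
accepted = [r for r in readings if pattern.search(r)]
print(accepted)
```
"
]
},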
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.72s 301 results\n"
]
}
],
"source": [
"results = A.search(query)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"| n | p | line | word | sign | sign | word | sign |\n",
"|---|---|---|---|---|---|---|---|\n",
"| 1 | P509375 reverse:9 | i-na la-hi-a-nim | i-na | i- | na | la-hi-a-nim | nim |\n",
"| 2 | P510527 obverse:6 | {disz}ip-qu2-i3-li2-szu _di-ku5_ i-na pu-uh2-ri-im | i-na | i- | na | pu-uh2-ri-im | im |\n",
"| 3 | P510527 obverse:15 | i-na pu-uh2-ri-im i-na da-ba-bi-im | i-na | i- | na | pu-uh2-ri-im | im |\n",
"| 4 | P510527 obverse:15 | i-na pu-uh2-ri-im i-na da-ba-bi-im | i-na | i- | na | da-ba-bi-im | im |\n",
"| 5 | P510538 obverse:10 | i-na tam-li-tim | i-na | i- | na | tam-li-tim | tim |\n",
"| 6 | P510562 obverse:7 | i-na# pa-ni-tim a-na a-<ma?>-az{ki} ta-al-li-ik-ma# | i-na# | i- | na# | pa-ni-tim | tim |\n",
"| 7 | P510567 reverse:7 | [i-na] e-bu-ri-im | [i-na] | [i- | na] | e-bu-ri-im | im |\n",
"| 8 | P510571 reverse:13 | i-na an-ni-tim at-hu-<ut>-ka# | i-na | i- | na | an-ni-tim | tim |\n",
"| 9 | P510574 obverse:8 | tup-pi2 i-na a-ma-ri-im | i-na | i- | na | a-ma-ri-im | im |\n",
"| 10 | P510575 obverse:11 | [i]-na# qa-tim ta-ki-il-tim | [i]-na# | [i]- | na# | qa-tim | tim |"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, end=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have a few discrete options, you can also list the options and separate them with `|`.\n",
"\n",
"Let's obtain the same results with a different expression:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word\n",
"    =: sign reading=i\n",
"    <: sign reading=na\n",
"    :=\n",
"  <: word\n",
"    := sign reading~^(tim|nim|im)$\n",
"'''"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Caution: mind the parentheses. We do not want \n",
"\n",
"> `^tim` or `nim` or `im$`\n",
"\n",
"but\n",
"\n",
"> `^`, then `tim` or `nim` or `im`, then `$`"
]
},
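{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference that the parentheses make can be checked directly. A minimal sketch with Python's `re` (`re.search` semantics assumed); `timmu` and `anima` are made-up strings chosen only to expose the difference.

```python
import re

grouped = re.compile('^(tim|nim|im)$')  # ^, then one of the three, then $
ungrouped = re.compile('^tim|nim|im$')  # ^tim, or nim anywhere, or im$

samples = ['tim', 'nim', 'im', 'lim', 'timmu', 'anima']

grouped_hits = [s for s in samples if grouped.search(s)]
ungrouped_hits = [s for s in samples if ungrouped.search(s)]
print(grouped_hits)
print(ungrouped_hits)
```

Without the parentheses, `|` has the lowest precedence, so every sample is accepted, including `lim`."
]
},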
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.71s 301 results\n"
]
}
],
"source": [
"results = A.search(query)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"| n | p | line | word | sign | sign | word | sign |\n",
"|---|---|---|---|---|---|---|---|\n",
"| 1 | P509375 reverse:9 | i-na la-hi-a-nim | i-na | i- | na | la-hi-a-nim | nim |\n",
"| 2 | P510527 obverse:6 | {disz}ip-qu2-i3-li2-szu _di-ku5_ i-na pu-uh2-ri-im | i-na | i- | na | pu-uh2-ri-im | im |\n",
"| 3 | P510527 obverse:15 | i-na pu-uh2-ri-im i-na da-ba-bi-im | i-na | i- | na | pu-uh2-ri-im | im |\n",
"| 4 | P510527 obverse:15 | i-na pu-uh2-ri-im i-na da-ba-bi-im | i-na | i- | na | da-ba-bi-im | im |\n",
"| 5 | P510538 obverse:10 | i-na tam-li-tim | i-na | i- | na | tam-li-tim | tim |\n",
"| 6 | P510562 obverse:7 | i-na# pa-ni-tim a-na a-<ma?>-az{ki} ta-al-li-ik-ma# | i-na# | i- | na# | pa-ni-tim | tim |\n",
"| 7 | P510567 reverse:7 | [i-na] e-bu-ri-im | [i-na] | [i- | na] | e-bu-ri-im | im |\n",
"| 8 | P510571 reverse:13 | i-na an-ni-tim at-hu-<ut>-ka# | i-na | i- | na | an-ni-tim | tim |\n",
"| 9 | P510574 obverse:8 | tup-pi2 i-na a-ma-ri-im | i-na | i- | na | a-ma-ri-im | im |\n",
"| 10 | P510575 obverse:11 | [i]-na# qa-tim ta-ki-il-tim | [i]-na# | [i]- | na# | qa-tim | tim |"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, end=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We get 6 fewer results than with our original query (301 instead of 307).\n",
"\n",
"Can we find a template that searches exactly for the missing ones?"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word\n",
"    =: sign reading=i\n",
"    <: sign reading=na\n",
"    :=\n",
"  <: word\n",
"    := sign reading~^[^nt]im$\n",
"'''"
]
},
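{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explanation: a `^` right after the opening square bracket negates the characters listed.\n",
"So here we say: we want one character that is **not** an `n` or a `t`.\n",
"Note that we still require exactly one character there, so a bare `im` will not match.\n",
"\n",
"So this yields the cases that we found initially, minus the `nim`, `tim` and `im` cases, as long as the remaining readings consist of one letter followed by `im`.\n",
"A longer reading such as `erim` or `szim` would be missed by this query; here the counts add up (301 + 6 = 307), so no such readings occur in this position."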
]
},
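{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reasoning can be checked on the list of sign readings ending in `im` that is printed further down in this notebook. The sketch below, in plain Python with `re.search` semantics assumed for `~`, computes which readings the original pattern `im$` accepts but `^(tim|nim|im)$` rejects, and compares that with what `^[^nt]im$` accepts. The helper `matching` is hypothetical, written only for this illustration.

```python
import re

# Sign readings ending in im, as listed further down in this notebook.
readings = ['dim', 'erim', 'gim', 'idim', 'im', 'inim', 'lim', 'maszkim',
            'muhaldim', 'nim', 'silim', 'sim', 'szim', 'szitim', 'tim', 'zadim']


def matching(pattern, values):
    # The values in which the pattern is found (re.search semantics).
    regex = re.compile(pattern)
    return {v for v in values if regex.search(v)}


missing = matching('im$', readings) - matching('^(tim|nim|im)$', readings)
negated = matching('^[^nt]im$', readings)

print(sorted(negated))            # one non-n, non-t character before im
print(sorted(missing - negated))  # accepted by im$, but by neither later pattern
```

Readings in the second group would escape both later queries if they occurred after `i-na`; the matching counts show that they do not in this corpus."
]
},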
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.73s 6 results\n"
]
}
],
"source": [
"results = A.search(query)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"| n | p | line | word | sign | sign | word | sign |\n",
"|---|---|---|---|---|---|---|---|\n",
"| 1 | P510596 obverse:11 | ki#-ma ti-du-u2 i-na a-lim ma-ah-ri-ka | i-na | i- | na | a-lim | lim |\n",
"| 2 | P510608 obverse:10 | u2-lu i-na a-lim e-ma i-ba-asz-szu#-u2 | i-na | i- | na | a-lim | lim |\n",
"| 3 | P510784 reverse:3 | ki-ma i-na a-lim te-<esz>-te-ne2-em-mu | i-na | i- | na | a-lim | lim |\n",
"| 4 | P510837 obverse:8 | {disz}{d}na-bi-um-ma-lik i-na# _a-sza3_-lim | i-na# | i- | na# | _a-sza3_-lim | lim |\n",
"| 5 | P313311 reverse:10 | um-mi i-na a-lim is-su2-ha | i-na | i- | na | a-lim | lim |\n",
"| 6 | P275147 obverse:6 | i-[na e-mu-ut]-ba-lim ka-li-a | i-[na | i- | [na | e-mu-ut]-ba-lim | lim |"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, end=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is an alternative way of matching words: not sign by sign, but with the word-level feature `sym`."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word sym=i-na\n",
"  <: word sym~im$\n",
"'''"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.19s 306 results\n"
]
}
],
"source": [
"results = A.search(query)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"| n | p | line | word | word |\n",
"|---|---|---|---|---|\n",
"| 1 | P509375 reverse:9 | i-na la-hi-a-nim | i-na | la-hi-a-nim |\n",
"| 2 | P510527 obverse:6 | {disz}ip-qu2-i3-li2-szu _di-ku5_ i-na pu-uh2-ri-im | i-na | pu-uh2-ri-im |\n",
"| 3 | P510527 obverse:15 | i-na pu-uh2-ri-im i-na da-ba-bi-im | i-na | pu-uh2-ri-im |\n",
"| 4 | P510527 obverse:15 | i-na pu-uh2-ri-im i-na da-ba-bi-im | i-na | da-ba-bi-im |\n",
"| 5 | P510538 obverse:10 | i-na tam-li-tim | i-na | tam-li-tim |\n",
"| 6 | P510562 obverse:7 | i-na# pa-ni-tim a-na a-<ma?>-az{ki} ta-al-li-ik-ma# | i-na# | pa-ni-tim |\n",
"| 7 | P510567 reverse:7 | [i-na] e-bu-ri-im | [i-na] | e-bu-ri-im |\n",
"| 8 | P510571 reverse:13 | i-na an-ni-tim at-hu-<ut>-ka# | i-na | an-ni-tim |\n",
"| 9 | P510574 obverse:8 | tup-pi2 i-na a-ma-ri-im | i-na | a-ma-ri-im |\n",
"| 10 | P510575 obverse:11 | [i]-na# qa-tim ta-ki-il-tim | [i]-na# | qa-tim |"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, end=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This gives 306 results, one fewer than the sign-based query. Let's find out which one is missing:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word\n",
"    =: sign reading=i\n",
"    <: sign reading=na\n",
"    :=\n",
"  <: word sym~(?result 1
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"word tup-pi2: signs tup- pi2\n",
"word i-na: signs i- na\n",
"word a-ma-ri-im!(SZI): signs a- ma- ri- im!(SZI)"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.show(results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ah: the last sign of the second word does have the reading `im`, but the `sym` feature of the word ends in `im!(SZI)`, not in `im`."
]
},
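{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mismatch can be reproduced in plain Python. A minimal sketch; it assumes that the `sym` value of the second word is the same as its display form above, and that `~` uses `re.search` semantics.

```python
import re

pattern = re.compile('im$')

# Assumption: the sym value of the second word equals its display form above.
word_sym = 'a-ma-ri-im!(SZI)'
# The reading of its last sign, as tested by the sign-based query.
last_sign_reading = 'im'

print(bool(pattern.search(last_sign_reading)))  # True: the sign-based query matches
print(bool(pattern.search(word_sym)))           # False: the sym-based query does not
```
"
]
},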
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Solution\n",
"\n",
"Let's quickly inspect all readings ending in `im`:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['dim',\n",
" 'erim',\n",
" 'gim',\n",
" 'idim',\n",
" 'im',\n",
" 'inim',\n",
" 'lim',\n",
" 'maszkim',\n",
" 'muhaldim',\n",
" 'nim',\n",
" 'silim',\n",
" 'sim',\n",
" 'szim',\n",
" 'szitim',\n",
" 'tim',\n",
" 'zadim']"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted({F.reading.v(s) for s in F.otype.s('sign') if (F.reading.v(s) or '').endswith('im')})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do not want to consider readings like `maszkim` and `muhaldim`, just the ones with at most a single letter in front of the `im`.\n",
"Alas, `sz` stands for the single letter `š`, but it takes two characters in the `reading` feature.\n",
"\n",
"So let's turn to `readingr`, and for words to `symr` instead of `sym`, where `š` is written as a single character."
]
},
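{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why `sz` gets in the way can be shown in plain Python: in `reading`, the letter š takes two characters, so a pattern that allows at most one character before `im` misses `szim` but catches the `readingr` spelling `šim`. A minimal sketch; the pattern `^.?im$` only illustrates the at-most-one-letter condition and is not the query used below.

```python
import re

# At most one arbitrary character, then im, covering the whole string.
one_letter_im = re.compile('^.?im$')

ascii_form = 'szim'   # as in reading: sz spells the single letter š
unicode_form = 'šim'  # as in readingr: š (U+0161) is one character

print(len(ascii_form), len(unicode_form))        # 4 3
print(bool(one_letter_im.search(ascii_form)))    # False
print(bool(one_letter_im.search(unicode_form)))  # True
```
"
]
},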
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['dim',\n",
" 'erim',\n",
" 'gim',\n",
" 'idim',\n",
" 'im',\n",
" 'inim',\n",
" 'lim',\n",
" 'maškim',\n",
" 'muhaldim',\n",
" 'nim',\n",
" 'silim',\n",
" 'sim',\n",
" 'tim',\n",
" 'zadim',\n",
" 'šim',\n",
" 'šitim']"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted({F.readingr.v(s) for s in F.otype.s('sign') if (F.readingr.v(s) or '').endswith('im')})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can state the condition: in the `symr` feature of the second word, the last sign is either `im` or a single letter followed by `im`."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word sym=i-na\n",
"  <: word symr~-.?im$\n",
"'''"
]
},
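{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explanation: the dot `.` stands for an arbitrary single character. Because of the `?` behind it, that character is optional.\n",
"The `-` in front ties the match to the last sign of the word, so a word that consists of a single sign will not match."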
]
},
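{
"cell_type": "markdown",
"metadata": {},
"source": [
"How `-.?im$` treats whole words can be tried out in plain Python. A minimal sketch, assuming `re.search` semantics and that `symr` values look like the word forms in the results; the first four words occur in the results of this notebook, while `x-maškim` and the one-sign word `lim` are hypothetical test strings.

```python
import re

pattern = re.compile('-.?im$')

words = [
    'la-hi-a-nim',   # last sign nim
    'pu-uh2-ri-im',  # last sign im
    'a-lim',         # last sign lim
    'an-ni-tim',     # last sign tim
    'x-maškim',      # hypothetical: last sign maškim
    'lim',           # hypothetical: a word of a single sign, no hyphen
]

accepted = [w for w in words if pattern.search(w)]
print(accepted)
```

The last two words are rejected: `maškim` has more than one letter before `im`, and `lim` on its own lacks the hyphen that the pattern requires."
]
},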
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.18s 306 results\n"
]
}
],
"source": [
"results = A.search(query)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"| n | p | line | word | word |\n",
"|---|---|---|---|---|\n",
"| 1 | P509375 reverse:9 | i-na la-hi-a-nim | i-na | la-hi-a-nim |\n",
"| 2 | P510538 obverse:10 | i-na tam-li-tim | i-na | tam-li-tim |\n",
"| 3 | P510562 obverse:7 | i-na# pa-ni-tim a-na a-<ma?>-az{ki} ta-al-li-ik-ma# | i-na# | pa-ni-tim |\n",
"| 4 | P510571 reverse:13 | i-na an-ni-tim at-hu-<ut>-ka# | i-na | an-ni-tim |\n",
"| 5 | P510575 obverse:11 | [i]-na# qa-tim ta-ki-il-tim | [i]-na# | qa-tim |\n",
"| 6 | P510593 obverse:8 | i-na pa-ni-tim i-nu-ma a-na tam-li-tim a-na e2-duru5-bi2-sa3{ki#} | i-na | pa-ni-tim |\n",
"| 7 | P510596 obverse:11 | ki#-ma ti-du-u2 i-na a-lim ma-ah-ri-ka | i-na | a-lim |\n",
"| 8 | P510608 obverse:10 | u2-lu i-na a-lim e-ma i-ba-asz-szu#-u2 | i-na | a-lim |\n",
"| 9 | P510643 reverse:6 | i-na an-ni-tim ta-ka-li ta-ma-ar | i-na | an-ni-tim |\n",
"| 10 | P510659 reverse:10' | i-na an-ni-tim at#-[hu-ut-ka] | i-na | an-ni-tim |"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, end=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# i-na + ...!im + ...-?im\n",
"\n",
"We look for word triples, of which the first is `i-na`, the second does not end in `im`, and the third ends in `im`.\n",
"In the third word, the last sign may have a single letter before the `im`."
]
},
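{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this copy of the notebook the template for the second word is cut off, so its exact pattern is not visible. One standard way to express *does not end in `im`* in a Python regular expression is a negative lookbehind before the end anchor, `(?<!im)$`. The sketch below shows that technique on some word forms from the results below; it is an assumption, not necessarily the pattern the query uses.

```python
import re

# $ not preceded by im: the string must not end in im.
not_im_final = re.compile('(?<!im)$')

words = ['pi2-ha-at', '_{gesz}ban2_', 're-esz', 'a-lim', 'an-ni-tim']
accepted = [w for w in words if not_im_final.search(w)]
print(accepted)
```

The words `a-lim` and `an-ni-tim`, which occur as third words in the results, are rejected."
]
},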
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word sym=i-na\n",
"  <: word symr~(?\n",
"\n",
"| n | p | line | word | word | word |\n",
"|---|---|---|---|---|---|\n",
"| 1 | P509373 reverse:11' | a-na ki-ma i-[na] _dub e2-gal_-lim | i-[na] | _dub | e2-gal_-lim |\n",
"| 2 | P510573 reverse:1 | i-na <<an-na>> an-ni-tim a-hu!-ut-ka | i-na | <<an-na>> | an-ni-tim |\n",
"| 3 | P510594 obverse:5' | szum-ma i-na _{gesz}ban2_ {d}ki-it-tim | i-na | _{gesz}ban2_ | {d}ki-it-tim |\n",
"| 4 | P510594 obverse:7' | i-na _{gesz}ban2_ {d}ki-it#-tim# | i-na | _{gesz}ban2_ | {d}ki-it#-tim# |\n",
"| 5 | P510594 reverse:3 | szum-ma i-na _{gesz}ban2_ {d}ki-it-tim | i-na | _{gesz}ban2_ | {d}ki-it-tim |\n",
"| 6 | P510607 obverse:10 | i-na pi2-ha-at a-lim | i-na | pi2-ha-at | a-lim |\n",
"| 7 | P510677 obverse:2' | [i-na _e2_ a-ki-tim isz]-sza#-ak#-[ka-an] | [i-na | _e2_ | a-ki-tim |\n",
"| 8 | P510688 obverse:10 | <<i>> el-qe2 i-na _a-sza3_ [x]-x-lim | i-na | _a-sza3_ | [x]-x-lim |\n",
"| 9 | P510712 reverse:17' | [i]-na# re-esz ma-ak-ku-ri-im# | [i]-na# | re-esz | ma-ak-ku-ri-im# |\n",
"| 10 | P510722 reverse:8 | i-na pi2-sza-an-ni ku-nu-ka-tim | i-na | pi2-sza-an-ni | ku-nu-ka-tim |"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, end=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One question remains: since the second word may not end in `im`, we miss cases where it ends in, e.g., `-maškim`. Is that bad?\n",
"Let's look for such cases:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word sym=i-na\n",
"  <: word symr~[^-][^-]im$\n",
"  <: word symr~-.?im$\n",
"'''"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So we actively look for cases where the second word ends in a reading that ends in `im`, preceded by at least two characters\n",
"that are not a hyphen."
]
},
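{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what `[^-][^-]im$` accepts, in plain Python with `re.search` semantics assumed. The first two word forms occur in the results above; `x-maškim` and `x-erim` are hypothetical words built from readings listed earlier in this notebook.

```python
import re

# Two characters that are not hyphens, directly before a final im.
long_im = re.compile('[^-][^-]im$')

words = [
    'a-lim',         # one letter before im: not matched
    'pu-uh2-ri-im',  # no letter before im: not matched
    'x-maškim',      # hypothetical: ...škim, matched
    'x-erim',        # hypothetical: ...erim, matched
]
accepted = [w for w in words if long_im.search(w)]
print(accepted)
```
"
]
},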
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.27s 0 results\n"
]
}
],
"source": [
"results = A.search(query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do not find any, so we can stick to our initial query for word triples."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case you like the highlighting of signs, here is the same query in the more elaborate, sign-based form:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"query = '''\n",
"line\n",
"  word\n",
"    =: sign reading=i\n",
"    <: sign reading=na\n",
"    :=\n",
"  <: word\n",
"    := sign reading~(?\n",
"\n",
"| n | p | line | word | sign | sign | word | sign | word | sign |\n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | P509373 reverse:11' | a-na ki-ma i-[na] _dub e2-gal_-lim | i-[na] | i- | [na] | _dub | _dub | e2-gal_-lim | lim |\n",
"| 2 | P510573 reverse:1 | i-na <<an-na>> an-ni-tim a-hu!-ut-ka | i-na | i- | na | <<an-na>> | na>> | an-ni-tim | tim |\n",
"| 3 | P510594 obverse:5' | szum-ma i-na _{gesz}ban2_ {d}ki-it-tim | i-na | i- | na | _{gesz}ban2_ | ban2_ | {d}ki-it-tim | tim |\n",
"| 4 | P510594 obverse:7' | i-na _{gesz}ban2_ {d}ki-it#-tim# | i-na | i- | na | _{gesz}ban2_ | ban2_ | {d}ki-it#-tim# | tim# |\n",
"| 5 | P510594 reverse:3 | szum-ma i-na _{gesz}ban2_ {d}ki-it-tim | i-na | i- | na | _{gesz}ban2_ | ban2_ | {d}ki-it-tim | tim |\n",
"| 6 | P510607 obverse:10 | i-na pi2-ha-at a-lim | i-na | i- | na | pi2-ha-at | at | a-lim | lim |\n",
"| 7 | P510677 obverse:2' | [i-na _e2_ a-ki-tim isz]-sza#-ak#-[ka-an] | [i-na | [i- | na | _e2_ | _e2_ | a-ki-tim | tim |\n",
"| 8 | P510688 obverse:10 | <<i>> el-qe2 i-na _a-sza3_ [x]-x-lim | i-na | i- | na | _a-sza3_ | sza3_ | [x]-x-lim | lim |\n",
"| 9 | P510712 reverse:17' | [i]-na# re-esz ma-ak-ku-ri-im# | [i]-na# | [i]- | na# | re-esz | esz | ma-ak-ku-ri-im# | im# |\n",
"| 10 | P510722 reverse:8 | i-na pi2-sza-an-ni ku-nu-ka-tim | i-na | i- | na | pi2-sza-an-ni | ni | ku-nu-ka-tim | tim |"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, end=10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}