https://github.com/annotation/text-fabric
Raw File
Tip revision: c730976886b6d0b3af9c98088c583e457c4f2a85 authored by Dirk Roorda on 07 June 2018, 13:07:00 UTC
New major release 4.4.0
Tip revision: c730976
README.md
# text-fabric
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1008899.svg)](https://doi.org/10.5281/zenodo.592193)
![text-fabric](https://raw.github.com/Dans-labs/text-fabric/master/docs/images/tf.png)


Text-Fabric is a Python3 package for Text plus Annotations.

It provides a data model, a text file format, and a binary format for (ancient) text plus
(linguistic) annotations.

A defining characteristic is that Text-Fabric does not make use of XML or JSON,
but stores text as a bunch of features.
These features are interpreted against a *graph* of nodes and edges, which make up the
abstract fabric of the text.

Based on this model, Text-Fabric offers an API to search, navigate and process text
and its annotations.
The search API works with search templates that define relational patterns
which will be instantiated by nodes and edges of the fabric.

The emphasis of this all is on:

* data processing
* sharing data
* contributing modules
* search for patterns

The intended use case is ancient writings, such as the Hebrew Bible or (Proto)-Cuneifrom tablets.
Also the Greek and Syriac New Testament have been converted to TF.

Text-Fabric not only deals with the text, but also with rich sets of linguistic annotations added to them.
It has been used to construct the website
[SHEBANQ](https://shebanq.ancient-data.org) and it is being
used by researchers to analyse such texts. 

# Install

Text Fabric is a Python(3) package on the Python Package Index, so you can install it easily with `pip` from
the command line. Here are the precise
[installation instructions](https://dans-labs.github.io/text-fabric/).

# Documentation

There is extensive documentation.

1. The [github pages associated with this repo](https://dans-labs.github.io/text-fabric/) are meant as a reference.
   1. It explains the [data model](https://dans-labs.github.io/text-fabric/Model/Data-Model/)
   2. It specifies the [file format](https://dans-labs.github.io/text-fabric/Model/File-formats/)
   3. It holds the [api docs](https://dans-labs.github.io/text-fabric/Api/General/)
2. There are tutorials and exercises to guide you into increasingly involved tasks
   on specific corpora (outside this repo):
   1. [Biblia Hebraica Stuttgartensia Amstelodamensis](https://nbviewer.jupyter.org/github/etcbc/bhsa/blob/master/tutorial/start.ipynb)
   1. [Proto-Cuneiform tablets from Uruk IV/III](https://nbviewer.jupyter.org/github/nino-cunei/tutorials/blob/master/start.ipynb)
3. For more background information (earlier work, institutes, people, datasets), consult the
   [wiki](https://github.com/ETCBC/shebanq/wiki)
   pages of SHEBANQ.
4. Papers (preprints on [arxiv](https://arxiv.org)), most of them published:
   1. [Parallel Texts in the Hebrew Bible, New Methods and Visualizations ](https://arxiv.org/abs/1603.01541)
   1. [The Hebrew Bible as Data: Laboratory - Sharing - Experiences](https://www.ubiquitypress.com/site/chapters/10.5334/bbi.18/)
      (preprint: [arxiv](https://arxiv.org/abs/1501.01866))
   1. [LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible](https://arxiv.org/abs/1410.0286)
   1. [Annotation as a New Paradigm in Research Archiving](https://arxiv.org/abs/1412.6069)

   N.B. [LAF-Fabric](https://github.com/Dans-labs/laf-fabric) is a precursor of Text-Fabric.
   Text-Fabric is simpler to use, and has something that LAF-Fabric does not have:
   a powerful structured search engine.

# Data

In order to work with Text-Fabric, you need a dataset to operate on.
The most developed corpora in TF are

1. [Biblia Hebraica Stuttgartensia Amstelodamensis](https://github/etcbc/bhsa)
1. [Proto-Cuneiform tablets from Uruk IV/III](https://github/nino-cunei/uruk)

There is a also
[collection of datasets and additional modules](https://Dans-labs.github.io/text-fabric-data/)
ready to be used in text-fabric. The link points to the data documentation, and from there you find the GitHub
repo where the data resides.

---

[Motivation](http://www.slideshare.net/dirkroorda/text-fabric) - 
[Data](https://github.com/Dans-labs/text-fabric-data)

back to top