https://github.com/TEIC/TEI
Tip revision: 347c64fac3fa1a64ade0d8d1842813d4b8f7acec authored by Hugh Cayless on 12 May 2017, 16:57:59 UTC
Updates.
CC.html

<!DOCTYPE html
  SYSTEM "about:legacy-compat">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><!--THIS FILE IS GENERATED FROM AN XML MASTER. DO NOT EDIT (4)--><title>15 Language Corpora - The TEI Guidelines</title><meta property="Language" content="en" /><meta property="DC.Title" content="15 Language Corpora - The TEI Guidelines" /><meta property="DC.Language" content="SCHEME=iso639 en" /><meta property="DC.Creator.Address" content="tei@oucs.ox.ac.uk" /><meta charset="utf-8" /><link href="guidelines.css" rel="stylesheet" type="text/css" /><link href="odd.css" rel="stylesheet" type="text/css" /><link rel="stylesheet" media="print" type="text/css" href="guidelines-print.css" /><script type="text/javascript" src="jquery-1.2.6.min.js"></script><script type="text/javascript" src="columnlist.js"></script><script type="text/javascript" src="popupFootnotes.js"></script><script type="text/javascript">
        $(function() {
         $('ul.attrefs-class').columnizeList({cols:3,width:30,unit:'%'});
         $('ul.attrefs-element').columnizeList({cols:3,width:30,unit:'%'});
         $(".displayRelaxButton").click(function() {
           $(this).parent().find('.RNG_XML').toggle();
           $(this).parent().find('.RNG_Compact').toggle();
         });
         $(".tocTree .showhide").click(function() {
          $(this).find(".tocShow,.tocHide").toggle();
          $(this).parent().find("ul.continuedtoc").toggle();
	  });
        })
    </script><script type="text/javascript"><!--
var displayXML=0;
states=new Array()
states[0]="element-a"
states[1]="element-b"
states[2]="element-c"
states[3]="element-d"
states[4]="element-e"
states[5]="element-f"
states[6]="element-g"
states[7]="element-h"
states[8]="element-i"
states[9]="element-j"
states[10]="element-k"
states[11]="element-l"
states[12]="element-m"
states[13]="element-n"
states[14]="element-o"
states[15]="element-p"
states[16]="element-q"
states[17]="element-r"
states[18]="element-s"
states[19]="element-t"
states[20]="element-u"
states[21]="element-v"
states[22]="element-w"
states[23]="element-x"
states[24]="element-y"
states[25]="element-z"

function startUp() {

}

function hideallExcept(elm) {
for (var i = 0; i < states.length; i++) {
 var layer;
 if (layer = document.getElementById(states[i]) ) {
  if (states[i] != elm) {
    layer.style.display = "none";
  }
  else {
   layer.style.display = "block";
      }
  }
 }
 var mod;
 if ( mod = document.getElementById('byMod') ) {
     mod.style.display = "none";
 }
}

function showall() {
 for (var i = 0; i < states.length; i++) {
   var layer;
   if (layer = document.getElementById(states[i]) ) {
      layer.style.display = "block";
      }
  }
}

function showByMod() {
  hideallExcept('');
  var mod;
  if (mod = document.getElementById('byMod') ) {
     mod.style.display = "block";
     }
}

	--></script></head><body><div id="container"><div id="banner"><img src="Images/banner.jpg" alt="Text Encoding Initiative logo and banner" /></div></div><div class="mainhead"><h1>P5: 
    Guidelines for Electronic Text Encoding and Interchange</h1><p>Version 3.1.1a. Last updated on
	10th May 2017, revision bd8dda3</p></div><div id="onecol" class="main-content"><h2><span class="headingNumber">15 </span>Language Corpora</h2><div class="div1" id="CC"><div class="miniTOC miniTOC_left"><p><span class="subtochead">Table of contents</span></p><div class="subtoc"><ul class="subtoc"><li class="subtoc"><a class="subtoc" href="CC.html#CCDEF" title="Varieties of Composite Text">15.1 Varieties of Composite Text</a></li><li class="subtoc"><a class="subtoc" href="CC.html#CCAH" title="Contextual Information">15.2 Contextual Information</a></li><li class="subtoc"><a class="subtoc" href="CC.html#CCAS" title="Associating Contextual Information with a Text">15.3 Associating Contextual Information with a Text</a></li><li class="subtoc"><a class="subtoc" href="CC.html#CCAN" title="Linguistic Annotation of Corpora">15.4 Linguistic Annotation of Corpora</a></li><li class="subtoc"><a class="subtoc" href="CC.html#CCREC" title="Recommendations for the Encoding of Large Corpora">15.5 Recommendations for the Encoding of Large Corpora</a></li><li class="subtoc"><a class="subtoc" href="CC.html#index-body.1_div.15_div.6">15.6 Module for Language Corpora</a></li></ul></div><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="FT.html"><span class="headingNumber">14 </span>Tables, Formulæ, Graphics and Notated Music</a></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="SA.html"><span class="headingNumber">16 </span>Linking, Segmentation, and Alignment</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><p>The term <span class="term">language corpus</span> is used to mean a number of rather different things. 
It may refer simply to any collection of linguistic data (for example, written, spoken, signed, or multimodal), although many practitioners prefer to reserve it for collections which have been organized or collected with a particular end in view, generally to characterize a particular state or variety of one or more languages. Because opinions as to the best method of achieving this goal differ, various subcategories of corpora have also been identified. For our purposes, however, the distinguishing characteristic of a corpus is that its components have been selected or structured according to some conscious set of design criteria.</p><p>These design criteria may be very simple and undemanding, or very sophisticated. A corpus may be intended to represent (in the statistical sense) a particular linguistic variety or sublanguage, or it may be intended to represent all aspects of some assumed ‘core’ language. A corpus may be made up of whole texts or of fragments or text samples. It may be a ‘closed’ corpus, or an ‘open’ or ‘monitor’ corpus, the composition of which may change over time. However, since an open corpus is of necessity finite at any particular point in time, the only likely effect of its expansibility, from the encoding point of view, is some increased difficulty in maintaining consistent encoding practices (see further section <a class="link_ptr" href="CC.html#CCREC" title="Recommendations for the Encoding of Large Corpora"><span class="headingNumber">15.5 </span>Recommendations for the Encoding of Large Corpora</a>). For simplicity, therefore, our discussion largely concerns ways of encoding closed corpora, regarded as single but composite texts.</p><p>Language corpora are regarded by these Guidelines as <span class="term">composite texts</span> rather than <span class="term">unitary texts</span> (on this distinction, see chapter <a class="link_ptr" href="DS.html" title="7"><span class="headingNumber">4 </span>Default Text Structure</a>). 
This is because although each discrete sample of language in a corpus clearly has a claim to be considered as a text in its own right, it is also regarded as a subdivision of some larger object, if only for convenience of analysis. Corpora share a number of characteristics with other types of composite texts, including anthologies and collections. Most notably, different components of composite texts may exhibit different structural properties (for example, some may be composed of verse, and others of prose), thus potentially requiring elements from different TEI modules.</p><p>Aside from these high-level structural differences, and possibly differences of scale, the encoding of language corpora and the encoding of individual texts present identical sets of problems. Any of the encoding techniques and elements presented in other chapters of these Guidelines may therefore prove relevant to some aspect of corpus encoding and may be used in corpora. Therefore, we do not repeat here the discussion of such fundamental matters as the representation of multiple character sets (see chapter <a class="link_ptr" href="CH.html" title="4"><span class="headingNumber">vi. </span>Languages and Character Sets</a>); nor do we attempt to summarize the variety of elements provided for encoding basic structural features such as quoted or highlighted phrases, cross-references, lists, notes, editorial changes and reference systems (see chapter <a class="link_ptr" href="CO.html" title="6"><span class="headingNumber">3 </span>Elements Available in All TEI Documents</a>). 
In addition to these general purpose elements, these Guidelines offer a range of more specialized sets of tags which may be of use in certain specialized corpora, for example those consisting primarily of verse (chapter <a class="link_ptr" href="VE.html" title="9"><span class="headingNumber">6 </span>Verse</a>), drama (chapter <a class="link_ptr" href="DR.html" title="10"><span class="headingNumber">7 </span>Performance Texts</a>), transcriptions of spoken text (chapter <a class="link_ptr" href="TS.html" title="11"><span class="headingNumber">8 </span>Transcriptions of Speech</a>), etc. Chapter <a class="link_ptr" href="ST.html" title="3"><span class="headingNumber">1 </span>The TEI Infrastructure</a> should be reviewed for details of how these and other components of these Guidelines should be tailored to create a document type definition appropriate to a given application. In sum, it should not be assumed that only the matters specifically addressed in this chapter are of importance for corpus creators.</p><p>This chapter does however include some other material relevant to corpora and corpus-building, for which no other location appeared suitable. It begins with a review of the distinction between unitary and composite texts, and of the different methods provided by these Guidelines for representing composite texts of different kinds (section <a class="link_ptr" href="CC.html#CCDEF" title="Varieties of Composite Text"><span class="headingNumber">15.1 </span>Varieties of Composite Text</a>). Section <a class="link_ptr" href="CC.html#CCAH" title="Contextual Information"><span class="headingNumber">15.2 </span>Contextual Information</a> describes a set of additional header elements provided for the documentation of contextual information, of importance largely though not exclusively to language corpora. This is the additional module for language corpora proper. 
Section <a class="link_ptr" href="CC.html#CCAS" title="Associating Contextual Information with a Text"><span class="headingNumber">15.3 </span>Associating Contextual Information with a Text</a> discusses a mechanism by which individual parts of the TEI header may be associated with different parts of a TEI-conformant text. Section <a class="link_ptr" href="CC.html#CCAN" title="Linguistic Annotation of Corpora"><span class="headingNumber">15.4 </span>Linguistic Annotation of Corpora</a> reviews various methods of providing linguistic annotation in corpora, with some specific examples of relevance to current practice in corpus linguistics. Finally, section <a class="link_ptr" href="CC.html#CCREC" title="Recommendations for the Encoding of Large Corpora"><span class="headingNumber">15.5 </span>Recommendations for the Encoding of Large Corpora</a> provides some general recommendations about the use of these Guidelines in the building of large corpora.</p><div class="div2" id="CCDEF"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="CC.html#CCAH"><span class="headingNumber">15.2 </span>Contextual Information</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h3><span class="bookmarklink"><a class="bookmarklink" href="#CCDEF" title="link to this section "><span class="invisible">TEI: Varieties of Composite Text</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.1 </span><span class="head">Varieties of Composite Text</span></h3><p>Both unitary and composite texts may be encoded using these Guidelines; composite texts, including corpora, will typically make use of the following tags for their top-level organization. 
</p><ul class="specList"><li><span class="specList-elementSpec"><a href="ref-teiCorpus.html">teiCorpus</a></span> contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more <a class="gi" title="(TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resourceLike class. Multiple &lt;TEI&gt; elements may be combined to form a &lt;teiCorpus&gt; element." href="ref-TEI.html">TEI</a> elements, each containing a single text header and a text.</li><li><span class="specList-elementSpec"><a href="ref-TEI.html">TEI</a></span> (TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the <a class="link_odd" title="groups separate elements which constitute the content of a digital resource, as opposed to its metadata." href="ref-model.resourceLike.html">model.resourceLike</a> class. Multiple <a class="gi" title="(TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resourceLike class. Multiple &lt;TEI&gt; elements may be combined to form a &lt;teiCorpus&gt; element." href="ref-TEI.html">TEI</a> elements may be combined to form a <a class="gi" title="contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more &lt;TEI&gt; elements, each containing a single text header and a text." 
href="ref-teiCorpus.html">teiCorpus</a> element.</li><li><span class="specList-elementSpec"><a href="ref-teiHeader.html">teiHeader</a></span> (TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources.</li><li><span class="specList-elementSpec"><a href="ref-text.html">text</a></span> contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample.</li><li><span class="specList-elementSpec"><a href="ref-group.html">group</a></span> contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc.</li></ul><p> Full descriptions of these may be found in chapter <a class="link_ptr" href="HD.html" title="5"><span class="headingNumber">2 </span>The TEI Header</a> (for <a class="gi" title="(TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources." href="ref-teiHeader.html">teiHeader</a>), and chapter <a class="link_ptr" href="DS.html" title="7"><span class="headingNumber">4 </span>Default Text Structure</a> (for <a class="gi" title="contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more &lt;TEI&gt; elements, each containing a single text header and a text." href="ref-teiCorpus.html">teiCorpus</a>, <a class="gi" title="(TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resourceLike class. Multiple &lt;TEI&gt; elements may be combined to form a &lt;teiCorpus&gt; element." href="ref-TEI.html">TEI</a>, <a class="gi" title="contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample." 
href="ref-text.html">text</a> and <a class="gi" title="contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc." href="ref-group.html">group</a>); this section discusses their application to composite texts in particular.</p><p>In these Guidelines, the word <span class="term">text</span> refers to any stretch of discourse, whether complete or incomplete, unitary or composite, which the encoder chooses (perhaps merely for purposes of analytic convenience) to regard as a unit. The term <span class="term">composite text</span> refers to texts within which other texts appear; the following common cases may be distinguished: </p><ul class="bulleted"><li class="item">language corpora</li><li class="item">collections or anthologies</li><li class="item">poem cycles and epistolary works (novels or essays written in the form of collections or series of letters)</li><li class="item">otherwise unitary texts, within which one or more subordinate texts are embedded</li></ul><p> The elements listed above may be combined to encode each of these varieties of composite text in different ways.</p><p>In corpora, the component samples are clearly distinct texts, but the systematic collection, standardized preparation, and common markup of the corpus often make it useful to treat the entire corpus as a unit, too. Some corpora may become so well established as to be regarded as texts in their own right; the Brown and LOB corpora are now close to achieving this status. </p><div class="p">The <a class="gi" title="contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more &lt;TEI&gt; elements, each containing a single text header and a text." 
href="ref-teiCorpus.html">teiCorpus</a> element is intended for the encoding of language corpora, though it may also be useful in encoding newspapers, electronic anthologies, and other disparate collections of material. The individual samples in the corpus are encoded as separate <a class="gi" title="(TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resourceLike class. Multiple &lt;TEI&gt; elements may be combined to form a &lt;teiCorpus&gt; element." href="ref-TEI.html">TEI</a> elements, and the entire corpus is enclosed in a <a class="gi" title="contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more &lt;TEI&gt; elements, each containing a single text header and a text." href="ref-teiCorpus.html">teiCorpus</a> element. Each sample has the usual structure for a <a class="gi" title="(TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resourceLike class. Multiple &lt;TEI&gt; elements may be combined to form a &lt;teiCorpus&gt; element." href="ref-TEI.html">TEI</a> document, comprising a <a class="gi" title="(TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources." href="ref-teiHeader.html">teiHeader</a> followed by a <a class="gi" title="contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample." href="ref-text.html">text</a> element. The corpus, too, has a corpus-level <a class="gi" title="(TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources." href="ref-teiHeader.html">teiHeader</a> element, in which the corpus as a whole and any encoding practices common to multiple samples may be described. 
The overall structure of a TEI-conformant corpus is thus: <div id="index-egXML-d52e115098" class="pre egXML_feasible"><span class="element">&lt;teiCorpus xmlns="http://www.tei-c.org/ns/1.0"&gt;</span><br /> <span class="element">&lt;teiHeader/&gt;</span><br /> <span class="element">&lt;TEI&gt;</span><br />  <span class="element">&lt;teiHeader/&gt;</span><br />  <span class="element">&lt;text/&gt;</span><br /> <span class="element">&lt;/TEI&gt;</span><br /> <span class="element">&lt;TEI&gt;</span><br />  <span class="element">&lt;teiHeader/&gt;</span><br />  <span class="element">&lt;text/&gt;</span><br /> <span class="element">&lt;/TEI&gt;</span><br /><span class="element">&lt;/teiCorpus&gt;</span></div></div><p>Header information which relates to the whole corpus rather than to individual components of it should be factored out and included in the <a class="gi" title="(TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources." href="ref-teiHeader.html">teiHeader</a> element prefixed to the whole. This two-level structure allows for contextual information to be specified at the corpus level, at the individual text level, or at both. Discussion of the kinds of information which may thus be specified is provided below, in section <a class="link_ptr" href="CC.html#CCAH" title="Contextual Information"><span class="headingNumber">15.2 </span>Contextual Information</a>, as well as in chapter <a class="link_ptr" href="HD.html" title="5"><span class="headingNumber">2 </span>The TEI Header</a>. 
Information of this type should in general be specified only once: a variety of methods are provided for associating it with individual components of a corpus, as further described in section <a class="link_ptr" href="CC.html#CCAS" title="Associating Contextual Information with a Text"><span class="headingNumber">15.3 </span>Associating Contextual Information with a Text</a>.</p><p>In some cases, the design of a corpus is reflected in its internal structure. For example, a corpus of newspaper extracts might be arranged to combine all stories of one type (reportage, editorial, reviews, etc.) into some higher-level grouping, possibly with sub-groups for date, region, etc. The <a class="gi" title="contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more &lt;TEI&gt; elements, each containing a single text header and a text." href="ref-teiCorpus.html">teiCorpus</a> element provides no direct support for reflecting such internal corpus structure in the markup: it treats the corpus as an undifferentiated series of components, each tagged <a class="gi" title="(TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resourceLike class. Multiple &lt;TEI&gt; elements may be combined to form a &lt;teiCorpus&gt; element." href="ref-TEI.html">TEI</a>.</p><p>If it is essential to reflect a single permanent organization of a corpus into sub- and sub-sub-corpora, then the corpus or the high-level subcorpora may be encoded as composite texts, using the <a class="gi" title="contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc." 
href="ref-group.html">group</a> element described below and in section <a class="link_ptr" href="DS.html#DSGRP" title="Grouped Texts"><span class="headingNumber">4.3.1 </span>Grouped Texts</a>. The mechanisms for corpus characterization described in this chapter, however, are designed to reduce the need to do this. Useful groupings of components may easily be expressed using the text classification and identification elements described in section <a class="link_ptr" href="CC.html#CCAHTD" title="The Text Description"><span class="headingNumber">15.2.1 </span>The Text Description</a>, and those for associating declarations with corpus components described in section <a class="link_ptr" href="CC.html#CCAS" title="Associating Contextual Information with a Text"><span class="headingNumber">15.3 </span>Associating Contextual Information with a Text</a>. These methods also allow several different methods of text grouping to co-exist, each to be used as needed at different times. This helps minimize the danger of cross-classification and misclassification of samples, and helps improve the flexibility with which parts of a corpus may be characterized for different applications.</p><p>Anthologies and collections are often treated as texts in their own right, if only for historical reasons. In conventional publishing, at least, anthologies are published as units, with single editorial responsibility and common front and back matter which may need to be included in their electronic encodings. The texts collected in the anthology, of course, may also need to be identifiable as distinct individual objects for study. </p><p>Poem cycles, epistolary novels, and epistolary essays differ from anthologies in that they are often written as single works, by single authors, for single occasions; nevertheless, it can be useful to treat their constituent parts as individual texts, as well as the cycle itself. 
Structurally, therefore, they may be treated in the same way as anthologies: in both cases, the body of the text is composed largely of other texts. </p><p>The <a class="gi" title="contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc." href="ref-group.html">group</a> element is provided to simplify the encoding of collections, anthologies, and cyclic works; as noted above, the <a class="gi" title="contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc." href="ref-group.html">group</a> element can also be used to record the potentially complex internal structure of language corpora. For a full description, see chapter <a class="link_ptr" href="DS.html" title="7"><span class="headingNumber">4 </span>Default Text Structure</a>.</p><p>Some composite texts, finally, are neither corpora, nor anthologies, nor cyclic works: they are otherwise unitary texts within which other texts are embedded. In general, they may be treated in the same way as unitary texts, using the normal <a class="gi" title="(TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resourceLike class. Multiple &lt;TEI&gt; elements may be combined to form a &lt;teiCorpus&gt; element." href="ref-TEI.html">TEI</a> and <a class="gi" title="(text body) contains the whole body of a single unitary text, excluding any front or back matter." href="ref-body.html">body</a> elements. 
The embedded text itself may be encoded using the <a class="gi" title="contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample." href="ref-text.html">text</a> element, which may occur within quotations or between paragraphs or other chunk-level elements inside the sections of a larger text. For further discussion, see chapter <a class="link_ptr" href="DS.html" title="7"><span class="headingNumber">4 </span>Default Text Structure</a>.</p><p>All composite texts share the characteristic that their different component texts may be of structurally similar or dissimilar types. If all component texts can be encoded using the same module, then no problem arises. If, however, they require different modules, then all of these must be included in the schema. This process is described in more detail in section <a class="link_ptr" href="ST.html#STMA" title="TEI Modules"><span class="headingNumber">1.1 </span>TEI Modules</a>.</p></div><div class="div2" id="CCAH"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="CC.html#CCDEF"><span class="headingNumber">15.1 </span>Varieties of Composite Text</a></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="CC.html#CCAS"><span class="headingNumber">15.3 </span>Associating Contextual Information with a Text</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h3><span class="bookmarklink"><a class="bookmarklink" href="#CCAH" title="link to this section "><span class="invisible">TEI: Contextual Information</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.2 </span><span class="head">Contextual Information</span></h3><p>Contextual information is of particular importance for collections or corpora composed of samples from a variety of different kinds of text. 
Examples of such contextual information include: the age, sex, and geographical origins of participants in a language interaction, or their socio-economic status; the cost and publication data of a newspaper; the topic, register or factuality of an extract from a textbook. Such information may be of the first importance, whether as an organizing principle in creating a corpus (for example, to ensure that the range of values in such a parameter is evenly represented throughout the corpus, or represented proportionately to the population being sampled), or as a selection criterion in analysing the corpus (for example, to investigate the language usage of some particular vector of social characteristics).</p><p>Such contextual information is potentially of equal importance for unitary texts, and these Guidelines accordingly make no particular distinction between the kinds of information which should be gathered for unitary and for composite texts. In either case, the information should be recorded in the appropriate section of a TEI header, as described in chapter <a class="link_ptr" href="HD.html" title="5"><span class="headingNumber">2 </span>The TEI Header</a>. In the case of language corpora, such information may be gathered together in the overall corpus header, or split across all the component texts of a corpus, in their individual headers, or divided between the two. 
The association between an individual corpus text and the contextual information applicable to it may be made in a number of ways, as further discussed in section <a class="link_ptr" href="CC.html#CCAS" title="Associating Contextual Information with a Text"><span class="headingNumber">15.3 </span>Associating Contextual Information with a Text</a> below.</p><p>Chapter <a class="link_ptr" href="HD.html" title="5"><span class="headingNumber">2 </span>The TEI Header</a>, which should be read in conjunction with the present section, describes in full the range of elements available for the encoding of information relating to the electronic file itself, for example its bibliographic description and those of the source or sources from which it was derived (see section <a class="link_ptr" href="HD.html#HD2" title="The File Description"><span class="headingNumber">2.2 </span>The File Description</a>); information about the encoding practices followed with the corpus, for example its design principles, editorial practices, reference system, etc. 
(see section <a class="link_ptr" href="HD.html#HD5" title="The Encoding Description"><span class="headingNumber">2.3 </span>The Encoding Description</a>); more detailed descriptive information about the creation and content of the corpus, such as the languages used within it and any descriptive classification system used (see section <a class="link_ptr" href="HD.html#HD4" title="The Profile Description"><span class="headingNumber">2.4 </span>The Profile Description</a>); and version information documenting any changes made in the electronic text (see section <a class="link_ptr" href="HD.html#HD6" title="The Revision Description"><span class="headingNumber">2.6 </span>The Revision Description</a>).</p><p>In addition to the elements defined by chapter <a class="link_ptr" href="HD.html" title="5"><span class="headingNumber">2 </span>The TEI Header</a>, several other elements can be used in the TEI header if the additional module defined by this chapter is invoked. These additional tags make it possible to characterize the social or other situation within which a language interaction takes place or is experienced, the physical setting of a language interaction, and the participants in it. Though this information may be relevant to, and provided for, unitary texts as well as for collections or corpora, it is more often recorded for the components of systematically developed corpora than for isolated texts, and thus this module is referred to as being <span class="q">‘for language corpora’</span>.</p><p>When the module defined in this chapter is included in a schema, a number of additional elements become available within the <a class="gi" title="(text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting." 
href="ref-profileDesc.html">profileDesc</a> element of the TEI header (discussed in section <a class="link_ptr" href="HD.html#HD4" title="The Profile Description"><span class="headingNumber">2.4 </span>The Profile Description</a>). </p><ul class="specList"><li><span class="specList-elementSpec"><a href="ref-textDesc.html">textDesc</a></span> (text description) provides a description of a text in terms of its situational parameters.</li><li><span class="specList-elementSpec"><a href="ref-particDesc.html">particDesc</a></span> (participation description) describes the identifiable speakers, voices, or other participants in any kind of text or other persons named or otherwise referred to in a text, edition, or metadata.</li><li><span class="specList-elementSpec"><a href="ref-settingDesc.html">settingDesc</a></span> (setting description) describes the setting or settings within which a language interaction takes place, or other places otherwise referred to in a text, edition, or metadata.</li></ul><p> These elements, which are members of the <a class="link_odd" title="groups elements which may be used inside &lt;profileDesc&gt; and appear multiple times." href="ref-model.profileDescPart.html">model.profileDescPart</a> class, are discussed in the remainder of this chapter.   
</p><div class="div3" id="CCAHTD"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="CC.html#CCAHPA"><span class="headingNumber">15.2.2 </span>The Participant Description</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h4><span class="bookmarklink"><a class="bookmarklink" href="#CCAHTD" title="link to this section "><span class="invisible">TEI: The Text Description</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.2.1 </span><span class="head">The Text Description</span></h4><p>The <a class="gi" title="(text description) provides a description of a text in terms of its situational parameters." href="ref-textDesc.html">textDesc</a> element provides a full description of the situation within which a text was produced or experienced, and thus characterizes it in a way relatively independent of any <span class="foreign">a priori</span> theory of text-types. It is provided as an alternative or a supplement to the common use of descriptive taxonomies used to categorize texts, which is fully described in section <a class="link_ptr" href="HD.html#HD43" title="The Text Classification"><span class="headingNumber">2.4.3 </span>The Text Classification</a>, and section <a class="link_ptr" href="HD.html#HD55" title="The Classification Declaration"><span class="headingNumber">2.3.7 </span>The Classification Declaration</a>. The description is organized as a set of values and optional prose descriptions for the following eight <span class="term">situational parameters</span>, each represented by one of the following eight elements: </p><ul class="specList"><li><span class="specList-elementSpec"><a href="ref-channel.html">channel</a></span> (primary channel) describes the medium or channel by which a text is delivered or experienced. 
For a written text, this might be print, manuscript, email, etc.; for a spoken one, radio, telephone, face-to-face, etc.<table class="specDesc"><tr><td class="Attribute"><span class="att">mode</span></td><td>specifies the mode of this channel with respect to speech and writing.</td></tr></table></li><li><span class="specList-elementSpec"><a href="ref-constitution.html">constitution</a></span> describes the internal composition of a text or text sample, for example as fragmentary, complete, etc.<table class="specDesc"><tr><td class="Attribute"><span class="att">type</span></td><td>specifies how the text was constituted.</td></tr></table></li><li><span class="specList-elementSpec"><a href="ref-derivation.html">derivation</a></span> describes the nature and extent of originality of this text.<table class="specDesc"><tr><td class="Attribute"><span class="att">type</span></td><td>categorizes the derivation of the text.
Sample values include: 1] original; 2] revision; 3] translation; 4] abridgment; 5] plagiarism; 6] traditional</td></tr></table></li><li><span class="specList-elementSpec"><a href="ref-domain.html">domain</a></span> (domain of use) describes the most important social context in which the text was realized or for which it is intended, for example private vs. public, education, religion, etc.<table class="specDesc"><tr><td class="Attribute"><span class="att">type</span></td><td>categorizes the domain of use.
Sample values include: 1] art; 2] domestic; 3] religious; 4] business; 5] education; 6] govt(government) ; 7] public</td></tr></table></li><li><span class="specList-elementSpec"><a href="ref-factuality.html">factuality</a></span> describes the extent to which the text may be regarded as imaginative or non-imaginative, that is, as describing a fictional or a non-fictional world.<table class="specDesc"><tr><td class="Attribute"><span class="att">type</span></td><td>categorizes the factuality of the text.</td></tr></table></li><li><span class="specList-elementSpec"><a href="ref-interaction.html">interaction</a></span> describes the extent, cardinality and nature of any interaction among those producing and experiencing the text, for example in the form of response or interjection, commentary, etc.<table class="specDesc"><tr><td class="Attribute"><span class="att">type</span></td><td>specifies the degree of interaction between active and passive participants in the text.</td></tr><tr><td class="Attribute"><span class="att">active</span></td><td>specifies the number of active participants (or <span class="term">addressors</span>) producing parts of the text.
Suggested values include: 1] singular; 2] plural; 3] corporate; 4] unknown</td></tr><tr><td class="Attribute"><span class="att">passive</span></td><td>specifies the number of passive participants (or <span class="term">addressees</span>) to whom a text is directed or in whose presence it is created or performed.
Suggested values include: 1] self; 2] single; 3] many; 4] group; 5] world</td></tr></table></li><li><span class="specList-elementSpec"><a href="ref-preparedness.html">preparedness</a></span> describes the extent to which a text may be regarded as prepared or spontaneous.<table class="specDesc"><tr><td class="Attribute"><span class="att">type</span></td><td>a keyword characterizing the type of preparedness.
Sample values include: 1] none; 2] scripted; 3] formulaic; 4] revised</td></tr></table></li><li><span class="specList-elementSpec"><a href="ref-purpose.html">purpose</a></span> characterizes a single purpose or communicative function of the text.<table class="specDesc"><tr><td class="Attribute"><span class="att">type</span></td><td>specifies a particular kind of purpose.
Suggested values include: 1] persuade; 2] express; 3] inform; 4] entertain</td></tr><tr><td class="Attribute"><span class="att">degree</span></td><td>specifies the extent to which this purpose predominates.</td></tr></table></li></ul><p>These elements constitute a model class called <a class="link_odd" title="groups elements used to categorize a text for example in terms of its situational parameters." href="ref-model.textDescPart.html">model.textDescPart</a>; new parameters may be defined by defining new elements and adding them to that class, as further described in <a class="link_ptr" href="USE.html#MD" title="Customization"><span class="headingNumber">23.3 </span>Customization</a>.</p><p>By default, a text description will contain each of the above elements, supplied in the order specified. Except for the <a class="gi" title="characterizes a single purpose or communicative function of the text." href="ref-purpose.html">purpose</a> element, which may be repeated to indicate multiple purposes, no element should appear more than once within a single text description. Each element may be empty, or may contain a brief qualification or more detailed description of the value expressed by its attributes. It should be noted that some texts, in particular literary ones, may resist unambiguous classification in some of these dimensions; in such cases, the situational parameter in question should be given the content <span class="q">‘not applicable’</span> or an equivalent phrase.</p><p>Texts may be described along many dimensions, according to many different taxonomies. No generally accepted consensus as to how such taxonomies should be defined has yet emerged, despite the best efforts of many corpus linguists, text linguists, sociolinguists, rhetoricians, and literary theorists over the years. 
Rather than attempting the task of proposing a single taxonomy of <span class="term">text-types</span> (or the equally impossible one of enumerating all those which have been proposed previously), the closed set of <span class="term">situational parameters</span> described above can be used in combination to supply useful distinguishing descriptive features of individual texts, without insisting on a system of discrete high-level text-types. Such text-types may however be used in combination with the parameters proposed here, with the advantage that the internal structure of each such text-type can be specified in terms of the parameters proposed. This approach has the following analytical advantages:<span id="Note91_return"><a class="notelink" title="Schemes similar to that proposed here were developed in the 1960s and 1970s by researchers such as Hymes, Halliday, and Crystal and Davy, but have rar…" href="#Note91"><sup>54</sup></a></span> </p><ul class="bulleted"><li class="item">it enables a relatively continuous characterization of texts (in contrast to discrete categories based on type or topic)</li><li class="item">it enables meaningful comparisons across corpora</li><li class="item">it allows analysts to build and compare their own text-types based on the particular parameters of interest to them</li><li class="item">it is equally applicable to spoken, written, or signed texts</li></ul><p>Two alternative approaches to the use of these parameters are supported by these Guidelines. One is to use pre-existing taxonomies such as those used in subject classification or other types of text categorization. Such taxonomies may also be appropriate for the description of the topics addressed by particular texts. 
Elements for this purpose are described in section <a class="link_ptr" href="HD.html#HD43" title="The Text Classification"><span class="headingNumber">2.4.3 </span>The Text Classification</a>, and elements for defining or declaring such classification schemes in section <a class="link_ptr" href="HD.html#HD55" title="The Classification Declaration"><span class="headingNumber">2.3.7 </span>The Classification Declaration</a>. A second approach is to develop an application-specific set of <span class="term">feature structures</span> and an associated <span class="term">feature system declaration</span>, as described in chapter <a class="link_ptr" href="FS.html" title="16"><span class="headingNumber">18 </span>Feature Structures</a> and section <a class="link_ptr" href="FS.html#FD" title="26"><span class="headingNumber">18.11 </span>Feature System Declaration</a>.</p><p>Where the organizing principles of a corpus or collection so permit, it may be convenient to regard a particular set of values for the situational parameters listed in this section as forming a <span class="term">text-type</span> in its own right; this may also be useful where the same set of values applies to several texts within a corpus. In such a case, the set of text-types so defined should be regarded as a <span class="term">taxonomy</span>. The mechanisms described in section <a class="link_ptr" href="HD.html#HD55" title="The Classification Declaration"><span class="headingNumber">2.3.7 </span>The Classification Declaration</a> may be used to define hierarchic taxonomies of such text-types, provided that the <a class="gi" title="(category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal &lt;textDesc&gt;." 
href="ref-catDesc.html">catDesc</a> component of the <a class="gi" title="contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy." href="ref-category.html">category</a> element contains a <a class="gi" title="(text description) provides a description of a text in terms of its situational parameters." href="ref-textDesc.html">textDesc</a> element rather than a prose description. Particular texts may then be associated with such definitions using the mechanisms described in section <a class="link_ptr" href="HD.html#HD43" title="The Text Classification"><span class="headingNumber">2.4.3 </span>The Text Classification</a>.</p><div class="p">Using these situational parameters, an informal domestic conversation might be characterized as follows: <div id="index-egXML-d52e115647" class="pre egXML_valid"><span class="element">&lt;textDesc <span class="attribute">n</span>="<span class="attributevalue">Informal domestic conversation</span>"&gt;</span><br /> <span class="element">&lt;channel <span class="attribute">mode</span>="<span class="attributevalue">s</span>"&gt;</span>informal face-to-face conversation<span class="element">&lt;/channel&gt;</span><br /> <span class="element">&lt;constitution <span class="attribute">type</span>="<span class="attributevalue">single</span>"&gt;</span>each text represents a continuously<br />     recorded interaction among the specified participants<br />  <span class="element">&lt;/constitution&gt;</span><br /> <span class="element">&lt;derivation <span class="attribute">type</span>="<span class="attributevalue">original</span>"/&gt;</span><br /> <span class="element">&lt;domain <span class="attribute">type</span>="<span class="attributevalue">domestic</span>"&gt;</span>plans for coming week, local affairs<span class="element">&lt;/domain&gt;</span><br /> <span class="element">&lt;factuality <span class="attribute">type</span>="<span 
class="attributevalue">mixed</span>"&gt;</span>mostly factual, some jokes<span class="element">&lt;/factuality&gt;</span><br /> <span class="element">&lt;interaction <span class="attribute">type</span>="<span class="attributevalue">complete</span>"<br />  <span class="attribute">active</span>="<span class="attributevalue">plural</span>" <span class="attribute">passive</span>="<span class="attributevalue">many</span>"/&gt;</span><br /> <span class="element">&lt;preparedness <span class="attribute">type</span>="<span class="attributevalue">spontaneous</span>"/&gt;</span><br /> <span class="element">&lt;purpose <span class="attribute">type</span>="<span class="attributevalue">entertain</span>" <span class="attribute">degree</span>="<span class="attributevalue">high</span>"/&gt;</span><br /> <span class="element">&lt;purpose <span class="attribute">type</span>="<span class="attributevalue">inform</span>" <span class="attribute">degree</span>="<span class="attributevalue">medium</span>"/&gt;</span><br /><span class="element">&lt;/textDesc&gt;</span></div></div><div class="p">The following example demonstrates how the same situational parameters might be used to characterize a novel: <div id="index-egXML-d52e115665" class="pre egXML_valid"><span class="element">&lt;textDesc <span class="attribute">n</span>="<span class="attributevalue">novel</span>"&gt;</span><br /> <span class="element">&lt;channel <span class="attribute">mode</span>="<span class="attributevalue">w</span>"&gt;</span>print; part issues<span class="element">&lt;/channel&gt;</span><br /> <span class="element">&lt;constitution <span class="attribute">type</span>="<span class="attributevalue">single</span>"/&gt;</span><br /> <span class="element">&lt;derivation <span class="attribute">type</span>="<span class="attributevalue">original</span>"/&gt;</span><br /> <span class="element">&lt;domain <span class="attribute">type</span>="<span class="attributevalue">art</span>"/&gt;</span><br /> <span 
class="element">&lt;factuality <span class="attribute">type</span>="<span class="attributevalue">fiction</span>"/&gt;</span><br /> <span class="element">&lt;interaction <span class="attribute">type</span>="<span class="attributevalue">none</span>"/&gt;</span><br /> <span class="element">&lt;preparedness <span class="attribute">type</span>="<span class="attributevalue">prepared</span>"/&gt;</span><br /> <span class="element">&lt;purpose <span class="attribute">type</span>="<span class="attributevalue">entertain</span>" <span class="attribute">degree</span>="<span class="attributevalue">high</span>"/&gt;</span><br /> <span class="element">&lt;purpose <span class="attribute">type</span>="<span class="attributevalue">inform</span>" <span class="attribute">degree</span>="<span class="attributevalue">medium</span>"/&gt;</span><br /><span class="element">&lt;/textDesc&gt;</span></div> </div></div><div class="div3" id="CCAHPA"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="CC.html#CCAHTD"><span class="headingNumber">15.2.1 </span>The Text Description</a></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="CC.html#CCAHSE"><span class="headingNumber">15.2.3 </span>The Setting Description</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h4><span class="bookmarklink"><a class="bookmarklink" href="#CCAHPA" title="link to this section "><span class="invisible">TEI: The Participant Description</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.2.2 </span><span class="head">The Participant Description</span></h4><p>The <a class="gi" title="(participation description) describes the identifiable speakers, voices, or other participants in any kind of text or other persons named or otherwise referred to in a text, edition, or metadata." 
href="ref-particDesc.html">particDesc</a> element in the <a class="gi" title="(text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting." href="ref-profileDesc.html">profileDesc</a> element provides additional information about the participants in a spoken text or, where this is judged appropriate, the persons named or depicted in a written text. When the detailed elements provided by the <span class="ident-module">namesdates</span> module described in <a class="link_ptr" href="ND.html" title="20"><span class="headingNumber">13 </span>Names, Dates, People, and Places</a> are included in a schema, this element can contain detailed demographic or descriptive information about individual speakers or groups of speakers, such as their names or other personal characteristics. Individually identified persons may also be identified by a code, which can then be used elsewhere within the encoded text, for example as the value of a <span class="att">who</span> attribute.</p><p>It should be noted that although the terms <span class="term">speaker</span> or <span class="term">participant</span> are used throughout this section, it is intended that the same mechanisms may be used to characterize fictional personæ or ‘voices’ within a written text, except where otherwise stated. For the purposes of analysis of language usage, the information specified here should be equally applicable to written, spoken, or signed texts.</p><p>The element <a class="gi" title="(participation description) describes the identifiable speakers, voices, or other participants in any kind of text or other persons named or otherwise referred to in a text, edition, or metadata." 
href="ref-particDesc.html">particDesc</a> contains a description of the participants in an interaction, which may be supplied as straightforward prose, possibly containing a list of names, encoded using the usual <a class="gi" title="contains any sequence of items organized as a list." href="ref-list.html">list</a> and <a class="gi" title="(name, proper noun) contains a proper noun or noun phrase." href="ref-name.html">name</a> elements, or alternatively using the more specific and detailed <a class="gi" title="(list of persons) contains a list of descriptions, each of which provides information about an identifiable person or a group of people, for example the participants in a language interaction, or the people referred to in a historical source." href="ref-listPerson.html">listPerson</a> element provided by the <span class="ident-module">namesdates</span> module described in <a class="link_ptr" href="ND.html" title="20"><span class="headingNumber">13 </span>Names, Dates, People, and Places</a>.</p><div class="p">For example, a participant in a recorded conversation might be described informally as follows: <div id="index-egXML-d52e117259" class="pre egXML_valid"><span class="element">&lt;particDesc <span class="attribute">xml:id</span>="<span class="attributevalue">p2</span>"&gt;</span><br /> <span class="element">&lt;p&gt;</span>Female informant, well-educated, born in Shropshire UK, 12 Jan<br />     1950, of unknown occupation. 
Speaks French fluently.<br />     Socio-Economic status B2 in the PEP classification scheme.<span class="element">&lt;/p&gt;</span><br /><span class="element">&lt;/particDesc&gt;</span></div></div><div class="p">Alternatively, when the <span class="ident-module">namesdates</span> module is included in a schema, information about the same participant described above might be provided in a more structured way as follows: <div id="index-egXML-d52e117268" class="pre egXML_valid"><span class="element">&lt;person <span class="attribute">sex</span>="<span class="attributevalue">2</span>" <span class="attribute">age</span>="<span class="attributevalue">mid</span>"&gt;</span><br /> <span class="element">&lt;birth <span class="attribute">when</span>="<span class="attributevalue">1950-01-12</span>"&gt;</span><br />  <span class="element">&lt;date&gt;</span>12 Jan 1950<span class="element">&lt;/date&gt;</span><br />  <span class="element">&lt;name <span class="attribute">type</span>="<span class="attributevalue">place</span>"&gt;</span>Shropshire, UK<span class="element">&lt;/name&gt;</span><br /> <span class="element">&lt;/birth&gt;</span><br /> <span class="element">&lt;langKnowledge <span class="attribute">tags</span>="<span class="attributevalue">en fr</span>"&gt;</span><br />  <span class="element">&lt;langKnown <span class="attribute">level</span>="<span class="attributevalue">first</span>" <span class="attribute">tag</span>="<span class="attributevalue">en</span>"&gt;</span>English<span class="element">&lt;/langKnown&gt;</span><br />  <span class="element">&lt;langKnown <span class="attribute">tag</span>="<span class="attributevalue">fr</span>"&gt;</span>French<span class="element">&lt;/langKnown&gt;</span><br /> <span class="element">&lt;/langKnowledge&gt;</span><br /> <span class="element">&lt;residence&gt;</span>Long term resident of Hull<span class="element">&lt;/residence&gt;</span><br /> <span class="element">&lt;education&gt;</span>University postgraduate<span 
class="element">&lt;/education&gt;</span><br /> <span class="element">&lt;occupation&gt;</span>Unknown<span class="element">&lt;/occupation&gt;</span><br /> <span class="element">&lt;socecStatus <span class="attribute">scheme</span>="<span class="attributevalue">#pep</span>" <span class="attribute">code</span>="<span class="attributevalue">#b2</span>"/&gt;</span><br /><span class="element">&lt;/person&gt;</span></div></div><div class="p">An identified character in a drama or a novel may also be regarded as a participant in this sense, and encoded using the same techniques:<span id="Note92_return"><a class="notelink" title="It is particularly useful to define participants in a dramatic text in this way, since it enables the who attribute to be used to link sp elements to …" href="#Note92"><sup>55</sup></a></span> <div id="index-egXML-d52e117300" class="pre egXML_valid"><span class="element">&lt;particDesc&gt;</span><br /> <span class="element">&lt;p&gt;</span>The chief speaking characters in this novel are<br />  <span class="element">&lt;list&gt;</span><br />   <span class="element">&lt;item <span class="attribute">xml:id</span>="<span class="attributevalue">EMWOO</span>"&gt;</span><br />    <span class="element">&lt;name&gt;</span>Emma Woodhouse<span class="element">&lt;/name&gt;</span><br />   <span class="element">&lt;/item&gt;</span><br />   <span class="element">&lt;item <span class="attribute">xml:id</span>="<span class="attributevalue">DARCY</span>"&gt;</span><br />    <span class="element">&lt;name&gt;</span>Mr Darcy<span class="element">&lt;/name&gt;</span><br />   <span class="element">&lt;/item&gt;</span><br /><span class="comment">&lt;!-- ... 
--&gt;</span><br />  <span class="element">&lt;/list&gt;</span><span class="element">&lt;/p&gt;</span><br /><span class="element">&lt;/particDesc&gt;</span></div> Here, the characters are simply listed without the detailed structure which use of the <a class="gi" title="(list of persons) contains a list of descriptions, each of which provides information about an identifiable person or a group of people, for example the participants in a language interaction, or the people referred to in a historical source." href="ref-listPerson.html">listPerson</a> element permits.</div></div><div class="div3" id="CCAHSE"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="CC.html#CCAHPA"><span class="headingNumber">15.2.2 </span>The Participant Description</a></li><li class="subtoc"></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h4><span class="bookmarklink"><a class="bookmarklink" href="#CCAHSE" title="link to this section "><span class="invisible">TEI: The Setting Description</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.2.3 </span><span class="head">The Setting Description</span></h4><p>The <a class="gi" title="(setting description) describes the setting or settings within which a language interaction takes place, or other places otherwise referred to in a text, edition, or metadata." href="ref-settingDesc.html">settingDesc</a> element is used to describe the setting or settings in which language interaction takes place. It may contain a prose description, analogous to a stage description at the start of a play, stating in broad terms the locale, or a more detailed description of a series of such settings.</p><p>Each distinct setting is described by means of a <a class="gi" title="describes one particular setting in which a language interaction takes place." href="ref-setting.html">setting</a> element. 
</p><ul class="specList"><li><span class="specList-elementSpec"><a href="ref-setting.html">setting</a></span> describes one particular setting in which a language interaction takes place.</li></ul><p> Individual settings may be associated with particular participants, for example when participants are in different places, by means of the optional <span class="att">who</span> attribute, which this element inherits as a member of the <a class="link_odd" title="provides attributes for elements representing speech or action that can be ascribed to a specific individual." href="ref-att.ascribed.html">att.ascribed</a> class. This attribute identifies one or more individual participants or participant groups, as discussed earlier in section <a class="link_ptr" href="CC.html#CCAHPA" title="The Participant Description"><span class="headingNumber">15.2.2 </span>The Participant Description</a>. If this attribute is not specified, the setting details provided are assumed to apply to all participants represented in the language interaction. Note however that it is not possible to encode different settings for the same participant: a participant is deemed to be a person within a specific setting.</p><p>The <a class="gi" title="describes one particular setting in which a language interaction takes place." href="ref-setting.html">setting</a> element may contain either a prose description or a selection of elements from the classes <a class="link_odd" title="groups elements which contain names of individuals or corporate bodies." href="ref-model.nameLike.agent.html">model.nameLike.agent</a>, <a class="link_odd" title="groups elements containing temporal expressions." href="ref-model.dateLike.html">model.dateLike</a>, or <a class="link_odd" title="groups elements used to describe the setting of a linguistic interaction." href="ref-model.settingPart.html">model.settingPart</a>. 
By default, when the module defined by this chapter is included in a schema, these classes thus provide the following elements: </p><ul class="specList"><li><span class="specList-elementSpec"><a href="ref-name.html">name</a></span> (name, proper noun) contains a proper noun or noun phrase.</li><li><span class="specList-elementSpec"><a href="ref-date.html">date</a></span> contains a date in any format.</li><li><span class="specList-elementSpec"><a href="ref-time.html">time</a></span> contains a phrase defining a time of day in any format.</li><li><span class="specList-elementSpec"><a href="ref-locale.html">locale</a></span> contains a brief informal description of the kind of place concerned, for example: a room, a restaurant, a park bench, etc.</li><li><span class="specList-elementSpec"><a href="ref-activity.html">activity</a></span> contains a brief informal description of what a participant in a language interaction is doing other than speaking, if anything.</li></ul><p> Additional more specific naming elements such as <a class="gi" title="(organization name) contains an organizational name." href="ref-orgName.html">orgName</a> or <a class="gi" title="(personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc." href="ref-persName.html">persName</a> may also be available if the <span class="ident-module">namesdates</span> module is also included in the schema.</p><div class="p">The following example demonstrates the kind of background information often required to support transcriptions of language interactions, first encoded as a simple prose narrative: <div id="index-egXML-d52e117373" class="pre egXML_valid"><span class="element">&lt;settingDesc&gt;</span><br /> <span class="element">&lt;p&gt;</span>The time is early spring, 1989. P1 and P2 are playing on the rug<br />     of a suburban home in Bedford. 
P3 is doing the washing up at the<br />     sink. P4 (a radio announcer) is in a broadcasting studio in<br />     London.<span class="element">&lt;/p&gt;</span><br /><span class="element">&lt;/settingDesc&gt;</span></div> The same information might be represented more formally in the following way: <div id="index-egXML-d52e117378" class="pre egXML_valid"><span class="element">&lt;settingDesc&gt;</span><br /> <span class="element">&lt;setting <span class="attribute">who</span>="<span class="attributevalue">#p1 #p2</span>"&gt;</span><br />  <span class="element">&lt;name <span class="attribute">type</span>="<span class="attributevalue">city</span>"&gt;</span>Bedford<span class="element">&lt;/name&gt;</span><br />  <span class="element">&lt;name <span class="attribute">type</span>="<span class="attributevalue">region</span>"&gt;</span>UK: South East<span class="element">&lt;/name&gt;</span><br />  <span class="element">&lt;date&gt;</span>early spring, 1989<span class="element">&lt;/date&gt;</span><br />  <span class="element">&lt;locale&gt;</span>rug of a suburban home<span class="element">&lt;/locale&gt;</span><br />  <span class="element">&lt;activity&gt;</span>playing<span class="element">&lt;/activity&gt;</span><br /> <span class="element">&lt;/setting&gt;</span><br /> <span class="element">&lt;setting <span class="attribute">who</span>="<span class="attributevalue">#p3</span>"&gt;</span><br />  <span class="element">&lt;name <span class="attribute">type</span>="<span class="attributevalue">city</span>"&gt;</span>Bedford<span class="element">&lt;/name&gt;</span><br />  <span class="element">&lt;name <span class="attribute">type</span>="<span class="attributevalue">region</span>"&gt;</span>UK: South East<span class="element">&lt;/name&gt;</span><br />  <span class="element">&lt;date&gt;</span>early spring, 1989<span class="element">&lt;/date&gt;</span><br />  <span class="element">&lt;locale&gt;</span>at the sink<span class="element">&lt;/locale&gt;</span><br />  
<span class="element">&lt;activity&gt;</span>washing-up<span class="element">&lt;/activity&gt;</span><br /> <span class="element">&lt;/setting&gt;</span><br /> <span class="element">&lt;setting <span class="attribute">who</span>="<span class="attributevalue">#p4</span>"&gt;</span><br />  <span class="element">&lt;name <span class="attribute">type</span>="<span class="attributevalue">place</span>"&gt;</span>London, UK<span class="element">&lt;/name&gt;</span><br />  <span class="element">&lt;time&gt;</span>unknown<span class="element">&lt;/time&gt;</span><br />  <span class="element">&lt;locale&gt;</span>broadcasting studio<span class="element">&lt;/locale&gt;</span><br />  <span class="element">&lt;activity&gt;</span>radio performance<span class="element">&lt;/activity&gt;</span><br /> <span class="element">&lt;/setting&gt;</span><br /><span class="element">&lt;/settingDesc&gt;</span></div></div><p>Again, a more detailed encoding for places is feasible if the <span class="ident-module">namesdates</span> module is included in the schema. The above examples assume that only the general purpose <a class="gi" title="(name, proper noun) contains a proper noun or noun phrase." href="ref-name.html">name</a> element supplied in the core module is available. </p></div></div><div class="div2" id="CCAS"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="CC.html#CCAH"><span class="headingNumber">15.2 </span>Contextual Information</a></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="CC.html#CCAN"><span class="headingNumber">15.4 </span>Linguistic Annotation of Corpora</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h3><span class="bookmarklink"><a class="bookmarklink" href="#CCAS" title="link to this section "><span class="invisible">TEI: Associating Contextual
Information with a Text</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.3 </span><span class="head">Associating Contextual Information with a Text</span></h3><p>This section discusses the association of the contextual information held in the header with the individual elements making up a TEI text or corpus. Contextual information is held in elements of various kinds within the TEI header, as discussed elsewhere in this section and in chapter <a class="link_ptr" href="HD.html" title="5"><span class="headingNumber">2 </span>The TEI Header</a>. Here we consider what happens when different parts of a document need to be associated with different contextual information of the same type, for example when one part of a document uses a different encoding practice from another, or where one part relates to a different setting from another. In such situations, there will be more than one instance of a header element of the relevant type.</p><p>The TEI scheme allows for the following possibilities: </p><ul class="bulleted"><li class="item">A given element may appear in the corpus header only, in the header of one or more texts only, or in both places.</li><li class="item">There may be multiple occurrences of certain elements in either corpus or text header.</li></ul><p>To simplify the exposition, we deal with these two possibilities separately in what follows; however, they may be combined as desired. 
</p><div class="div3" id="CCAS1"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="CC.html#CCAS2"><span class="headingNumber">15.3.2 </span>Declarable Elements</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h4><span class="bookmarklink"><a class="bookmarklink" href="#CCAS1" title="link to this section "><span class="invisible">TEI: Combining Corpus and Text Headers</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.3.1 </span><span class="head">Combining Corpus and Text Headers</span></h4><p>A TEI-conformant document may have more than one header only in the case of a TEI corpus, which must have a header in its own right, as well as the obligatory header for each text. Every element specified in a corpus header is understood as if it appeared within every text header in the corpus. An element specified in a text header but not in the corpus header supplements the specification for that text alone. If any element is specified in both corpus and text headers, the corpus header element is over-ridden for that text alone. </p><p>The <a class="gi" title="(title statement) groups information about the title of a work and those responsible for its content." href="ref-titleStmt.html">titleStmt</a> for a corpus text is understood to be prefixed by the <a class="gi" title="(title statement) groups information about the title of a work and those responsible for its content." href="ref-titleStmt.html">titleStmt</a> given in the corpus header. All other optional elements of the <a class="gi" title="(file description) contains a full bibliographic description of an electronic file." href="ref-fileDesc.html">fileDesc</a> should be omitted from an individual corpus text header unless they differ from those specified in the corpus header. 
All other header elements behave identically, in the manner documented below. This facility makes it possible to state once for all in the corpus header each piece of contextual information which is common to the whole of the corpus, while still allowing for individual texts to vary from this common denominator.</p><div class="p">For example, the following schematic shows the structure of a corpus comprising three texts, the first and last of which share the same encoding description. The second one has its own encoding description. <div id="index-egXML-d52e117641" class="pre egXML_feasible"><span class="element">&lt;teiCorpus xmlns="http://www.tei-c.org/ns/1.0"&gt;</span><br /> <span class="element">&lt;teiHeader&gt;</span><br />  <span class="element">&lt;fileDesc&gt;</span><br /><span class="comment">&lt;!-- corpus file description--&gt;</span><br />  <span class="element">&lt;/fileDesc&gt;</span><br />  <span class="element">&lt;encodingDesc&gt;</span><br /><span class="comment">&lt;!-- default encoding description --&gt;</span><br />  <span class="element">&lt;/encodingDesc&gt;</span><br />  <span class="element">&lt;revisionDesc&gt;</span><br /><span class="comment">&lt;!-- corpus revision description --&gt;</span><br />  <span class="element">&lt;/revisionDesc&gt;</span><br /> <span class="element">&lt;/teiHeader&gt;</span><br /> <span class="element">&lt;TEI&gt;</span><br />  <span class="element">&lt;teiHeader&gt;</span><br />   <span class="element">&lt;fileDesc&gt;</span><br /><span class="comment">&lt;!-- file description for this corpus text --&gt;</span><br />   <span class="element">&lt;/fileDesc&gt;</span><br />  <span class="element">&lt;/teiHeader&gt;</span><br />  <span class="element">&lt;text&gt;</span><br /><span class="comment">&lt;!-- first corpus text --&gt;</span><br />  <span class="element">&lt;/text&gt;</span><br /> <span class="element">&lt;/TEI&gt;</span><br /> <span class="element">&lt;TEI&gt;</span><br />  <span 
class="element">&lt;teiHeader&gt;</span><br />   <span class="element">&lt;fileDesc&gt;</span><br /><span class="comment">&lt;!-- file description for this corpus text --&gt;</span><br />   <span class="element">&lt;/fileDesc&gt;</span><br />   <span class="element">&lt;encodingDesc&gt;</span><br /><span class="comment">&lt;!-- encoding description for this corpus 
             text, over-riding the default  --&gt;</span><br />   <span class="element">&lt;/encodingDesc&gt;</span><br />  <span class="element">&lt;/teiHeader&gt;</span><br />  <span class="element">&lt;text&gt;</span><br /><span class="comment">&lt;!-- second corpus text --&gt;</span><br />  <span class="element">&lt;/text&gt;</span><br /> <span class="element">&lt;/TEI&gt;</span><br /> <span class="element">&lt;TEI&gt;</span><br />  <span class="element">&lt;teiHeader&gt;</span><br />   <span class="element">&lt;fileDesc&gt;</span><br /><span class="comment">&lt;!-- file description for third corpus text --&gt;</span><br />   <span class="element">&lt;/fileDesc&gt;</span><br />  <span class="element">&lt;/teiHeader&gt;</span><br />  <span class="element">&lt;text&gt;</span><br /><span class="comment">&lt;!-- third corpus text --&gt;</span><br />  <span class="element">&lt;/text&gt;</span><br /> <span class="element">&lt;/TEI&gt;</span><br /><span class="element">&lt;/teiCorpus&gt;</span></div></div></div><div class="div3" id="CCAS2"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="CC.html#CCAS1"><span class="headingNumber">15.3.1 </span>Combining Corpus and Text Headers</a></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="CC.html#CCAS3"><span class="headingNumber">15.3.3 </span>Summary</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h4><span class="bookmarklink"><a class="bookmarklink" href="#CCAS2" title="link to this section "><span class="invisible">TEI: Declarable Elements</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.3.2 </span><span class="head">Declarable Elements</span></h4><p>Certain of the elements which can appear within a TEI header are known as <span class="term">declarable elements</span>. 
These elements have in common the fact that they may be linked explicitly with a particular part of a text or corpus by means of a <span class="att">decls</span> attribute on that element. This linkage is used to over-ride the default association between declarations in the header and a corpus or corpus text. The only header elements which may be associated in this way are those which would not otherwise be meaningfully repeatable.</p><p>Declarable elements are all members of the class <a class="link_odd" title="provides attributes for those elements in the TEI header which may be independently selected by means of the special purpose @decls attribute." href="ref-att.declarable.html">att.declarable</a>; the corresponding declaring elements are all members of the class <a class="link_odd" title="provides attributes for elements which may be independently associated with a particular declarable element within the header, thus overriding the inherited default for that element." href="ref-att.declaring.html">att.declaring</a>. 
</p><ul class="specList"><li><span class="specList-classSpec"><a href="ref-att.declarable.html">att.declarable</a></span> provides attributes for those elements in the TEI header which may be independently selected by means of the special purpose <span class="att">decls</span> attribute.<table class="specDesc"><tr><td class="Attribute"><span class="att">default</span></td><td>indicates whether or not this element is selected by default when its parent is selected.</td></tr></table></li><li><span class="specList-classSpec"><a href="ref-att.declaring.html">att.declaring</a></span> provides attributes for elements which may be independently associated with a particular declarable element within the header, thus overriding the inherited default for that element.<table class="specDesc"><tr><td class="Attribute"><span class="att">decls</span></td><td>identifies one or more <span class="term">declarable elements</span> within the header, which are understood to apply to the element bearing this attribute and its content.</td></tr></table></li></ul><p>An alphabetically ordered list of declarable elements follows: </p><ul class="specList"><li><span class="specList-elementSpec"><a href="ref-availability.html">availability</a></span> supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc.</li><li><span class="specList-elementSpec"><a href="ref-bibl.html">bibl</a></span> (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.</li><li><span class="specList-elementSpec"><a href="ref-biblFull.html">biblFull</a></span> (fully-structured bibliographic citation) contains a fully-structured bibliographic citation, in which all components of the TEI file description are present.</li><li><span class="specList-elementSpec"><a href="ref-biblStruct.html">biblStruct</a></span> (structured 
bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order.</li><li><span class="specList-elementSpec"><a href="ref-broadcast.html">broadcast</a></span> describes a broadcast used as the source of a spoken text.</li><li><span class="specList-elementSpec"><a href="ref-correction.html">correction</a></span> (correction principles) states how and under what circumstances corrections have been made in the text.</li><li><span class="specList-elementSpec"><a href="ref-editorialDecl.html">editorialDecl</a></span> (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.</li><li><span class="specList-elementSpec"><a href="ref-equipment.html">equipment</a></span> provides technical details of the equipment and media used for an audio or video recording used as the source for a spoken text.</li><li><span class="specList-elementSpec"><a href="ref-hyphenation.html">hyphenation</a></span> summarizes the way in which hyphenation in a source text has been treated in an encoded version of it.</li><li><span class="specList-elementSpec"><a href="ref-interpretation.html">interpretation</a></span> describes the scope of any analytic or interpretive information added to the text in addition to the transcription.</li><li><span class="specList-elementSpec"><a href="ref-langUsage.html">langUsage</a></span> (language usage) describes the languages, sublanguages, registers, dialects, etc. 
represented within a text.</li><li><span class="specList-elementSpec"><a href="ref-listBibl.html">listBibl</a></span> (citation list) contains a list of bibliographic citations of any kind.</li><li><span class="specList-elementSpec"><a href="ref-normalization.html">normalization</a></span> indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form.</li><li><span class="specList-elementSpec"><a href="ref-particDesc.html">particDesc</a></span> (participation description) describes the identifiable speakers, voices, or other participants in any kind of text or other persons named or otherwise referred to in a text, edition, or metadata.</li><li><span class="specList-elementSpec"><a href="ref-projectDesc.html">projectDesc</a></span> (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.</li><li><span class="specList-elementSpec"><a href="ref-quotation.html">quotation</a></span> specifies editorial practice adopted with respect to quotation marks in the original.</li><li><span class="specList-elementSpec"><a href="ref-recording.html">recording</a></span> (recording event) provides details of an audio or video recording event used as the source of a spoken text, either directly or from a public broadcast.</li><li><span class="specList-elementSpec"><a href="ref-samplingDecl.html">samplingDecl</a></span> (sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.</li><li><span class="specList-elementSpec"><a href="ref-scriptStmt.html">scriptStmt</a></span> (script statement) contains a citation giving details of the script used for a spoken text.</li><li><span class="specList-elementSpec"><a href="ref-segmentation.html">segmentation</a></span> describes the principles 
according to which the text has been segmented, for example into sentences, tone-units, graphemic strata, etc.</li><li><span class="specList-elementSpec"><a href="ref-sourceDesc.html">sourceDesc</a></span> (source description) describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.</li><li><span class="specList-elementSpec"><a href="ref-stdVals.html">stdVals</a></span> (standard values) specifies the format used when standardized date or number values are supplied.</li><li><span class="specList-elementSpec"><a href="ref-textClass.html">textClass</a></span> (text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.</li><li><span class="specList-elementSpec"><a href="ref-textDesc.html">textDesc</a></span> (text description) provides a description of a text in terms of its situational parameters.</li><li><span class="specList-elementSpec"><a href="ref-xenoData.html">xenoData</a></span> (non-TEI metadata) provides a container element into which metadata in non-TEI formats may be placed.</li></ul><p> All of the above elements may be multiply defined within a single header, that is, there may be more than one instance of any declarable element type at a given level. 
When this occurs, the following rules apply: </p><ul class="bulleted"><li class="item">every declarable element must bear a unique identifier</li><li class="item">for each different type of declarable element which occurs more than once within the same parent element, exactly one element must be specified as the default, by means of the <span class="att">default</span> attribute</li></ul><div class="p">In the following example, an editorial declaration contains two possible <a class="gi" title="(correction principles) states how and under what circumstances corrections have been made in the text." href="ref-correction.html">correction</a> policies, one identified as <span class="val">CorPol1</span> and the other as <span class="val">CorPol2</span>. Since there are two, one of them (in this case <span class="val">CorPol1</span>) must be specified as the default: <div id="index-egXML-d52e117747" class="pre egXML_valid"><span class="element">&lt;editorialDecl&gt;</span><br /> <span class="element">&lt;correction <span class="attribute">xml:id</span>="<span class="attributevalue">CorPol1</span>"<br />  <span class="attribute">default</span>="<span class="attributevalue">true</span>"&gt;</span><br />  <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br /> <span class="element">&lt;/correction&gt;</span><br /> <span class="element">&lt;correction <span class="attribute">xml:id</span>="<span class="attributevalue">CorPol2</span>"&gt;</span><br />  <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br /> <span class="element">&lt;/correction&gt;</span><br /> <span class="element">&lt;normalization <span class="attribute">xml:id</span>="<span class="attributevalue">n1</span>"&gt;</span><br />  <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br />  <span class="element">&lt;p&gt;</span> ... 
<span class="element">&lt;/p&gt;</span><br /> <span class="element">&lt;/normalization&gt;</span><br /><span class="element">&lt;/editorialDecl&gt;</span></div> For texts associated with the header in which this declaration appears, correction method <span class="val">CorPol1</span> will be assumed, unless they explicitly state otherwise. Here is the structure for a text which does state otherwise: <div id="index-egXML-d52e117765" class="pre egXML_valid"><span class="element">&lt;text&gt;</span><br /> <span class="element">&lt;body&gt;</span><br />  <span class="element">&lt;div1 <span class="attribute">n</span>="<span class="attributevalue">d1</span>"/&gt;</span><br />  <span class="element">&lt;div1 <span class="attribute">n</span>="<span class="attributevalue">d2</span>" <span class="attribute">decls</span>="<span class="attributevalue">#CorPol2</span>"/&gt;</span><br />  <span class="element">&lt;div1 <span class="attribute">n</span>="<span class="attributevalue">d3</span>"/&gt;</span><br /> <span class="element">&lt;/body&gt;</span><br /><span class="element">&lt;/text&gt;</span></div> In this case, the contents of the divisions <span class="val">d1</span> and <span class="val">d3</span> will both use correction policy <span class="val">CorPol1</span>, and those of division <span class="val">d2</span> will use correction policy <span class="val">CorPol2</span>.</div><p>The <span class="att">decls</span> attribute is defined for any element which is a member of the class <a class="link_odd" title="provides attributes for elements which may be independently associated with a particular declarable element within the header, thus overriding the inherited default for that element." href="ref-att.declaring.html">att.declaring</a>. This includes the major structural elements <a class="gi" title="contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample." href="ref-text.html">text</a>, <a class="gi" title="contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc." 
href="ref-group.html">group</a>, and <a class="gi" title="(text division) contains a subdivision of the front, body, or back of a text." href="ref-div.html">div</a>, as well as smaller structural units, down to the level of paragraphs in prose, individual utterances in spoken texts, and entries in dictionaries. However, TEI recommended practice is to limit the number of multiple declarable elements used by a document as far as possible, for simplicity and ease of processing.</p><p>The identifier or identifiers specified by the <span class="att">decls</span> attribute are subject to two further restrictions: </p><ul class="bulleted"><li class="item">An identifier specifying an element which contains multiple instances of one or more other elements should be interpreted as if it explicitly identified the elements identified as the default in each such set of repeated elements</li><li class="item">Each element specified, explicitly or implicitly, by the list of identifiers must be of a different kind.</li></ul><div class="p">To demonstrate how these rules operate, we now expand our earlier example slightly: <div id="index-egXML-d52e117808" class="pre egXML_valid"><span class="element">&lt;encodingDesc&gt;</span><br /> <span class="element">&lt;editorialDecl <span class="attribute">xml:id</span>="<span class="attributevalue">ED1</span>" <span class="attribute">default</span>="<span class="attributevalue">true</span>"&gt;</span><br />  <span class="element">&lt;correction <span class="attribute">xml:id</span>="<span class="attributevalue">C1A</span>" <span class="attribute">default</span>="<span class="attributevalue">true</span>"&gt;</span><br />   <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br />  <span class="element">&lt;/correction&gt;</span><br />  <span class="element">&lt;correction <span class="attribute">xml:id</span>="<span class="attributevalue">C1B</span>"&gt;</span><br />   <span class="element">&lt;p&gt;</span> ... 
<span class="element">&lt;/p&gt;</span><br />  <span class="element">&lt;/correction&gt;</span><br />  <span class="element">&lt;normalization <span class="attribute">xml:id</span>="<span class="attributevalue">N1</span>"&gt;</span><br />   <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br />   <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br />  <span class="element">&lt;/normalization&gt;</span><br /> <span class="element">&lt;/editorialDecl&gt;</span><br /> <span class="element">&lt;editorialDecl <span class="attribute">xml:id</span>="<span class="attributevalue">ED2</span>"&gt;</span><br />  <span class="element">&lt;correction <span class="attribute">xml:id</span>="<span class="attributevalue">C2A</span>" <span class="attribute">default</span>="<span class="attributevalue">true</span>"&gt;</span><br />   <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br />  <span class="element">&lt;/correction&gt;</span><br />  <span class="element">&lt;correction <span class="attribute">xml:id</span>="<span class="attributevalue">C2B</span>"&gt;</span><br />   <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br />  <span class="element">&lt;/correction&gt;</span><br />  <span class="element">&lt;normalization <span class="attribute">xml:id</span>="<span class="attributevalue">N2A</span>"&gt;</span><br />   <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br />  <span class="element">&lt;/normalization&gt;</span><br />  <span class="element">&lt;normalization <span class="attribute">xml:id</span>="<span class="attributevalue">N2B</span>"<br />   <span class="attribute">default</span>="<span class="attributevalue">true</span>"&gt;</span><br />   <span class="element">&lt;p&gt;</span> ... 
<span class="element">&lt;/p&gt;</span><br />  <span class="element">&lt;/normalization&gt;</span><br /> <span class="element">&lt;/editorialDecl&gt;</span><br /><span class="element">&lt;/encodingDesc&gt;</span></div></div><p>This encoding description now has two editorial declarations, identified as <span class="val">ED1</span> (the default) and <span class="val">ED2</span>. For texts not specifying otherwise, <span class="val">ED1</span> will apply. If <span class="val">ED1</span> applies, correction method C1A and normalization method N1 apply, since these are the specified defaults within <span class="val">ED1</span>. In the same way, for a text specifying <span class="att">decls</span> as <span class="q">‘<span class="val">#ED2</span>’</span>, correction C2A and normalization N2B will apply.</p><p>A finer-grained approach is also possible. A text might specify <span class="tag">&lt;text decls='#C2B #N2A'&gt;</span>, to ‘mix and match’ declarations as required. A tag such as <span class="tag">&lt;text decls='#ED1 #ED2'&gt;</span> would (obviously) be illegal, since it includes two elements of the same type; a tag such as <span class="tag">&lt;text decls='#ED2 #C1A'&gt;</span> is also illegal, since in this context <span class="val">ED2</span> is synonymous with the defaults for that editorial declaration, namely <span class="val">C2A N2B</span>, resulting in a list that identifies two correction elements (C1A and C2A).</p></div><div class="div3" id="CCAS3"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="CC.html#CCAS2"><span class="headingNumber">15.3.2 </span>Declarable Elements</a></li><li class="subtoc"></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h4><span class="bookmarklink"><a class="bookmarklink" href="#CCAS3" title="link to this section "><span class="invisible">TEI: Summary</span><span class="pilcrow">¶</span></a></span><span 
class="headingNumber">15.3.3 </span><span class="head">Summary</span></h4><p>The rules determining which of the declarable elements are applicable at any point may be summarized as follows: </p><ol class="numbered"><li class="item">If there is a single occurrence of a given declarable element in a corpus header, then it applies by default to all elements within the corpus.</li><li class="item">If there is a single occurrence of a given declarable element in the text header, then it applies by default to all elements of that text irrespective of the contents of the corpus header.</li><li class="item">Where there are multiple occurrences of declarable elements within either corpus or text header, <ul class="bulleted"><li class="item">each must have a unique value specified as the value of its <span class="att">xml:id</span> attribute;</li><li class="item">exactly one must bear a <span class="att">default</span> attribute with the value <span class="val">true</span>.</li></ul></li><li class="item">It is a semantic error for an element to be associated with more than one occurrence of any declarable element.</li><li class="item">Selecting an element which contains multiple occurrences of a given declarable element is semantically equivalent to selecting only those contained elements which are specified as defaults.</li><li class="item">An association made by one element applies by default to all of its descendants. 
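<div class="p">A minimal sketch of this last rule, reusing the identifiers <span class="val">CorPol1</span> (the default) and <span class="val">CorPol2</span> from the example in the preceding section; the division structure shown here is illustrative only. The paragraphs in the second division carry no <span class="att">decls</span> attribute of their own, so they are understood to share the association made by their parent: <div class="pre egXML_feasible"><span class="element">&lt;body&gt;</span><br /> <span class="element">&lt;div <span class="attribute">n</span>="<span class="attributevalue">d1</span>"&gt;</span><br /><span class="comment">&lt;!-- no decls attribute: the default policy, CorPol1, applies --&gt;</span><br />  <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br /> <span class="element">&lt;/div&gt;</span><br /> <span class="element">&lt;div <span class="attribute">n</span>="<span class="attributevalue">d2</span>" <span class="attribute">decls</span>="<span class="attributevalue">#CorPol2</span>"&gt;</span><br /><span class="comment">&lt;!-- both paragraphs inherit CorPol2 from the enclosing div --&gt;</span><br />  <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br />  <span class="element">&lt;p&gt;</span> ... <span class="element">&lt;/p&gt;</span><br /> <span class="element">&lt;/div&gt;</span><br /><span class="element">&lt;/body&gt;</span></div></div>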
</li></ol></div></div><div class="div2" id="CCAN"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="CC.html#CCAS"><span class="headingNumber">15.3 </span>Associating Contextual Information with a Text</a></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="CC.html#CCREC"><span class="headingNumber">15.5 </span>Recommendations for the Encoding of Large Corpora</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h3><span class="bookmarklink"><a class="bookmarklink" href="#CCAN" title="link to this section "><span class="invisible">TEI: Linguistic Annotation of Corpora</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.4 </span><span class="head">Linguistic Annotation of Corpora</span></h3><p>Language corpora often include analytic encodings or annotations, designed to support a variety of different views of language. The present Guidelines do not advocate any particular approach to linguistic annotation (or ‘tagging’); instead a number of general analytic facilities are provided which support the representation of most forms of annotation in a standard and self-documenting manner. Analytic annotation is of importance in many fields, not only in corpus linguistics, and is therefore discussed in general terms elsewhere in the Guidelines.<span id="Note93_return"><a class="notelink" title="See in particular chapters , , and ." 
href="#Note93"><sup>56</sup></a></span> The present section presents informally some particular applications of these general mechanisms to the specific practice of corpus linguistics.</p><div class="div3" id="CCAN1"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"></li><li class="subtoc"></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h4><span class="bookmarklink"><a class="bookmarklink" href="#CCAN1" title="link to this section "><span class="invisible">TEI: Levels of Analysis</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.4.1 </span><span class="head">Levels of Analysis</span></h4><p>By <span class="term">linguistic annotation</span> we mean here any annotation determined by an analysis of linguistic features of the text, excluding as borderline cases both the formal structural properties of the text (e.g. its division into chapters or paragraphs) and descriptive information about its context (the circumstances of its production, its genre, or medium). The structural properties of any TEI-conformant text should be represented using the structural elements discussed elsewhere in these Guidelines, for example in chapters <a class="link_ptr" href="CO.html" title="6"><span class="headingNumber">3 </span>Elements Available in All TEI Documents</a> and <a class="link_ptr" href="DS.html" title="7"><span class="headingNumber">4 </span>Default Text Structure</a>. The contextual properties of a TEI text are fully documented in the TEI header, which is discussed in chapter <a class="link_ptr" href="HD.html" title="5"><span class="headingNumber">2 </span>The TEI Header</a>, and in section <a class="link_ptr" href="CC.html#CCAH" title="Contextual Information"><span class="headingNumber">15.2 </span>Contextual Information</a> of the present chapter.</p><p>Other forms of linguistic annotation may be applied at a number of levels in a text. 
A code (such as a word-class or part-of-speech code) may be associated with each word or token, or with groups of such tokens, which may be continuous, discontinuous, or nested. A code may also be associated with relationships (such as cohesion) perceived as existing between distinct parts of a text. The codes themselves may stand for discrete non-decomposable categories, or they may represent highly articulated bundles of textual features. Their function may be to place the annotated part of the text somewhere within a narrowly linguistic or discoursal domain of analysis, or within a more general semantic field, or any combination drawn from these and other domains. </p><p>The manner in which such annotations are generated and attached to the text may be entirely automatic, entirely manual, or a mixture. The ease and accuracy with which analysis can be automated may vary with the level at which the annotation is attached. The method employed should be documented in the <a class="gi" title="describes the scope of any analytic or interpretive information added to the text in addition to the transcription." href="ref-interpretation.html">interpretation</a> element within the encoding description of the TEI header, as described in section <a class="link_ptr" href="HD.html#HD53" title="The Editorial Practices Declaration"><span class="headingNumber">2.3.3 </span>The Editorial Practices Declaration</a>. 
Where different parts of a corpus have used different annotation methods, the <span class="att">decls</span> attribute may be used to indicate the fact, as further discussed in section <a class="link_ptr" href="CC.html#CCAS" title="Associating Contextual Information with a Text"><span class="headingNumber">15.3 </span>Associating Contextual Information with a Text</a>.</p><p>An extended example of one form of linguistic analysis commonly practised in corpus linguistics is given in section <a class="link_ptr" href="AI.html#AILA" title="Linguistic Annotation"><span class="headingNumber">17.4 </span>Linguistic Annotation</a>.</p></div></div><div class="div2" id="CCREC"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="CC.html#CCAN"><span class="headingNumber">15.4 </span>Linguistic Annotation of Corpora</a></li><li class="subtoc"><span class="nextLink"> » </span><a class="navigation" href="CC.html#index-body.1_div.15_div.6"><span class="headingNumber">15.6 </span>Module for Language Corpora</a></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h3><span class="bookmarklink"><a class="bookmarklink" href="#CCREC" title="link to this section "><span class="invisible">TEI: Recommendations for the Encoding of Large Corpora</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.5 </span><span class="head">Recommendations for the Encoding of Large Corpora</span></h3><p>These Guidelines include proposals for the identification and encoding of a far greater variety of textual features and characteristics than is likely to be either feasible or desirable in any one language corpus, however large and ambitious. The reasoning behind this catholic approach is further discussed in chapter <a class="link_ptr" href="AB.html" title="1"><span class="headingNumber">iv. </span>About These Guidelines</a>. 
For most large-scale corpus projects, it will therefore be necessary to determine a subset of TEI-recommended elements appropriate to the anticipated needs of the project, as further discussed in section <a class="link_ptr" href="USE.html#MD" title="Customization"><span class="headingNumber">23.3 </span>Customization</a>; the customization mechanisms described there include the ability to exclude selected element types, add new element types, and change the names of existing elements. A discussion of the implications of such changes for TEI conformance is provided in section <a class="link_ptr" href="USE.html#CF" title="Conformance"><span class="headingNumber">23.4 </span>Conformance</a>.</p><p>Because of the high cost of identifying and encoding many textual features, and the difficulty in ensuring consistent practice across very large corpora, encoders may find it convenient to divide the set of elements to be encoded into the following four categories: </p><dl><dt><span>required</span></dt><dd>texts included within the corpus will always encode textual features in this category, should they exist in the text.</dd><dt><span>recommended</span></dt><dd>textual features in this category will be encoded wherever economically and practically feasible; where such features are present but not encoded, a note to that effect should be made in the header.</dd><dt><span>optional</span></dt><dd>textual features in this category may or may not be encoded; no conclusion about the absence of such features can be inferred from the absence of the corresponding element in a given text.</dd><dt><span>proscribed</span></dt><dd>textual features in this category are deliberately not encoded; they may be transcribed as unmarked-up text, or represented as <a class="gi" title="indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible." 
href="ref-gap.html">gap</a> elements, or silently omitted, as appropriate.</dd></dl></div><div class="teidiv1" id="index-body.1_div.15_div.6"><div class="miniTOC miniTOC_right"><ul class="subtoc"><li class="subtoc"><span class="previousLink"> « </span><a class="navigation" href="CC.html#CCREC"><span class="headingNumber">15.5 </span>Recommendations for the Encoding of Large Corpora</a></li><li class="subtoc"></li><li class="subtoc"><a class="navigation" href="index.html">Home</a></li></ul></div><h3><span class="bookmarklink"><a class="bookmarklink" href="#index-body.1_div.15_div.6" title="link to this section "><span class="invisible">TEI: Module for Language Corpora</span><span class="pilcrow">¶</span></a></span><span class="headingNumber">15.6 </span><span class="head">Module for Language Corpora</span></h3><p>The module described in this chapter makes available the following components: </p><dl class="moduleSpec"><dt class="moduleSpecHead"><span lang="en">Module</span> corpus: Corpus texts</dt><dd><ul><li><span lang="en">Elements defined</span>: <a class="link_odd" title="contains a brief informal description of what a participant in a language interaction is doing other than speaking, if anything." href="ref-activity.html">activity</a> <a class="link_odd" title="(primary channel) describes the medium or channel by which a text is delivered or experienced. For a written text, this might be print, manuscript, email, etc.; for a spoken one, radio, telephone, face-to-face, etc." href="ref-channel.html">channel</a> <a class="link_odd" title="describes the internal composition of a text or text sample, for example as fragmentary, complete, etc." href="ref-constitution.html">constitution</a> <a class="link_odd" title="describes the nature and extent of originality of this text." 
href="ref-derivation.html">derivation</a> <a class="link_odd" title="(domain of use) describes the most important social context in which the text was realized or for which it is intended, for example private vs. public, education, religion, etc." href="ref-domain.html">domain</a> <a class="link_odd" title="describes the extent to which the text may be regarded as imaginative or non-imaginative, that is, as describing a fictional or a non-fictional world." href="ref-factuality.html">factuality</a> <a class="link_odd" title="describes the extent, cardinality and nature of any interaction among those producing and experiencing the text, for example in the form of response or interjection, commentary, etc." href="ref-interaction.html">interaction</a> <a class="link_odd" title="contains a brief informal description of the kind of place concerned, for example: a room, a restaurant, a park bench, etc." href="ref-locale.html">locale</a> <a class="link_odd" title="(participation description) describes the identifiable speakers, voices, or other participants in any kind of text or other persons named or otherwise referred to in a text, edition, or metadata." href="ref-particDesc.html">particDesc</a> <a class="link_odd" title="describes the extent to which a text may be regarded as prepared or spontaneous." href="ref-preparedness.html">preparedness</a> <a class="link_odd" title="characterizes a single purpose or communicative function of the text." href="ref-purpose.html">purpose</a> <a class="link_odd" title="describes one particular setting in which a language interaction takes place." href="ref-setting.html">setting</a> <a class="link_odd" title="(setting description) describes the setting or settings within which a language interaction takes place, or other places otherwise referred to in a text, edition, or metadata." 
href="ref-settingDesc.html">settingDesc</a> <a class="link_odd" title="(text description) provides a description of a text in terms of its situational parameters." href="ref-textDesc.html">textDesc</a></li></ul></dd></dl><p> The selection and combination of modules to form a TEI schema is described in <a class="link_ptr" href="ST.html#STIN" title="Defining a TEI Schema"><span class="headingNumber">1.2 </span>Defining a TEI Schema</a>.</p></div></div><nav class="left"><span class="upLink"> ↑ </span><a class="navigation" href="index.html">TEI P5 Guidelines</a><span class="previousLink"> « </span><a class="navigation" href="FT.html"><span class="headingNumber">14 </span>Tables, Formulæ, Graphics and Notated Music</a><span class="nextLink"> » </span><a class="navigation" href="SA.html"><span class="headingNumber">16 </span>Linking, Segmentation, and Alignment</a></nav><!--Notes in [div]--><div class="notes"><div class="noteHeading">Notes</div><div class="note" id="Note91"><span class="noteLabel">54 </span><div class="noteBody">Schemes similar to that proposed here were developed in the 1960s and 1970s by researchers such as Hymes, Halliday, and Crystal and Davy, but have rarely been implemented; one notable exception being the pioneering work on the Helsinki Diachronic Corpus of English, on which see <a class="citlink" href="BIB.html#CC-BIBL-1">Kytö and Rissanen (1988)</a></div> <a class="link_return" title="Go back to text" href="#Note91_return">↵</a></div><div class="note" id="Note92"><span class="noteLabel">55 </span><div class="noteBody">It is particularly useful to define participants in a dramatic text in this way, since it enables the <span class="att">who</span> attribute to be used to link <a class="gi" title="(speech) contains an individual speech in a performance text, or a passage presented as such in a prose or verse text." 
href="ref-sp.html">sp</a> elements to definitions for their speakers; see further section <a class="link_ptr" href="DR.html#DRSP" title="Speeches and Speakers"><span class="headingNumber">7.2.2 </span>Speeches and Speakers</a>.</div> <a class="link_return" title="Go back to text" href="#Note92_return">↵</a></div><div class="note" id="Note93"><span class="noteLabel">56 </span><div class="noteBody">See in particular chapters <a class="link_ptr" href="SA.html" title="14"><span class="headingNumber">16 </span>Linking, Segmentation, and Alignment</a>, <a class="link_ptr" href="AI.html" title="15"><span class="headingNumber">17 </span>Simple Analytic Mechanisms</a>, and <a class="link_ptr" href="FS.html" title="16"><span class="headingNumber">18 </span>Feature Structures</a>.</div> <a class="link_return" title="Go back to text" href="#Note93_return">↵</a></div></div><div class="stdfooter autogenerated"><p>
    [<a href="../../en/html/CC.html">English</a>]
    [<a href="../../de/html/CC.html">Deutsch</a>]
    [<a href="../../es/html/CC.html">Español</a>]
    [<a href="../../it/html/CC.html">Italiano</a>]
    [<a href="../../fr/html/CC.html">Français</a>]
    [<a href="../../ja/html/CC.html">日本語</a>]
    [<a href="../../ko/html/CC.html">한국어</a>]
    [<a href="../../zh-TW/html/CC.html">中文</a>]
    </p><hr /><div class="footer"><a class="plain" href="http://www.tei-c.org/About/">TEI Consortium</a> | <a class="plain" href="http://www.tei-c.org/About/contact.xml">Feedback</a></div><hr /><address><br />TEI Guidelines <a class="link_ref" href="AB.html#ABTEI4">Version</a> <a class="link_ref" href="../../readme-3.1.1.html">3.1.1a</a>. Last updated on <span class="date">10th May 2017</span>, revision <a class="link_ref" href="https://github.com/TEIC/TEI/commit/bd8dda3">bd8dda3</a>. This page generated on 2017-05-12T12:30:09Z.</address></div></div></body></html>