Encoding the Liber glossarum
The purpose of this page is to document the XML encoding, based on the recommendations of the TEI Consortium, that the LIBGLOSS Project has implemented. Our goal has been to produce an encoding that is light but robust, while leaving a wide range of possible uses open.
In the following, we will describe the different levels of structure, as well as the descriptive markup that has been applied to the text.
1. Macro structure
- 1.1. The letters
- The text of Lib. gl. consists of 23 files corresponding to the letters of the Latin alphabet: A B C D E F G H I K L M N O P Q R S T V X Y Z. They are served from an eXist database hosted on the Huma-num platform.
The structure of each letter has three levels that reflect the alphabetization of the Lib. gl. Already highlighted by the Lindsay edition, these levels are Sections, Alineas and Entries. Combining these structural elements yields the reference system of the Lib. gl.: two letters identifying the letter and the section (or three letters in the particular case of the letter Q, always written QV + A, E, I, O, V), followed by the entry number. For example:
|Letter||+ section||+ entry||= Reference|
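The composition rule described above can be sketched in a few lines of Python. This is only an illustration, not the project's own code: the function names are hypothetical, and it assumes that the section code and the entry number are simply concatenated (for instance "AB12"), which the text above does not state explicitly. Single-letter sections such as "A" are accepted because they appear in the list of sections.

```python
import re

# The 23 letters of the Lib. gl. alphabet listed above (no J, U or W).
LETTERS = "ABCDEFGHIKLMNOPQRSTVXYZ"
FIRST = LETTERS.replace("Q", "")  # Q is handled separately as QV + vowel

# Assumption: section code and entry number are concatenated, e.g. "AB12".
REFERENCE = re.compile(rf"^(QV[AEIOV]|[{FIRST}][{LETTERS}]?)(\d+)$")


def parse_reference(ref):
    """Split a reference into (letter, section, entry number)."""
    match = REFERENCE.match(ref)
    if match is None:
        raise ValueError(f"not a recognizable reference: {ref!r}")
    section, number = match.group(1), int(match.group(2))
    return section[0], section, number


def make_reference(section, number):
    """Compose a reference from a section code and an entry number."""
    return f"{section}{number}"
```

Keeping the Q case as a separate alternative in the pattern mirrors the rule that Q is always written QV plus a vowel.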
- 1.2. Sections
- The division of the letters into sections (e.g. A, AB, AC, AD, etc.) reflects the alphabetical ordering of Lib. gl. on the first two letters of each word. The Lindsay edition added these two letters to the text as headings for each section. For our part, we chose not to add anything to the text as transmitted by the manuscripts, except for the numbering of the entries, which relies precisely on this first level of structure to build unique identifiers (see 1.4).
- 1.3. Alineas
- The sections are then divided into sub-sections reflecting a further alphabetization on one or two additional letters (sometimes more). In the manuscripts, this alphabetical level was marked by large initials. It was rendered (unevenly and silently) in the Lindsay edition by line breaks. This structural level plays no role in the composition of the references, but can be explored (and refined) using the "browse sections" tool of the "Read" menu. (Under construction.)
- 1.4. Entries
The entries themselves form the last structuring element. They are numbered continuously from 1 to x within each section (alineas do not affect the numbering), and the numbering restarts at 1 with each new section.
The structure of the encoding reflects these main divisions by means of div elements, each carrying a type attribute as well as a number.
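As a rough illustration of this nesting, the sketch below parses a small invented fragment and lists the entry identifiers of each section. The attribute values ("letter", "section", "alinea"), the use of @n for the number and the identifiers themselves are assumptions made for the example; the actual files may differ.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: attribute values and identifiers are invented.
SAMPLE = """\
<div xmlns="http://www.tei-c.org/ns/1.0" type="letter" n="A">
  <div type="section" n="AB">
    <div type="alinea" n="1">
      <entryFree xml:id="AB1"/>
      <entryFree xml:id="AB2"/>
    </div>
    <div type="alinea" n="2">
      <entryFree xml:id="AB3"/>
    </div>
  </div>
  <div type="section" n="AC">
    <div type="alinea" n="1">
      <entryFree xml:id="AC1"/>
    </div>
  </div>
</div>
"""


def entries_by_section(root):
    """Map each section code to the identifiers of its entries, in order."""
    result = {}
    for div in root.iter(TEI + "div"):
        if div.get("type") == "section":
            result[div.get("n")] = [
                entry.get(XML_ID) for entry in div.iter(TEI + "entryFree")
            ]
    return result


sections = entries_by_section(ET.fromstring(SAMPLE))
```

In this fragment the numbering runs on across the two alineas of section AB and restarts at 1 in section AC, as described in 1.4.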
2. Micro structure
The Liber glossarum is not a dictionary in the strict sense, so it was not possible to apply the rigid markup structure developed for this kind of document (see the TEI Guidelines, hereinafter abbreviated TeiG).
We have chosen the more flexible solution offered by the entryFree tag, whose content, as the name suggests, is free.
- 2.1. Content of entries
The entries of Lib. gl., the glosses, are composed of three main elements, sometimes supplemented by an optional fourth. The gloss itself has two components: the term (or lemma) and its explanation (the gloss proper). This pair is completed by an indication of the origin of the explanation (the source) and, occasionally, by critical indications. To avoid confusion, we speak of entries, whose entire content is enclosed in the entryFree tag, which has a single attribute (@xml:id) whose value is the reference number of the entry (see 1.1 above, and 2.2.1 below). It contains several "textual elements" that fall into three categories, which we distinguish as 'information levels'.
- Level 1: The text itself, mainly the term, its explanation and the two complementary elements.
- Level 2: The information added for level 1 (child elements).
- Level 3: All other additional information: the critical apparatus relating to level 1; Lindsay edition footnotes; text of the sources; research notes; bibliography.
The following table summarizes the XML elements involved in the first two levels:
Level 1 Level 2 (ID number) 2.1
Term (lemma) 1.1
cit | ref | quote | seg | foreign[
name | persName, etc.]
bibl type="fons" | author @type="vet"[
Critical apparatus 1.4
- 2.2. The levels
First level: text.
It contains four markup elements: the two main components, the term (or lemma) and its explanation, as well as two additional pieces of information carried by the manuscripts, the source and the critical notes (see Cinato, 2016).
The form element of the dictionaries module [see TeiG] makes it possible, through its attributes, to define the nature of the term precisely. At this first stage, that is, at the creation of the encoding, we decided to minimize the use of such attributes. In the future, developing this element will make it possible to record the grammatical type or other relevant linguistic information (e.g. by means of att.lexicographic), thus enabling the creation of targeted indexes or of searches restricted by grammatical category, etc. The only attribute we used at the design stage is @xml:lang, to distinguish terms foreign to Latin, that is, essentially Greek or Hebrew words.
The def element, which also belongs to the dictionaries module [see TeiG], is specifically dedicated to the content of a definition. We did not use any attributes at the creation stage, deferring them to a later stage of development.
The author element [see TeiG] contains the indication of the source. It provides information on the origin of the explanation and is therefore treated as bibliographic information (bibl) [see TeiG]. This markup contains two child ref elements, but only the content of the first belongs to the text of the Lib. gl.; the second is a critical addition, discussed under the second level of information. As a first step, we have extended the content of author to cover works, which can later be marked with the title element [see TeiG]. The @type attributes make it possible to distinguish the bibliographic references given by the Lib. gl. from those added by the editors.
The note element [see TeiG] contains the critical information (apparatus) associated with some entries. As with the bibliographic elements, a @type attribute distinguishes these from the notes and remarks added by the editors.
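To make the four first-level elements concrete, here is a minimal sketch that parses an invented entry and extracts its term, explanation, source and critical note. The content is placeholder text, the identifier "AB1" is hypothetical, the values type="fons" and type="vet" are taken from the summary table, and the value type="ms" on note is purely an assumption for the example.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical entry: placeholder content, not taken from the edition.
SAMPLE_ENTRY = """\
<entryFree xmlns="http://www.tei-c.org/ns/1.0" xml:id="AB1">
  <form>TERM</form>
  <def>EXPLANATION</def>
  <bibl type="fons"><author type="vet">SOURCE</author></bibl>
  <note type="ms">CRITICAL NOTE</note>
</entryFree>
"""


def first_level(entry):
    """Return the first-level textual elements of one entryFree."""

    def text_of(path):
        node = entry.find(path, NS)
        if node is None:
            return None  # the source and the note are optional
        return "".join(node.itertext()).strip()

    return {
        "id": entry.get(XML_ID),
        "term": text_of("tei:form"),
        "explanation": text_of("tei:def"),
        "source": text_of("tei:bibl/tei:author"),
        "critical_note": text_of("tei:note"),
    }


entry_data = first_level(ET.fromstring(SAMPLE_ENTRY))
```

Using itertext() rather than .text keeps the full wording of an element even when it contains child elements, as def may.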
Second level: information related to Level 1.
It includes four groups of complementary information.
The num element [see TeiG] contains the reference number in the Lib. gl. This alphanumeric code serves as a unique identifier and is the value of the @xml:id attribute of the entryFree element.
The orth element [see TeiG] gives the standardized form of the term. Adding a standardized form is doubly justified: we have preserved the spelling of the manuscripts, and a search engine must be able to find terms that are sometimes corrupted.
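The search use case mentioned above could look roughly like the following sketch, which indexes entries by their standardized form. The sample entries, spellings and identifiers are invented for illustration, and the placement of num and orth as children of form is an assumption based on the description of the second level.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical section: spellings and identifiers are invented examples.
SAMPLE = """\
<div xmlns="http://www.tei-c.org/ns/1.0" type="section" n="OR">
  <entryFree xml:id="OR1">
    <form><num>OR1</num>Ortus <orth>hortus</orth></form>
    <def>EXPLANATION</def>
  </entryFree>
  <entryFree xml:id="OR2">
    <form><num>OR2</num>Ortum <orth>hortus</orth></form>
    <def>EXPLANATION</def>
  </entryFree>
</div>
"""


def build_orth_index(root):
    """Map each standardized form (lower-cased) to the entry identifiers."""
    index = {}
    for entry in root.iter(TEI + "entryFree"):
        orth = entry.find(f"{TEI}form/{TEI}orth")
        if orth is not None and orth.text:
            key = orth.text.strip().lower()
            index.setdefault(key, []).append(entry.get(XML_ID))
    return index


orth_index = build_orth_index(ET.fromstring(SAMPLE))
```

A lookup such as orth_index.get("hortus", []) then finds both entries, whatever spelling the manuscripts give.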
2.3. Incidentally, the def element may contain child elements that specify parts of the explanation. This is the case for quotations (cit [see TeiG], ref [see TeiG], quote [see TeiG]), for isolated letters and words illustrating the explanations (seg [see TeiG], with various attributes), and for words foreign to Latin (foreign [see TeiG]); in the future it will also include the marking of proper names (name, persName, etc.).
2.4. Another use of the ref element, with a @type attribute, is to supplement the bibliographic reference given by Lib. gl. by indicating in a standardized way which source is involved, since the information given by Lib. gl. is often limited to an author's name (see Grondeux, 2015).
Third level: further information
This last level includes all the critical information associated with the elements of the previous levels. Most of it is included in the 23 XML files, but some has been encoded in separate files (marked with an asterisk in the table below).
The critical apparatus has been encoded using the app element [see TeiG], which takes as many @type values as there are textual elements, i.e. four types. The app element has, as it should, the essential child elements used to build the apparatus: lem [see TeiG]; rdg [see TeiG]; wit [see TeiG].
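As an illustration of how such an apparatus entry can be read, the sketch below pairs each rdg with the wit elements that follow it. The sample readings and the sigla X, Y and Z are invented, and the assumption that each wit directly follows the reading it attests is mine, not something stated above.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical apparatus entry: readings and sigla are placeholders.
SAMPLE_APP = """\
<app xmlns="http://www.tei-c.org/ns/1.0" type="def">
  <lem>LEMMA READING</lem>
  <rdg>VARIANT ONE</rdg>
  <wit>X Y</wit>
  <rdg>VARIANT TWO</rdg>
  <wit>Z</wit>
</app>
"""


def summarize_app(app):
    """Return the type, the lemma and the (reading, witnesses) pairs."""
    lem = app.find(TEI + "lem")
    readings = []
    for child in app:
        if child.tag == TEI + "rdg":
            readings.append(((child.text or "").strip(), []))
        elif child.tag == TEI + "wit" and readings:
            # Assumption: a wit belongs to the reading just before it.
            readings[-1][1].extend((child.text or "").split())
    return {
        "type": app.get("type"),
        "lem": (lem.text or "").strip() if lem is not None else None,
        "readings": readings,
    }


apparatus = summarize_app(ET.fromstring(SAMPLE_APP))
```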
The following table summarizes the elements involved:
Level 3 Term (lemma) 3.1
Critical apparatus 3.4
Text of the source 3.5
Research notes 3.6
Foliotation of mss. 3.7
* Lindsay's notes 3.8
* Bibliographical references suppl. (**)
The app @type="gen" element relates to the content of the form element and contains information about the entire entry (for example, omissions of whole entries in a witness).
3.2. The app @type="def" element relates to the contents of the def element. Because of the method used to link the apparatus entries to the text, through a single insertion point (a variant of the location-referenced method), and to display them as pop-up bubbles, only these notes have been given a @loc attribute, which attaches them to the anchor elements located in the text.
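The linking mechanism can be sketched as follows. The example assumes that the value of @loc is the @xml:id of the target anchor (optionally prefixed with "#") and that the app sits inside the same entryFree; both points, like the identifiers used, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical entry: identifiers and content are placeholders.
SAMPLE = """\
<entryFree xmlns="http://www.tei-c.org/ns/1.0" xml:id="AB1">
  <def>FIRST PART<anchor xml:id="AB1-a1"/> SECOND PART</def>
  <app type="def" loc="AB1-a1"><lem>FIRST PART</lem><rdg>VARIANT</rdg></app>
  <app type="def" loc="AB1-a9"><lem>LOST</lem><rdg>VARIANT</rdg></app>
</entryFree>
"""


def attach_apps(entry):
    """Group app elements by anchor id; return also those with no anchor."""
    anchors = {a.get(XML_ID) for a in entry.iter(TEI + "anchor")}
    attached, orphans = {}, []
    for app in entry.iter(TEI + "app"):
        loc = (app.get("loc") or "").lstrip("#")
        if loc in anchors:
            attached.setdefault(loc, []).append(app)
        else:
            orphans.append(app)
    return attached, orphans


attached, orphans = attach_apps(ET.fromstring(SAMPLE))
```

Reporting the orphans separately is a simple way to catch a @loc value that no longer matches any anchor in the text.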
The app @type="aut" element relates to the content of the
The app @type="not" element relates to the content of the
The reg element [see TeiG] has a broader meaning here, since the standardized (or "regularized") reading covers the entire definition. It is a way of presenting the text of the source alongside that of the Lib. gl.
The note @type="obs" element (obs for observation) behaves like a 'free' field, in the sense that it may contain all sorts of relevant observations relating to the manuscripts, the explanation itself or its sources.
The locus element, whose @n attribute is given a value according to the manuscript, is used to locate entries in the three main manuscripts.
The element with @type="ed" contains the critical notes of Lindsay's edition. For consistency, they have been encoded in a separate file and also contain a set of elements (seg | bibl | ref).
The (simplified) encoding scheme is as follows: