Encoding the Liber glossarum

This page documents the XML encoding implemented by the LIBGLOSS Project, following the recommendations of the TEI Consortium. Our aim has been a light but robust level of encoding that still supports a wide range of uses.
Below we describe the different structural levels, as well as the descriptive markup applied to the text.

1. Macro structure

1.1. Letters: The text of the Lib. gl. is divided into 23 files, one for each letter of the Latin alphabet: A B C D E F G H I K L M N O P Q R S T V X Y Z. These files are managed in an eXist database hosted on the Huma-num platform.
Each letter is structured on three levels that make the alphabetical ordering of the Lib. gl. visible. This structure, already brought out by the Lindsay edition, consists of sections, alineas and entries. Combined, these structural elements form the reference system of the Lib. gl.: two letters identifying the letter and the section (or three letters in the particular case of the letter Q, always written QV + A, E, I, O or V), followed by the entry number. For example:

Letter	+ section	+ entry	= Reference
A	-	1	A1
A	B	1	AB1
A	B	2	AB2, etc.

1.2. Sections: The division of each letter into sections (e.g. A, AB, AC, AD, etc.) reflects the alphabetical ordering of the Lib. gl. on the first two letters of each word. The Lindsay edition inserted these two letters into the text as a heading for each section. We chose not to add anything to the text as transmitted by the manuscripts, except the entry numbers, which rely precisely on this level of structure to build unique identifiers (see 1.4).
1.3. Alineas: Sections are further divided into subsections reflecting a finer alphabetical ordering on one or two additional letters (sometimes more). In the manuscripts, this level is marked by large initials; the Lindsay edition rendered it (inconsistently and without comment) by line breaks. This level plays no part in the references, but it can be explored (and refined) with the "browse sections" tool of the "Read" menu. (Under construction.)
1.4. Entries: The entries form the last structural level. They are numbered continuously from 1 to n within each section (alineas do not affect the numbering), and the numbering restarts at 1 with each new section.
The encoding mirrors these main divisions: letters, sections and alineas are div elements, each with a specific @type attribute and a number (@n), and entries are entryFree elements:

No.	Level	Element
1.1	Letters	`div1 type="littera" n=""`
1.2	Sections	`div2 type="pars" n=""`
1.3	Alineas	`div3 type="alinea" n=""`
1.4	Entries	`entryFree xml:id=""`
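As an illustration, the four levels of the table nest as in the following minimal sketch. The @n values (the letter, the section letters and an alinea counter) are assumptions made for the example, since the table leaves them empty; the @xml:id values follow the reference system described in 1.1, and the second alinea shows that entry numbering runs on across alineas.

```xml
<!-- Sketch of the macro structure; all @n values are illustrative assumptions -->
<div1 type="littera" n="A">              <!-- 1.1 letter -->
  <div2 type="pars" n="AB">              <!-- 1.2 section -->
    <div3 type="alinea" n="1">           <!-- 1.3 alinea -->
      <entryFree xml:id="AB1">           <!-- 1.4 entry, reference AB1 -->
        <!-- entry content -->
      </entryFree>
      <entryFree xml:id="AB2">
        <!-- entry content -->
      </entryFree>
    </div3>
    <div3 type="alinea" n="2">
      <!-- the new alinea does not restart the numbering -->
      <entryFree xml:id="AB3">
        <!-- entry content -->
      </entryFree>
    </div3>
  </div2>
</div1>
```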

2. Micro structure

The Liber glossarum is not a dictionary in the strict sense, so the rigid markup structure developed for this kind of document could not be applied (see TEI Guidelines, hereinafter abbreviated TeiG).
We chose the more flexible solution offered by the entryFree element, whose content, as its name suggests, is free.

2.1. Content of entries

The entries of the Lib. gl., i.e. the glosses, consist of three main components, sometimes supplemented by a fourth, optional piece of information. The gloss itself has two components: the term (or lemma) and its explanation (the gloss proper). This pair is completed by the indication of where the explanation comes from (the source) and, occasionally, by critical indications. To avoid confusion, we speak of entries: the entire content of an entry is enclosed in the entryFree element, whose single attribute, @xml:id, takes as its value the reference number of the entry (see 1.1 above and 2.2.1 below). An entry contains several textual elements that fall into three categories, which we call 'information levels'.

Level 1: the text itself, mainly the term, its explanation and the two complementary elements (the source and the critical indications).
Level 2: information added to the Level 1 elements (encoded as child elements).
Level 3: all other additional information: the critical apparatus relating to Level 1, the footnotes of the Lindsay edition, the text of the sources, research notes, and bibliography.

The following table summarizes the XML elements involved in the first two levels:

Component	No.	Level 1	No.	Level 2
(ID number)			2.1	`num`
Term (lemma)	1.1	`form`	2.2	`orth`
Explanation	1.2	`def`	2.3	`cit | ref | quote | seg | foreign` [`name | persName`, etc.]
Source	1.3	`bibl @type="fons" | author @type="vet"` [`title @type="vet"`]	2.4	`ref @type="ed"`
Critical apparatus	1.4	`note @type="ms"`

2.2. Levels

First level: text
This level contains four markup elements: the two main components, the term (or lemma) and its explanation, plus two complementary pieces of information carried by the manuscripts, the source and critical indications (see Cinato, 2016).
1.1. The form element of the dictionaries module [see TeiG] can, through its attributes, define the nature of the term precisely. At this first stage, i.e. the creation of the encoding, we decided to keep such attributes to a minimum. In the future, this element may be extended to record the grammatical category or other relevant linguistic information (e.g. by means of att.lexicographic), enabling targeted indexes or searches restricted by grammatical category, etc. The only attribute used at the design stage is @xml:lang, which distinguishes terms foreign to Latin, essentially Greek or Hebrew words.
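A minimal sketch of this distinction: a Latin term carries no attribute, while a term foreign to Latin is flagged with @xml:lang. The codes grc (Ancient Greek) and he (Hebrew) are BCP 47 values assumed for the example; the values actually used by the project are not stated here.

```xml
<!-- Latin term: no attribute (assumes Latin is the default language of the text) -->
<form>TERM</form>
<!-- Terms foreign to Latin: "grc" and "he" are assumed BCP 47 language codes -->
<form xml:lang="grc">GREEK TERM</form>
<form xml:lang="he">HEBREW TERM</form>
```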

1.2. The def element, which also belongs to the dictionaries module [see TeiG], is dedicated specifically to the content of a definition. No attributes were used at the creation stage; they are deferred to a future phase of development.

1.3. The author element [see TeiG] contains the indication of the source. Since it gives the origin of the explanation, it is treated as bibliographic information and enclosed in a bibl element [see TeiG]. This bibl element has two children, author and ref, but only the content of the first belongs to the text of the Lib. gl.; the second is a critical addition, discussed under the second level of information. For now, the content of author has been extended to cover works as well, which can later be marked up with the title element [see TeiG]. The @type attributes distinguish the bibliographic references given by the Lib. gl. from those added by the editors.
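The source markup described above might look as follows. This sketch is assembled from the elements and @type values named in this document; nesting title inside author illustrates the possible future extension to works and is an assumption, as are the placeholder contents.

```xml
<bibl type="fons">
  <!-- source as transmitted by the Lib. gl.: part of the text (Level 1) -->
  <author type="vet">SOURCE
    <!-- possible future markup of a work name (assumed placement) -->
    <title type="vet">WORK</title>
  </author>
  <!-- standardized identification of the source added by the editors (Level 2) -->
  <ref type="ed">STANDARDIZED REFERENCE</ref>
</bibl>
```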

1.4. Finally, the note element [see TeiG] contains the critical information (apparatus) that the manuscripts associate with some entries. As with the bibliographic elements, a @type attribute (here @type="ms") distinguishes these notes from the notes and remarks added by the editors.
Second level: information related to Level 1
It includes four groups of complementary information.
2.1. The num element [see TeiG] contains the reference number of the entry in the Lib. gl. This alphanumeric code (e.g. AB1) serves as a unique identifier and is also the value of the @xml:id attribute of the entryFree element.

2.2. The orth element [see TeiG] gives the standardized form of the term. Adding a standardized form is doubly justified: we have preserved the spelling of the manuscripts, and a search engine must be able to find terms whose spelling is sometimes corrupted.
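Assuming the standardized form is nested as an orth child of form (TEI allows orth inside form, but the exact layout used by the project is not specified here), a term could be encoded as in this sketch: the manuscript spelling stays untouched, and a search can match the orth value instead.

```xml
<!-- Hypothetical layout: manuscript spelling kept as transmitted,
     standardized form supplied in orth for searching -->
<form>MANUSCRIPT SPELLING
  <orth>STANDARDIZED FORM</orth>
</form>
```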

2.3. The def element may also contain child elements that mark specific parts of the explanation: quotations (cit [see TeiG], ref [see TeiG], quote [see TeiG]); isolated letters or words illustrating the explanation (seg [see TeiG], with various attributes); words foreign to Latin (foreign [see TeiG]); and, in the future, proper names (name, persName, etc.).
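The following sketch shows a def element containing some of these children. Grouping quote and ref inside cit follows common TEI practice; the seg @type value and the grc language code are hypothetical, and the contents are placeholders.

```xml
<def>EXPLANATION
  <!-- quotation: cit groups the quoted text and its reference -->
  <cit>
    <quote>QUOTED TEXT</quote>
    <ref>REFERENCE</ref>
  </cit>
  <!-- isolated letter or word illustrating the explanation; @type value is hypothetical -->
  <seg type="example">WORD</seg>
  <!-- word foreign to Latin; "grc" is an assumed language code -->
  <foreign xml:lang="grc">GREEK WORD</foreign>
</def>
```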

2.4. The ref element is also used, with @type="ed", to supplement the bibliographic reference given by the Lib. gl. by indicating in a standardized way which source is involved, since the Lib. gl. often gives no more than an author's name (see Grondeux, 2015).

Third level: further information
This last level includes all the critical information associated with the elements of the previous levels. Most of it is included in the 23 XML files, but some has been encoded in separate files (marked with an asterisk in the table below).
The critical apparatus is encoded with the app element [see TeiG], which has one @type value for each textual element, i.e. four types. As expected, app contains the essential child elements used to build an apparatus: lem [see TeiG], rdg [see TeiG] and wit [see TeiG].
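A minimal sketch of one apparatus entry built from these children, here for the source (@type="aut"). The readings and sigla are placeholders; placing a wit element after a reading is one of the arrangements the TEI content model of app permits, and whether the project uses exactly this arrangement is an assumption.

```xml
<app type="aut">
  <!-- reading adopted in the edited text -->
  <lem>READING</lem>
  <!-- variant reading, followed by the witnesses that carry it -->
  <rdg>VARIANT</rdg>
  <wit>SIGLA</wit>
</app>
```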
The following table summarizes the elements involved:

Component	No.	Level 3
Term (lemma)	3.1	`app @type="gen"`
Explanation	3.2	`app @type="def"`
Source	3.3	`app @type="aut"`
Critical apparatus	3.4	`app @type="not"`
Text of the source	3.5	`reg`
Research notes	3.6	`note @type="obs"`
Foliotation of mss.	3.7	`locus`
* Lindsay's notes	3.8	`note @type="ed"`
* Supplementary bibliographical references		(**)

(**) Under construction.

3.1. The app @type="gen" element relates to the content of the form element and contains information about the entry as a whole (for example, the omission of an entire entry in a witness).
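For the omission case mentioned above, a minimal sketch using an empty reading, a common TEI way of recording that a witness lacks the text. The layout (the lemma holding the term, a wit element naming the witness) and the siglum are assumptions.

```xml
<!-- entry omitted in one witness: the empty rdg records the omission
     (layout and siglum are hypothetical) -->
<app type="gen">
  <lem>TERM</lem>
  <rdg/>
  <wit>SIGLUM</wit>
</app>
```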

3.2. The app @type="def" element relates to the content of the def element. Because of the method used to link these apparatus entries to the text (a single insertion point, following a variant of the Location-referenced Method) and to display them as bubbles, only these notes carry a @loc attribute, whose value is the @xml:id of an anchor element placed in the text.
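The linking mechanism can be sketched as follows: an empty anchor marks the insertion point inside def, and the apparatus entry points back to it through @loc. The identifier AB1-1 follows a hypothetical naming pattern, and placing app directly after def is likewise an assumption.

```xml
<entryFree xml:id="AB1">
  <form>TERM</form>
  <!-- single insertion point in the explanation (identifier is hypothetical) -->
  <def>EXPLANATION <anchor xml:id="AB1-1"/> EXPLANATION CONTINUED</def>
  <!-- @loc holds the @xml:id of the anchor, where the note is displayed as a bubble -->
  <app type="def" loc="AB1-1">
    <lem>READING</lem>
    <rdg>VARIANT</rdg>
  </app>
</entryFree>
```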

3.3. The app @type="aut" element relates to the content of the author @type="vet" element.

3.4. The app @type="not" element relates to the content of the note @type="ms" element.

3.5. The reg element [see TeiG] is used here in a broader sense, since the standardized (or "regularized") reading covers the entire definition. It is a way of presenting the text of the source alongside that of the Lib. gl.
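In context, reg stands beside the explanation it regularizes, as in this sketch; the order of the children follows the simplified scheme given in the Recapitulation and should be taken as indicative only. All contents are placeholders.

```xml
<entryFree xml:id="AB1">
  <form>TERM</form>
  <!-- explanation as transmitted by the Lib. gl. -->
  <def>EXPLANATION</def>
  <bibl type="fons">
    <author type="vet">SOURCE</author>
  </bibl>
  <!-- regularized reading covering the whole definition: the text of the source -->
  <reg>TEXT OF THE SOURCE</reg>
</entryFree>
```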

3.6. The note @type="obs" element (obs for "observation") works as a free field: it may contain any relevant observation on the manuscripts, the explanation itself or its sources.

3.7. The locus element, whose @n attribute takes a value according to the manuscript concerned, is used to locate entries in the three main manuscripts.
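A sketch of the foliation markup for one entry, with one locus per main manuscript. The @n values and folio placeholders are hypothetical; the description above only indicates that @n varies with the manuscript.

```xml
<!-- one locus per main manuscript; @n values are hypothetical sigla -->
<locus n="MS1">FOLIO</locus>
<locus n="MS2">FOLIO</locus>
<locus n="MS3">FOLIO</locus>
```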

3.8. Another note element, with @type="ed", contains the critical notes of Lindsay's edition. For consistency, these notes have been encoded in a separate file; they may themselves contain elements such as seg, bibl and ref.
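Because these notes live in a separate file, each one has to point back to its entry. The sketch below assumes this is done with @target referring to the entry's @xml:id; the linking mechanism is not stated here, so it is hypothetical, as are the placeholder contents.

```xml
<!-- separate file of Lindsay's notes; @target linking to entry AB1 is an assumption -->
<note type="ed" target="#AB1">LINDSAY'S NOTE on <seg>WORD</seg>,
  citing <bibl>REFERENCE</bibl> and <ref>CROSS-REFERENCE</ref></note>
```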

3. Recapitulation

The (simplified) encoding scheme is as follows:
<TEI>
  <text>
    <body>
      <div1>
        <div2>
          <div3>
            <entryFree>
              <form> TERM (lemma) </form>
              <def> EXPLANATION (gloss) </def>
              <bibl>
                <author> SOURCE </author>
                <ref></ref>
              </bibl>
              <note type="ms"> CRITICAL SIGNS </note>
              <app></app>
              <reg></reg>
              <note type="obs"></note>
              <locus></locus>
            </entryFree>
          </div3>
        </div2>
      </div1>
    </body>
  </text>
</TEI>