This document describes how the TEI standard was customized for the project "Digital Edition of Fernando Pessoa. Projects and Publications". In the project, two main types of sources are transcribed: (1) documents that Pessoa authored, containing editorial lists, notes, and plans (referred to as documents in the following), and (2) poems and prose texts that he published during his lifetime (publications). The first type of source is hand- or typewritten and may include changes such as later additions, substitutions, or deletions. The second type consists of material printed in journals.
The encoding of the metadata in the TEI header and of information about images in the facsimile section is outlined for both types of sources together, differentiating between them where necessary. How the transcribed text is encoded in the TEI body is explained for each source type separately. Examples are given in the running text.
2. The TEI header
2.1. General information
2.1.1. Title, author and responsibilities
In the title statement of the TEI header, information about the title and author of a document or publication is gathered. In the case of the documents, the title corresponds to the identifier that the document has in the source collection that it was retrieved from, for example:
In both cases, the name of the author is further marked up with an <rs> (referencing string) element declaring that the author's name is a reference to a person name identified elsewhere. In the project, an external list of person names is kept where each name has an identifier. The main heteronyms of Pessoa have the identifiers "FP" (Fernando Pessoa), "AC" (Alberto Caeiro), "AdC" (Álvaro de Campos), "RR" (Ricardo Reis), and "BS" (Bernardo Soares), which are given as values of the attribute key in the references.
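As a minimal sketch, a title statement for a document might look as follows. The title reuses the identifier BNP/E3 144D2-111r that appears elsewhere in this document; the value person for the attribute type on <rs> is an assumption, since the text only specifies the attribute key.

```xml
<titleStmt>
  <!-- for documents, the title is the identifier in the source collection -->
  <title>BNP/E3 144D2-111r</title>
  <author>
    <!-- type="person" is assumed; key points to the external list of person names -->
    <rs type="person" key="FP">Fernando Pessoa</rs>
  </author>
</titleStmt>
```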
Furthermore, in the title statement, different responsibilities for the creation of the encoded file are listed, each in an element <respStmt>, containing an element <resp> where the kind of responsibility (or responsibilities) is described and an element <name> indicating the full name of the person responsible:
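A responsibility statement of this kind could be sketched as shown below. The description of the responsibility and the name are placeholders, not values taken from the project files.

```xml
<respStmt>
  <!-- placeholder wording for the kind of responsibility -->
  <resp>[description of the responsibility, e.g. transcription and encoding]</resp>
  <!-- placeholder for the full name of the person responsible -->
  <name>[full name]</name>
</respStmt>
```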
2.1.2. Publication statement
The element <publicationStmt> (publication statement) contains information about the published TEI file, as in the following example:
<publicationStmt>
  <publisher>Universidade Nova de Lisboa, Instituto de Estudos de Literatura e Tradição (IELT)</publisher>
  <publisher>Cologne Center for eHumanities (CCeH)</publisher>
  <date>2017</date>
  <availability status="free">
    <licence target="http://creativecommons.org/licenses/by/4.0/"/>
  </availability>
  <idno type="filename">BNP_E3_144D2-111r.xml</idno>
</publicationStmt>
It contains information about the publishing institutions (encoded in the element <publisher>) and about the publication date of the file (in an element <date>). Furthermore, a statement on the availability of the file is made and licence information is given. The availability can be either "free", if the file is already published, or "restricted", if the work on the TEI file is still ongoing. All the TEI documents are published under a Creative Commons Attribution 4.0 International license (CC BY 4.0). Finally, the filename is given as an identifier in an <idno> element.
2.1.3. Notes statement
The element <notesStmt> (notes statement) serves as an editorial note, encompassing annotations that provide additional information beyond the details given in the sections of the source description. It consists of two parts, with at least the second part always being present:
<notesStmt>
  <note type="summary">Poema publicado em <hi rend="italic">A Revista da Solução Editora</hi>, 1929 e <hi rend="italic">Cancioneiro do 1º Salão dos Independentes</hi>, 1930. Apresentamos aqui as imagens de ambas as publicações, cujos textos são, em termos formais e de conteúdo, idênticos.</note>
  <note type="genre">
    <rs type="genre" key="poesia">Poesia</rs>
  </note>
</notesStmt>
The first part functions as an individual free-text comment and is provided within an element <note> with the attribute type, which always contains the value "summary".
The second part, on the other hand, serves as a section for genre classification. It is also provided inside an element <note> with the attribute type, but this time always with the value "genre". Furthermore, this <note> element contains an <rs> element with the attribute type, also always holding the value "genre", and the attribute key. Depending on the content type, one of two values, "poesia" or "prosa", is specified in the attribute key.
The element <notesStmt> is specific to the publications and does not apply to the documents. Regarding the documents, the genre assignment can be found in the second part of the content description. For that, see the section on contents below.
2.1.4. Description of the source
The sources of the documents and publications are documented in the source description. Because the documents are archival sources and the publications are published bibliographic items, the source description differs for these two types of resources, as outlined in the following subsections.
2.1.4.1. Sources of documents
The sources of the documents are encoded inside of an element <msDesc> (manuscript description), which itself is a child element of the element <sourceDesc>:
The manuscript description has three parts. The first part serves to identify the source, the second part to describe its contents, and the third part to encode details on the history of the source.
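The overall nesting of the manuscript description can be sketched as follows. The bracketed ellipses stand for content that is explained in the following subsections.

```xml
<sourceDesc>
  <msDesc>
    <!-- first part: identification -->
    <msIdentifier> [...] </msIdentifier>
    <!-- second part: contents -->
    <msContents> [...] </msContents>
    <!-- third part: history -->
    <history> [...] </history>
  </msDesc>
</sourceDesc>
```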
2.1.4.1.1. Identification
The identification is done inside of the element <msIdentifier>, which is the first child element of <msDesc>:
<msIdentifier>
  <institution>Biblioteca Nacional de Portugal</institution>
  <idno>BNP/E3 144D2-111r</idno>
</msIdentifier>
Inside of that element, the institution holding the source is indicated in an element <institution>. Furthermore, the identifier that the source has in the source institution as well as in this project is listed in an element <idno>.
2.1.4.1.2. Contents
The contents of the source are described in an element <msContents>, which follows after the element <msIdentifier>, as in the following example:
The description of the contents has three parts. First, a general note on the source document is given in an element <summary>. The summary also indicates whether and where a document has been published before, outside of this edition project.
The second part of the content description is given in an element <msItemStruct> (structured manuscript item). It contains information about the author of the document (which is always Fernando Pessoa) and a note on the genre or genres of the document. This note is encoded with the element <note>, which has the attribute type with the value genre. Each genre inside of the note is marked with an element <rs> of the type genre. The attribute key indicates the identifier of the genre. Inside of the <rs> element, the name of the genre is given as text in Portuguese. The following three genres of documents occur: "Lista editorial" (lista_editorial), "Plano editorial" (plano_editorial), and "Nota editorial" (nota_editorial).
Finally, the language or languages of the text on the document are indicated in the element <textLang>. The main language of a document is given in the attribute mainLang on that element. The value of that attribute is a language code, in this case "pt" for Portuguese.
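A content description along these lines might be sketched as follows. The summary text is a placeholder, the use of <rs> inside <author> is an assumption carried over from the title statement, and placing <textLang> inside <msItemStruct> is likewise an assumption about where the project puts it.

```xml
<msContents>
  <!-- placeholder for the general note, including earlier publications -->
  <summary>[general note on the document]</summary>
  <msItemStruct>
    <author>
      <!-- assumed markup; the author is always Fernando Pessoa -->
      <rs type="person" key="FP">Fernando Pessoa</rs>
    </author>
    <note type="genre">
      <rs type="genre" key="lista_editorial">Lista editorial</rs>
    </note>
    <!-- main language of the document: Portuguese -->
    <textLang mainLang="pt"/>
  </msItemStruct>
</msContents>
```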
2.1.4.1.3. History
The third part, which contains information about the history of the source, is described in an element <history>. Inside <history>, there is the element <origDate> (origin date), which is enclosed by the elements <origin> and <p>:
The element <origDate> can express the origin date of a source document in several ways. In general, a distinction is made between certain and uncertain dates. Uncertainty is indicated with the attribute cert, which, when used, always has the value medium:
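The nesting described above, together with an uncertain date, could look like the following sketch. The year is a made-up illustration and not taken from a project file.

```xml
<history>
  <origin>
    <p>
      <!-- cert="medium" marks the date as uncertain; the year is hypothetical -->
      <origDate cert="medium" when="1915">1915</origDate>
    </p>
  </origin>
</history>
```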
As already indicated in the two examples above, the representation of the temporal data itself may vary. The following variations are possible:
(1) Only providing a year.
(2) Providing a year along with a month.
(3) Providing a year along with month and day.
(4) Not providing a date, indicated by a question mark.
To indicate the first three variations of possible origin dates, there are also different options expressed through different attributes. The attribute when, for example, specifies a particular date:
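The four variations listed above might be encoded as in the following sketch. All dates are hypothetical, and the Portuguese wording of the element content is only an illustration.

```xml
<!-- (1) year only -->
<origDate when="1913">1913</origDate>
<!-- (2) year and month -->
<origDate when="1913-06">Junho de 1913</origDate>
<!-- (3) year, month, and day -->
<origDate when="1913-06-21">21 de Junho de 1913</origDate>
<!-- (4) no date known: a question mark and no dating attribute -->
<origDate>?</origDate>
```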
The attribute notAfter gives the latest possible date, meaning that the source originated before that point in time:
<p>
  <origDate notAfter="1922-12">ant. Dezembro de 1922</origDate>
</p>
Only when the date is missing, which is indicated by a question mark, is none of these attributes used.
2.1.4.2. Sources of publications
The sources of the publications are encoded only within an element <sourceDesc>:
<sourceDesc> [...] </sourceDesc>
The description has two parts. The first part serves to identify the respective work(s) within the index of works, while the second part is used to describe the bibliographic information of the work(s).
2.1.4.2.1. Work index
Within the element <sourceDesc>, there is an element <list> with the attribute type, whose value work-index indicates that it functions as a list for indexing works.
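A work index with a single entry could be sketched as follows. The key value W1 is a hypothetical identifier that only illustrates the pattern "W" plus number; the title is borrowed from the work mentioned later in this document.

```xml
<list type="work-index">
  <item>
    <!-- key is a hypothetical work identifier from the central work register -->
    <rs type="work" key="W1">Mar Português</rs>
  </item>
</list>
```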
As seen in the example above, the element <list> can contain multiple <item> elements, but there will always be at least one <item> element representing a single entry in the list. Each <item> element contains an <rs> element with the attribute type (in which the value work can always be found) and the attribute key. The attribute key holds a unique key-value that consistently follows a specific pattern, as demonstrated in the examples above and below: It begins with the letter "W" followed by a consecutive number.
The unique key-value is determined not only by the title of a work but also by the authorship. In the project, work identifiers are allocated in an external central work register.
2.1.4.2.2. Bibliographic description
The bibliographic information of the publications is described in an element <biblStruct>, which contains only bibliographic sub-elements in a specific order, according to the general TEI guidelines:
The first part, the element <analytic> (analytic level), contains the bibliographic title of the item, such as an article or poem, that is published within a monograph or journal rather than as an independent publication. It also includes the attribute key to indicate the author, which usually refers to Pessoa himself (with the identifier FP) or, in some cases, to one of his heteronyms (AC, AdC, RR, or BS). Furthermore, in the analytic part, the element <textLang> gives the primary language of the bibliographic work in the attribute mainLang and, as one or more values, any other languages used in the published work in the attribute otherLangs.
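The analytic part might look roughly like the following sketch. Wrapping the author's name in <rs> is an assumption carried over from the title statement, and the bracketed ellipsis stands for the monographic part.

```xml
<biblStruct>
  <analytic>
    <title>Mar Português</title>
    <author>
      <!-- assumed markup; key identifies Pessoa or one of his heteronyms -->
      <rs type="person" key="FP">Fernando Pessoa</rs>
    </author>
    <!-- otherLangs would be added here if further languages occur -->
    <textLang mainLang="pt"/>
  </analytic>
  [...]
</biblStruct>
```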
The element <monogr> (monographic level) provides the second part of the bibliographic description. It contains the bibliographic information about the item (e.g., a monograph or a journal) that was published as an independent object (i.e., a stand-alone part) and which includes the work described in the <analytic> element:
The element <monogr> always includes the element <biblScope>, which defines the scope of the bibliographic work mentioned in the first part. The scope may encompass various details, such as page numbers or a named subdivision within a larger work.
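As a sketch, the monographic part for a journal publication could be encoded as follows. The journal title and year come from the "Mar Português" example in this document; the attribute unit on <biblScope> and the placeholder page range are assumptions.

```xml
<monogr>
  <title>Contemporânea</title>
  <imprint>
    <date when="1922">1922</date>
  </imprint>
  <!-- unit="page" is assumed; the scope may also be a named subdivision -->
  <biblScope unit="page">[page range]</biblScope>
</monogr>
```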
In some cases, a work has been published more than once, which necessitates the use of multiple <biblStruct> elements:
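For a work published twice, the source description might contain two bibliographic descriptions as in the following sketch. The identifiers are the ones named in this section for "Mar Português"; the inner content is elided.

```xml
<!-- first publication, in the journal "Contemporânea" -->
<biblStruct xml:id="Contemporânea_1922">
  <analytic> [...] </analytic>
  <monogr> [...] </monogr>
</biblStruct>
<!-- second publication, in "Leitura para todos - Revista mensal ilustrada" -->
<biblStruct xml:id="Leitura_1926">
  <analytic> [...] </analytic>
  <monogr> [...] </monogr>
</biblStruct>
```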
To make it possible to refer to each of these bibliographic descriptions formally, they need an identifier in the XML. Therefore, each <biblStruct> carries an attribute xml:id that has a unique value. These identifiers consistently adhere to a specific pattern: they begin with the abbreviated title of the journal or monograph, followed by an underscore, and then the year of publication. Accordingly, for the work entitled "Mar Português", published once in 1922 in the journal "Contemporânea" and again four years later in the journal "Leitura para todos - Revista mensal ilustrada", the following two IDs result: "Contemporânea_1922" and "Leitura_1926".
The purpose of assigning the identifiers is to enable a reference in the text to the specific place of publication as well as to the respective facsimiles from the various issues. In this way, variations within a text can be clearly assigned to a specific source and formally linked to it.
2.1.5. Encoding description
The relationship between the transcribed text and its source, the facsimiles from which it is derived, is documented using the element <encodingDesc> (encoding description), which only appears in the publications and not in the documents themselves.
Within the element <encodingDesc>, there is always the element <variantEncoding> (variant encoding), which has the attributes method and location:
According to the general TEI guidelines, the attribute method indicates which method is used to encode the apparatus of variants. It always contains the value "parallel-segmentation", expressing that alternate readings of a passage are presented side by side in the text, without the need for a base text. The attribute location, in turn, indicates whether the apparatus appears within the text or outside it. It consistently holds the value "internal", signifying that the apparatus appears within the text.
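With the two attribute values named above, the encoding description can be sketched as:

```xml
<encodingDesc>
  <!-- variants are encoded in place, as parallel readings without a base text -->
  <variantEncoding method="parallel-segmentation" location="internal"/>
</encodingDesc>
```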
3. Facsimiles
The <facsimile> element is used to hold information about the image files representing a facsimile of the text transcribed in the TEI body. For the Pessoa edition, these image files are stored on the image server of the Cologne Center for eHumanities (CCeH). The link to this image server is specified in the xml:base attribute within the <facsimile> element.
Within all TEI files, the <facsimile> element contains one or more <graphic> elements, each indicating the path to an individual image file in the attribute url. These paths are relative to the base URI for the images. For example:
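A facsimile section of this kind might look like the following sketch. The base URI and the image file name are placeholders and do not reflect the actual address of the CCeH image server.

```xml
<!-- xml:base is a placeholder, not the real image server address -->
<facsimile xml:base="https://example.org/images/">
  <!-- url is resolved relative to xml:base; the file name is hypothetical -->
  <graphic url="BNP_E3_144D2-111r.jpg"/>
</facsimile>
```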
As mentioned in the sections of the bibliographic description, there are instances where a work has been published more than once, which necessitates the use of multiple <facsimile> elements:
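For a work with two printed sources, the facsimile sections could be sketched as follows. The identifiers in corresp are the ones given for "Mar Português" in the bibliographic description; the base URI and file names are placeholders.

```xml
<!-- images of the 1922 publication -->
<facsimile xml:base="https://example.org/images/" corresp="#Contemporânea_1922">
  <graphic url="[image file 1]"/>
</facsimile>
<!-- images of the 1926 publication -->
<facsimile xml:base="https://example.org/images/" corresp="#Leitura_1926">
  <graphic url="[image file 2]"/>
</facsimile>
```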
As shown in the example above, there is an additional attribute corresp on the <facsimile> element, alongside the attribute xml:base. The value of the corresp attribute points to the identifier of a bibliographic source defined in the source description in the TEI header, so that a specific source is linked to a specific set of facsimiles. This is relevant in cases in which there are several different sources of a published work and several corresponding sets of facsimiles. To refer to these identifiers, a preceding hash sign (#) is used.
4. Transcriptions
4.1. Encoding of documents
4.1.1. General structure
The transcription of documents is encoded inside of the TEI <text> and <body> elements. Each text body must at least contain one division, encoded with a <div> element:
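The minimal structure can be sketched as follows. The heading text is taken from the heading markup shown later in this document; the bracketed ellipsis stands for the remaining content of the division.

```xml
<text>
  <body>
    <!-- every text body contains at least one division -->
    <div>
      <head>Na Casa de saude de Cascaes</head>
      [...]
    </div>
  </body>
</text>
```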
The example is taken from the document BNP 5-83r, which begins with the heading "Na Casa de saude...". This heading introduces the first division of the document and is therefore included inside a <div>. The facsimile of the whole document is given in the following:
Here it becomes visible that the document has three parts, each starting with its own heading. Each of these parts is encoded in its own division:
Enumerations of titles can also take the form of paragraphs (instead of lists) if the items are written down one after the other without being structured as a list. An example is shown in the following facsimile of the list BNP 125A-52r:
The second part of the document, entitled "Manual do Sebastianista", contains three titles that are mentioned directly after each other and are separated only by a dash. This is encoded as follows, wrapping a <p> element around the titles:
<head>
  <hi rend="underline">
    <rs type="title">Manual do Sebastianista</rs>
  </hi>
</head>
<p>
  <rs type="title">Historia do sebastianismo</rs> —<lb/>
  <rs type="title">Prophecias <del rend="overstrike">sebastianistas</del>
    <lb/>e sua interpretação</rs> — <rs type="title">A re<pc>-</pc>
    <lb/>nascença do sebastianismo</rs>.
</p>
4.1.2.3. Lists
A list is a set of ordered items that may be numbered or not. Lists are encoded with the element <list>, which contains one or several child elements <item>, as in the following example:
If the items are numbered or otherwise marked (for example by initial dashes), these marks are encoded with the element <label> at the beginning of each list item. The text of the list item follows directly after the label.
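A numbered list with labels might be encoded as in the following sketch. The item texts are placeholders.

```xml
<list>
  <item>
    <!-- the mark of the item comes first, the item text follows directly -->
    <label>1.</label> [text of the first item]
  </item>
  <item>
    <label>2.</label> [text of the second item]
  </item>
</list>
```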
4.1.2.3.1. Lists inside of lists
Sometimes, an item of a list itself contains another list. An example is shown in the following facsimile of the list BNP 128-11r:
The last part contains a list with two items: "Parlour Games" and "Technical Dictionaries". The "Technical Dictionaries" item itself contains a sublist of two items: "A. Commercial" and "B.". This is encoded by using a list inside of a list:
After the text of the item "Technical Dictionaries", but still inside of that item, another <list> element opens. It has the attribute rend with the value inline to mark that this sublist should not begin on a new line but be placed after the text of the item containing it. The items of the sublist are encoded inside of this second <list> element in the usual way.
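Based on this description, the nested list could be sketched as follows. Marking "A." and "B." with <label> is an assumption that follows the general rule for item marks.

```xml
<list>
  <item>Parlour Games</item>
  <item>Technical Dictionaries
    <!-- the sublist continues on the same line as the item text -->
    <list rend="inline">
      <item><label>A.</label> Commercial</item>
      <item><label>B.</label></item>
    </list>
  </item>
</list>
```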
Another example of a sublist can be seen on the following page of the list BNP 133M-96 a 98:
In the third item "Gamage...", there is a sublist which does not start on the same line as the item text, but on the next line. Still, the list is indented, so that it becomes clear that it is a list inside the bigger list. This is encoded as:
The only difference to the previous example is that the attribute rend has the value indent instead of inline.
4.1.2.4. Tables
On some of the documents, the structure of the text is more similar to a table than to a list, or could be interpreted as either a list or a table. The following facsimile of the list BNP 144Q-34r shows such a case:
The first part of the document contains a list to which a column of numbers is attached to the right. To be able to align this last column with the list entries, the whole list is encoded as a table:
Tables are encoded with the element <table>. They contain rows, encoded with the element <row>, and within each row the column values, encoded with the element <cell>. Here, the table has a row for each list entry and three columns. The first column holds the labels of the entries (1., 2., 3., ...). The second column contains the text of the entries ("Introduction (brief)", "The Anarchist Banker", etc.), and the third column contains the number of pages. For the items for which no page numbers are given, the third <cell> is left empty, but it still needs to be there so that the structure of the table is correct.
In columns with numbers, the text is usually aligned to the right. This is indicated with the attribute rend on the respective <cell> and the value right. The first row of the table is special because it only contains a heading for the third column: "approx. no. of pages". Here, only one <cell> element is given for the row, which has the attribute cols with the value 3. This means that in that row, one cell spans the width of all three table columns. In addition, the text of this cell is aligned to the right, using the attribute rend with the value right on <cell>, so that the text "approx. no. of pages" appears to the right.
The lower part of the document also contains a list that has a column with page numbers attached to it and is therefore interpreted as having a tabular structure.
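The first rows of such a table might be sketched as follows. Which entries actually have page numbers is not stated here, so the empty cell and the placeholder are illustrations only.

```xml
<table>
  <row>
    <!-- heading row: one cell spanning all three columns, aligned right -->
    <cell cols="3" rend="right">approx. no. of pages</cell>
  </row>
  <row>
    <cell>1.</cell>
    <cell>Introduction (brief)</cell>
    <!-- an empty third cell keeps the table structure intact -->
    <cell rend="right"/>
  </row>
  <row>
    <cell>2.</cell>
    <cell>The Anarchist Banker</cell>
    <cell rend="right">[number of pages]</cell>
  </row>
</table>
```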
4.1.2.5. Notes
Notes can occur everywhere in a document. Usually, a note is contained inside of a list or at the margin of it. Notes may be interpreted as part of the original version of a list or as having been added later. In the latter case, they are encoded genetically (for an example of the genetic encoding of a note see Notes added on the margin). An example of simple notes added to the margin is shown in the following facsimile:
Here Pessoa placed question marks to the left of some of the list items. This is encoded as follows:
Here, the second item has a note on the left margin. When the note is on the left margin, the element <note> is added at the beginning of the list item, inside of the element <item>. The <note> gets the attribute place, in this case with the value margin-left. For place, also the values margin-right (then the <note> element would be added at the end of the list item), top, below, and center are possible. The text of the note is simply added inside of the <note> element.
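A question mark on the left margin of the second item could be encoded as in the following sketch. The item texts are placeholders.

```xml
<list>
  <item>[text of the first item]</item>
  <item>
    <!-- a left-margin note is placed at the beginning of the item -->
    <note place="margin-left">?</note>
    [text of the second item]
  </item>
</list>
```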
4.1.2.6. Line breaks
Line breaks of text are encoded using the empty element <lb> as in the following example of a list item which continues on a second line:
The element <lb> is only used if the line break is not due to the structure. New divisions, paragraphs, or list items are therefore not additionally marked with <lb>; only line breaks in running text are.
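As an illustration, a list item that continues on a second line can be encoded like this (the item is reused from the division space example in this document):

```xml
<item>
  <!-- the title breaks after "reformação" and continues on the next line -->
  <rs type="title">Prolegómenos a uma reformação<lb/>do paganismo</rs>.
</item>
```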
4.1.2.7. Punctuation characters
Sometimes it is necessary to encode punctuation characters, for example hyphens used to divide words at the end of lines, so that these can be displayed or not when the document is rendered, depending on whether line breaks are included in the visualization of the document or not. Punctuation characters are encoded with the element <pc>, as in the following example:
Here there is a hyphen between "Disserta" and "ções", which is marked up with the element <pc>. The line break following the word division is encoded after the punctuation character, using the empty element <lb>.
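A sketch of this word division is given below. The surrounding <item> element and the elided remainder of the text are assumptions.

```xml
<item>
  <!-- hyphen at the end of the line, then the line break -->
  Disserta<pc>-</pc><lb/>ções [...]
</item>
```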
4.1.3. (Typo)graphical renditions
An important part of the encoding of the documents in the project is how certain aspects of them were rendered in the sources. In general, indications of what something looked like (how it was (typo)graphically emphasized or organized) are made in the attribute rend, which can be used on many different elements.
4.1.3.1. Alignment of text
By default, the text of different elements is shown on the left side of the page. If the text should instead be centered or appear on the right side, this can be indicated with the attribute rend and the values center or right. An example of a heading which is centered is given in the following:
<head rend="center">Q.</head>
4.1.3.2. Highlighted characters, words, or passages
4.1.3.2.1. Underlinings
Often, Pessoa highlighted text by underlining it. This is encoded using the element <hi> in combination with rend. In the following example, the whole heading is underlined. It therefore contains a child element <hi> with the attribute rend, having the value underline:
<head>
  <hi rend="underline">
    <rs type="collection" key="C16">
      <rs type="title" key="T184">Na Casa de saude de Cascaes</rs>
    </rs>
  </hi>
</head> [...]
Also individual characters or words can be underlined. Then the <hi> element is just wrapped around these parts.
4.1.3.2.2. Superscripts
If text (or an individual letter) is added as a superscript, meaning that it is attached in small letters to the top of a preceding word, this is encoded as follows:
In the example, the list item contains the name "Marquez de Pombal", which is abbreviated using just the letter "M" with a small superscript "z" for "Marquez". The superscript is encoded with the element <hi> and the attribute rend with the value superscript.
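The abbreviated name described here might be encoded as in the following sketch. The surrounding item content is reduced to the name itself.

```xml
<!-- "M" with a small raised "z" standing for "Marquez" -->
<item>M<hi rend="superscript">z</hi> de Pombal</item>
```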
4.1.3.2.3. Frames (square boxes)
Sometimes, Pessoa highlights parts of a list by drawing a frame around it. An example can be seen on the list 133M-30r:
Here, a box is drawn around the text "(Advertise for Cipher Agency - America)". This is encoded as follows (this is at the same time an example of a modification, see Other modifications):
The element <mod> surrounds the text to be framed. The attribute rend indicates how it should be modified, here by adding a frame (framed). The attribute n with the value 2 indicates that the modification was only done later and should be part of the second edited version of the document.
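Following this description, the framed passage could be sketched as:

```xml
<!-- the frame was added later and belongs to the second version (n="2") -->
<mod rend="framed" n="2">(Advertise for Cipher Agency - America)</mod>
```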
If the frame was not added later but was part of the original list, the element <hi> could be used instead of <mod> to say that the passage is highlighted by framing it:
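Applied to the same passage as a hypothetical case, the alternative with <hi> would look like this:

```xml
<!-- a frame that is part of the original version, hence no attribute n -->
<hi rend="framed">(Advertise for Cipher Agency - America)</hi>
```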
Text can also be circled. If a note on the right margin is circled, this is indicated simply by adding the attribute rend to the note and giving it the value circled.
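A circled note on the right margin might be sketched as follows. The text of the note is a placeholder.

```xml
<!-- rend is attached directly to the note element -->
<note place="margin-right" rend="circled">[text of the note]</note>
```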
Also, the element <hi> can be used, as in the following example:
<p rend="center">
  <anchor xml:id="A4"/>
  <hi rend="circled">or</hi> Being an apology <subst>
    <add n="2" place="above">for</add>
    <del n="1">of</del>
  </subst> all culture not genuine.
</p>
Here, there is a part inside of a paragraph that is circled. Because this part has no element by itself to attach the attribute rend to, the element <hi> is used to mark that the text is highlighted. Again the value of rend is circled.
4.1.3.3. Indentations
It may be the case that the text does not start directly at the beginning of a line but is indented. In the following example, there is a list which does not start on its own line, but after the text "inclue: –". The list therefore carries the attribute rend with the value inline.
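A sketch of such an inline list is given below. The enclosing <p> element and the item texts are assumptions; the value indent-2 on the items marks that all lines after the first one are indented.

```xml
<p>inclue: –
  <!-- the list starts on the same line as the preceding text -->
  <list rend="inline">
    <item rend="indent-2">[first item, continuing<lb/>on a second line]</item>
    <item rend="indent-2">[second item]</item>
  </list>
</p>
```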
Also, in this example, the text of the list items is indented from the second line on (meaning that the first line of each item is not indented, but every line following it is). This is indicated by using the attribute rend with the value indent-2. To get an impression of what this looks like, see the facsimile of this document:
4.1.3.4. Division lines
Often, Pessoa draws lines on his documents to mark divisions between different parts of his notes. Such division lines are encoded using the element <metamark>, as in the following example, where a line is drawn between a list and the next heading:
"Metamark" means that this mark serves as a guide to the structure of the document, to how it should be read (for example, in which order). The attribute rend is used here to indicate the style of the mark, in this case a line. The function of the mark is also encoded, in the attribute function. In this case the function is to indicate that a new, different section of the document begins, so the value of the attribute is distinct.
4.1.3.5. Division space
Instead of division lines, sometimes there is just additional space on the documents to mark the difference between one list and the next, or between different items of a list. In the following facsimile of the list BNP 12-1 10r, there is a list about Antonio Móra consisting of three parts. The first part has three list items, then two other items follow separated by space from the first part of the list:
To encode such spaces, the element <metamark> is used with the attribute rend having the value space. In the above example, the function of the space is to signal that the items are distinguished from each other, so the metamark gets the additional attribute function with the value distinct:
<item>
  <rs type="title">Dissertação sobre a arte moderna</rs>.
</item>
<metamark rend="space" function="distinct"/>
<item rend="indent-2">
  <rs type="title">Prolegómenos a uma reformação<lb/>do paganismo</rs>.
</item>
Such metamarks may be added between lists, or between list items. In the above example, the mark is added inside of the list between the individual items.
4.1.3.6. Lines as placeholders
In the documents, Pessoa sometimes uses lines as placeholders for text that he may have wished to add later. In the following facsimile of the document BNP 87 68r, the third list item of the second list on the page begins with a line, followed by the text "(some new collaborator)".
Here, the line clearly stands for some name to be added later. This is encoded as follows:
The line is marked up with the element <metamark> and the attribute rend with the value line. It has another attribute function with the value placeholder. In this example, the editor decided to explain that some text was omitted here, so the placeholder line is interpreted as an abbreviation standing for some other text. It is therefore enclosed in a construction of <choice> with the child elements <abbr> (containing the line mark) and <expan> (containing the supposed expansion). But, as the text that the placeholder stands for is not known, <expan> contains an element <supplied> with the attribute reason and the value omitted-in-original.
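Put together, the placeholder line could be encoded as in the following sketch. The surrounding <item> is assumed from the description of the list.

```xml
<item>
  <choice>
    <!-- documentary view: the line as drawn on the page -->
    <abbr><metamark rend="line" function="placeholder"/></abbr>
    <!-- interpretive view: text is missing, but its content is unknown -->
    <expan><supplied reason="omitted-in-original"/></expan>
  </choice> (some new collaborator)
</item>
```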
Lines can also serve as placeholders for text that was already mentioned before. In the following example, a line is used to signal that text from the preceding list item is repeated:
Here, the line has the function of "ditto". It is encoded with the element <metamark>, carrying the attribute rend with the value line and the attribute function with the value ditto. In this context, the line serves as an abbreviation, which is expanded to the text that it represents. For more details about this example, see the section on abbreviations below.
4.1.3.7. Space as placeholder
Like lines, also space can serve as a placeholder either for some text that Pessoa wished to add later, or in the function of "ditto", repeating some text that was given earlier.
An example of the first case is visible in the facsimile of the list BNP 87 40r:
Here, the first three list items were entirely left blank and the fourth was left blank in the beginning. Such items are encoded as follows:
The space is marked up with the element <metamark> and the attribute rend with the value space, as well as the attribute function with the value placeholder. It is interpreted as an abbreviation for something else, so it is surrounded by an element <abbr>. This is expanded inside of an element <expan>, which contains an element <supplied>, saying with the attribute reason and the value omitted-in-original that the editor thinks that some text is missing here. The responsibility for this interpretation is given in the attribute resp, which takes the initials of the editor as its value. Finally, both <abbr> and <expan> are contained inside of an element <choice>, indicating that these two encodings are alternative views on the document, a more documentary one marking the space and a more interpretive one saying that something was omitted.
An example of the second case, space serving as "ditto", can be seen in the facsimile of the list BNP 48-56r:
Here the names initiating list items are only given the first time, e. g. "Robert Browning : Eveln Hope." From the second time on, there is just a space, which is thought to be filled with the same name. This is encoded as follows:
The element <choice> is used to mark that either the blank space can be shown or the name that it stands for. The blank space is interpreted as an abbreviation and marked up with the element <abbr>, inside of which <metamark> is used to mark the space itself. The <metamark> element here has the attribute rend with the value space and the attribute function with the value ditto. The expansion is then used to fill in the text that the space stands for, in this case the name "Robert Browning". This is encoded in the element <expan>.
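The first two items of such a list might be sketched as follows. The title in the second item is a placeholder, and the surrounding <item> elements are assumed.

```xml
<item>Robert Browning : Eveln Hope.</item>
<item>
  <choice>
    <!-- the blank space standing in for the repeated name -->
    <abbr><metamark rend="space" function="ditto"/></abbr>
    <!-- the name that the space represents -->
    <expan>Robert Browning</expan>
  </choice> : [title of the next poem]
</item>
```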
4.1.3.8. Curly brackets
Curly brackets are often part of notes added to the margin by Pessoa. They are encoded using the element <metamark> with rend having the value curly-bracket.
For an explanation of a complete example of a margin note, see the section on Notes added on the margin.
4.1.3.9. Crosses
Pessoa uses crosses to mark that he is uncertain or has doubts about a passage of text on a document. An example is shown in the following facsimile from the list BNP 48, 18 and 19:
After the name "Alfredo Guisado" (the fifth list item from the bottom) there is text in parentheses which is marked with a cross to the right: "(baloiço que me baloiça ?)+". This cross is interpreted as marking that Pessoa is not sure about the text preceding it in parentheses. This is marked up as follows:
To mark the uncertain passage, the element <seg> is used with the attribute type having the value certainty. The degree of certainty is indicated in the attribute cert and can be high, medium, or low. That Pessoa was the one having doubts is indicated with the attribute resp with the value FP. Here the cross is not included in the transcription anymore, because the uncertainty is indicated with the TEI element and the attribute rend is used to mark how it was rendered originally (here with the value cross right to say that the passage was marked with a cross on the right side; another possible value would be cross left).
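The mark-up described above might look as follows. This is a sketch based on the description only; in particular, the value low for cert is an assumption, since the degree of certainty chosen for this passage is not stated:

```xml
<!-- the cross itself is not transcribed; rend records how the doubt was marked -->
<!-- cert="low" is assumed for illustration -->
<seg type="certainty" cert="low" resp="FP" rend="cross right">(baloiço que me baloiça ?)</seg>
```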
4.1.3.10. Arrows
On some lists, Pessoa uses arrows to connect different passages of text, or to show that some text is moved somewhere else. An example can be seen in the following facsimile of the list BNP 136-57v:
The last part of this document contains an arrow pointing from the heading "The New Decadence" to "or Being an apology of all culture not genuine". This is encoded as follows:
<div>
  <head rend="center">
    <metamark rend="arrow-down"
      function="assignment" target="#A3"/>
    <hi rend="underline">
      <rs type="periodical" key="J55">The New
        Decadence</rs>
    </hi>
  </head>
  <p rend="center">An Introduction to the Study of
    <lb/>Indifference.</p>
</div>
<metamark rend="line center"
  function="distinct"/>
<div>
  <p rend="center">
    <anchor xml:id="A3"/>
    <hi rend="circled">or</hi>
    Being an apology <subst>
      <add n="2" place="above">for</add>
      <del n="1">of</del>
    </subst> all culture
    not genuine. </p>
</div>
The arrow itself is encoded as an element <metamark> with the attribute rend having the value arrow-down and the attribute function having the value assignment, because the arrow serves to assign the text to something else. Other renditions of arrows are possible: arrow-up, arrow-left, arrow-right, arrow-left-down, arrow-left-up, arrow-right-down, arrow-right-up, arrow-left-curved-down, arrow-left-curved-up, arrow-right-curved-down, arrow-right-curved-up. The value depends on the direction in which the arrow points (up, down, left, right, or a combination of these) and on whether it is straight or curved. The attribute target points to an anchor elsewhere in the document that marks the point the arrow points to. Its value is the identifier of that anchor, preceded by "#", in this case #A3. In this example, the anchor is placed at the beginning of another paragraph, before the text of that paragraph begins. It is encoded with the element <anchor> and has the attribute xml:id to define the identifier A3.
There can also be text on arrows, as in the following example:
In the lower part of the list, there is a deleted item "(Antonio Mora)" that has two arrows at the end, one pointing to another item below it and the other pointing up. The arrow that points up has the text "F Pessoa" on it. This is encoded using an element <label> inside the <metamark> for the arrow, as in the following example:
Each of the arrows is encoded with an element <metamark>. One arrow has the attribute rend with the value arrow-right-curved-down, because it is an arrow that is curved and points downwards on the right side, and the other one has the value arrow-right-curved-up, as it is curved and points up on the right side. Both <metamark> elements have the attribute function with the value assignment because the arrows assign a list item to other places in the document. In the attributes target, the identifiers of the elements marking the goal of the arrows ("A1" and "A2") are given, preceded by the "#". These <anchor> elements are defined elsewhere, as in the previous example. Inside of the first <metamark> element, the text on the arrow is encoded in the element <label>. Here, the text is the name "F Pessoa", so the label contains a reference to a name and an abbreviation which is expanded.
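The two arrows could be sketched as follows. This is a reconstruction from the description, with several assumptions: the labelled arrow is taken to be the one pointing up and is put first, the pairing of arrows with the targets A1 and A2 is a guess, and the deletion mark-up of "(Antonio Mora)" is not shown:

```xml
<item>
  <!-- the deleted text "(Antonio Mora)" precedes the arrows -->
  <metamark rend="arrow-right-curved-up" function="assignment" target="#A1">
    <label>
      <!-- name reference with the abbreviated first name expanded -->
      <rs type="name" key="FP">
        <choice>
          <abbr>F</abbr>
          <expan>F<ex>ernando</ex></expan>
        </choice> Pessoa</rs>
    </label>
  </metamark>
  <metamark rend="arrow-right-curved-down" function="assignment" target="#A2"/>
</item>
```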
4.1.3.11. Vertical text
On some handwritten documents, the text of notes on the side or entire lists is turned around and appears in vertical form. An example is shown in the following facsimile of the list BNP 133F-36v:
At the top left of the document, there is a list rotated to the left. In the lower part of the document, a side note is attached to the second and third item of a list, which is also written as vertical text, rotated to the left. In TEI, this is encoded as follows:
The list at the top of the document gets an attribute rend with the value rotate-left. The same attribute and attribute value are used for the margin note. There, the element <label>, which holds the text that the curly bracket points to, has the rend attribute with rotate-left.
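In outline, the two uses of rotate-left might look like this. The sketch is an assumption-laden reconstruction: the text of the list and of the note is left out, and the further attributes that a margin note normally carries (place, target, n) are omitted:

```xml
<!-- list at the top left of the page, written vertically -->
<list rend="rotate-left">
  <item><!-- text of the rotated list item --></item>
</list>

<!-- margin note in the lower part: the rotation is recorded on the label -->
<note>
  <metamark rend="curly-bracket" function="grouping">
    <label rend="rotate-left"><!-- text of the side note --></label>
  </metamark>
</note>
```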
4.1.4. Genetic encoding
The genetic encoding involves changes that Pessoa himself made to the documents, for example text that he added, changed or deleted later. Such changes are interpreted as belonging to a second temporal level. Just two levels are differentiated, a first version (level 1) and a final version (level 2).
4.1.4.1. Additions
In general, additions are encoded using the element <add>. The following facsimile shows an example of an addition:
In the second list item, a note is added below the word "Arist". This is encoded in the following way:
<item>— Voto
  – Democracia – <seg type="anchor">
    <choice>
      <abbr>Arist</abbr>
      <expan>Arist<ex>ocracia</ex>
      </expan>
    </choice> (critica) <add place="below" n="2">como se passa de uma idéa de aristocracia a
      outra</add>
  </seg>
</item>
The element <add> is used to mark up the text that is added, in this case "como se passa de uma idéa de aristocracia a outra". This element carries an attribute place indicating where the addition is positioned in relationship to the existing text, in this case below it. The attribute place may have the values above, below, after, or margin-left. The second attribute used on <add> is n. It serves to mark the level of the genetic encoding, here the second level (2) because the text was added to the list later.
There is a third important element of the encoding of this addition. In the example, the element <seg> (segment) with the attribute type and the value anchor is used to create an anchor point for the addition. This means that the addition relates to the point in the text where "Arist (critica)" occurs. It is important that the <add> element occurs inside of the anchor <seg>. Such an anchor segment is only needed when the addition does not relate to the whole item that it occurs in (which already has its own XML structure), but only to a part of it. In this case, the whole item is "— Voto – Democracia – Arist (critica)", but the addition is only made to the latter part, and "Arist (critica)" did not have any mark-up of its own before, so the segment is added here.
4.1.4.1.1. Notes added on the margin
A special case of addition are notes that Pessoa added on the margin of a document. Often, curly brackets are used to group items in a list and add a note to them. An example of this can be seen in the following facsimile taken from the document BNP 12-1 10r:
Here the first three items of the list have a note added on the right margin ("Tres Dissertações"). This note is encoded as follows:
<item xml:id="I1">
  <rs type="title">Dissertação sobre as
    revoluções</rs>.
</item>
<item xml:id="I2" rend="indent-2">
  <rs type="title">Dissertação a
    favôr da Allemanha<lb/>e do seu procedimento na
    guerra presente</rs>. <note target="range(I1,I3)"
    place="margin-right" type="addition" n="2">
    <metamark rend="curly-bracket"
      function="grouping">
      <label>Trez<lb/>Disserta<pc>-</pc>
        <lb/>ções.</label>
    </metamark>
  </note>
</item>
<item xml:id="I3">
  <rs type="title">Dissertação sobre a
    arte moderna</rs>.
</item>
For the note itself, the element <note> is used. It has several attributes. First, the note is of the type addition. Second, it has the attribute place, indicating where the note was added. In this case the note is on the right margin, so place has the value margin-right. Other possible values for place on <note> are margin-left, below, top, and center. Third, the note carries the attribute n with the value 2, indicating that the note is part of the last version of the document because it is interpreted as having been added later by Pessoa. Fourth, the note has an attribute target, which states to which items of the list the note is added. In this example, the note is added to the first three items in the list. To be able to address these items formally, they need an identifier in the XML. Therefore, the three list items each carry an attribute xml:id with a unique value, here I1, I2 and I3. The target of the <note> can then use these identifiers and point to them. The value of the <note>'s target is range(I1,I3), which means that the note extends from the first item to the third item. The element <note> is placed inside of the second list item, after the text of the list item. The best way to place the note is in the middle of the range of items it points to. Because it points to items 1 to 3 here, item 2 is a good place to position the <note> element.
The note itself is further encoded inside of the <note> element. Here, an element <metamark> is added to represent the curly bracket. It has two attributes: rend with the value curly-bracket and function with the value grouping, because the bracket serves to group the three list items. Finally, the curly bracket has a "label", which is the text of the note. This is encoded inside of an element <label>. The text is added here, and the line breaks occurring in the text are marked with empty <lb> elements. Also, there is a hyphen dividing the word "Dissertações", which is encoded with the element <pc> (for punctuation character).
Another example of a note added to the margin is visible in the following list BNP 133M-30r:
At the top of the document, two words are written on the right side of the typed list. Both are struck through. This margin note is encoded as follows:
The note is interpreted as belonging to the first list item, so an element <note> is added at the end of this first list item. The <note> element carries the attribute place to say that the note is added on the right margin of the list (margin-right) and the attribute n with the value 2 to say that the note was added to the list later and is interpreted as belonging to the second edited version of this document, but not to the first one. The text of the note itself is transcribed and encoded inside of the <note> element, except that in this case the words could not be read by the editor. They are therefore marked up as two <gap> elements, each with the attributes reason with the value illegible, unit with the value word, and extent with the value 1. The words are separated by a line break (<lb>) and are both deleted, which is marked up with the element <del> and the attribute rend with the value overstrike.
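A sketch of such a margin note, reconstructed from the description. The text of the list item is left out, and wrapping each illegible word in its own <del> is an assumption; a single <del> around both would also match the description:

```xml
<item><!-- text of the first list item -->
  <note place="margin-right" n="2">
    <!-- first struck-through word, illegible -->
    <del rend="overstrike"><gap reason="illegible" unit="word" extent="1"/></del>
    <lb/>
    <!-- second struck-through word, illegible -->
    <del rend="overstrike"><gap reason="illegible" unit="word" extent="1"/></del>
  </note>
</item>
```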
4.1.4.1.2. Additions of longer passages of text
In some cases it is not just one or several characters or words that is added to a list, but more text, for example several new list items or a whole list. In those cases, the element <add> is impractical because it cannot contain structures such as several list items or a whole list, so another solution is needed for the mark-up of such additions. An example can be seen in the following list BNP 133M-30r:
On this document, a handwritten list ("1. System of Shorthand. 2. Look for door...") is added to a typed list which was present first. This is encoded as follows:
<addSpan n="2" spanTo="#A1"/>
<list>
  <item>
    <label>1.</label> System of
    Shorthand. </item>
  <item>
    <label>2.</label> Look for door
    — in instead of out. </item>
</list>
<anchor xml:id="A1"/>
Before the list that is to be added, an element <addSpan> is used. This is an empty element (it has no opening and closing tag, but just one tag which closes directly with />). It is just to mark the beginning of the text span to be added. The attribute n with the value 2 says that the text that follows is to be added to the second edited version of this document, but is not present in the first version. The other attribute spanTo serves to indicate where the added text ends. The value of this attribute is a pointer to the identifier of another element. The "#" means that this attribute points to something else and the "A1" is the identifier pointed to. This identifier is defined on the element <anchor>, which is used to mark the end of the stretch of text to be added. In this case, the <anchor> element occurs after the list to be added. It carries the attribute xml:id with the value A1.
4.1.4.2. Substitutions
There are two kinds of substitutions. In the first case, something is deleted and replaced with something else. In the second case, an alternative is added without deleting the first option.
An example of the first case (something is deleted and replaced) can be seen in the following facsimile of the list BNP 143 6r:
In the fourth item of the list, the word "large" is overtyped and replaced with the word "big", which is put above the old word. This list item is encoded in the following way:
The word that is deleted ("large") is marked up with the element <del>. It has the attribute rend with the value overtyped, because the word is deleted by typing some "xxx" over it. The attribute n with the value 1 says that the word "large" belongs to the first version of this document (before it was deleted). The word that is added instead ("big") is encoded with the element <add>. Where the new word is added is indicated in the attribute place, which has the value above here. Also, the addition carries the attribute n with the value 2, saying that this addition belongs to the second, final version of the document. Both the deletion and the addition are surrounded by an element <subst>, indicating that this is a substitution.
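Put together, the substitution described above could be sketched like this. Only the substituted word is shown; the surrounding text of the list item is left out, and the order of <del> and <add> inside <subst> is an assumption:

```xml
<subst>
  <!-- first version: the word that was typed over -->
  <del rend="overtyped" n="1">large</del>
  <!-- second version: the replacement written above it -->
  <add place="above" n="2">big</add>
</subst>
```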
Another example of substitutions can be seen in the facsimile of the list BNP 120-23r:
In the fourth list item, there are two kinds of substitutions. The first one is that the letter "A" is overwritten with the letter "O", and the second one that the word "Agua" is struck through and the word "Segredo" added above it to replace it. This list item is encoded in the following way:
Both substitutions are marked up with the element <subst> containing an element <del> for the deleted part and an element <add> for the added part. In both cases, the deleted words are marked as belonging to the first edited version of the document (n with the value 1) while the added words are part of the second version (n with the value 2). The first deletion has the attribute rend with the value overwritten and the second one the value overstrike. In the first substitution, the new letter is added directly on top of the deleted one, so the element <add> needs no additional attribute place saying where the addition was made. In the second substitution, the addition was made above the previous word, so <add> has an attribute place with the value above.
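The two substitutions might be encoded as in the following sketch, which is derived from the description only. The remaining text of the list item is not shown:

```xml
<!-- "A" overwritten with "O": no place attribute, the new letter sits on top of the old one -->
<subst>
  <del rend="overwritten" n="1">A</del>
  <add n="2">O</add>
</subst>
<!-- "Agua" struck through and replaced by "Segredo" written above it -->
<subst>
  <del rend="overstrike" n="1">Agua</del>
  <add place="above" n="2">Segredo</add>
</subst>
```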
An example of the second case (something is replaced without deleting the first option) can be seen in the following facsimile:
Here, in the list item with the number 6, there is the title "Le Gardien des Troupeaux", to which an alternative is added resulting in "Le Gardien de Troupeaux". The encoding of this alternative is shown in the following:
Because the change is actually only applied to the word "des", the element <choice> is only used on this word, more specifically on its last two characters "es", which are changed just to "e". The element <choice> contains two segments, <seg> 1 and <seg> 2, one for each version of the word. The first version is marked with n = 1 and the last version with n = 2. Furthermore, the last version is encoded as an addition using <add> inside of the second segment. Also, the place of the addition is indicated in place, which has the value above.
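As a sketch based on this description, the alternative could be written as follows. Any reference mark-up around the title, such as an <rs> element with its key, is omitted:

```xml
<!-- only the last two characters of "des" are inside <choice> -->
Le Gardien d<choice>
  <seg n="1">es</seg>
  <seg n="2"><add place="above">e</add></seg>
</choice> Troupeaux
```

Because nothing is deleted, there is no <del> here: both readings stay available, and n says which version each one belongs to.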
4.1.4.3. Transpositions
A transposition means that a passage of text should be moved to another position, but the result of this process is not visible in the document. Instead, some metamark (e.g. an arrow, a line, or numbers) indicates which elements should be transposed (see the TEI guidelines for more information). In this edition, the metamark indicating the transposition is included in the diplomatic transcription. In the first edited version of the text, the passages are shown as they were originally and in the second edited version, the result of the transposition is given. An example of a transposition can be found in the document BNP/E3 93-56r, as shown in the following facsimile:
On the lower part of the page, there is a list with five items. The second list item has three lines, of which the second one has a transposition. A line indicates that the two words "poemas bons" should be transposed to "bons poemas". In TEI, this is encoded as follows:
The element <metamark> with the attribute function and its value transposition is used to represent the sign that indicates the transposition, in this case a line (so that the attribute rend has the value arrow). The attribute place indicates where the metamark is placed in relationship to the elements that should be transposed. Here the value above is given, although the line actually starts above the first word and ends below the second, so this is a simplification. As a rule of thumb, the place where the metamark starts should be indicated in the place attribute. The attribute target of the <metamark> element serves to point to the elements which should be transposed. The pointers are the identifiers of those elements, each preceded by the sign '#' and separated by a space. In this case, the two elements with the identifiers S1 and S2 should be transposed. The identifiers are given in the order that the elements have in their original position (so "poemas" = S1 before "bons" = S2). Finally, the attribute n with the value 2 means that the transposition should only be realized in the second version of the text. The two elements that should be transposed directly follow the <metamark> element. Here, these are two <seg> elements, one for each word, and they have the identifiers S1 and S2 as values of the attribute xml:id.
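A sketch of this transposition, reconstructed from the description. The rend attribute is left out of the sketch, and the rest of the line is not shown:

```xml
<!-- the sign that asks for the two words to change places -->
<metamark function="transposition" place="above" target="#S1 #S2" n="2"/>
<!-- the two words in their original order -->
<seg xml:id="S1">poemas</seg>
<seg xml:id="S2">bons</seg>
```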
Another example of a transposition can be found in the document CP 786. In that case not two words are transposed but two rows of a table, as can be seen in the following facsimile:
Here the second and third rows with the text "Spell" and "Carta ao Author de Sachá" should be transposed. In TEI, this is encoded as follows:
As in the example with two words, the element <metamark> with the attribute function and the value transposition represents the sign indicating that the two table rows should be transposed. Here too, it is a curved line, starting at the beginning of the word "Spell" and ending at "Author" on the next line. To simplify this, the attribute place of <metamark> has the value left, which means that the sign is placed on the left side of the table rows. What should be transposed is indicated in the attribute target, by giving the identifiers of the corresponding elements, in this case the identifiers of the two table rows (R1 and R2). The two <row> elements that should be transposed directly follow the <metamark> element.
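For the table rows, the encoding might look like the sketch below. It rests on assumptions: each row is shown with a single cell, the attribute n is carried over from the previous example, and rend is omitted:

```xml
<metamark function="transposition" place="left" target="#R1 #R2" n="2"/>
<!-- the two rows in their original order -->
<row xml:id="R1">
  <cell>Spell</cell>
</row>
<row xml:id="R2">
  <cell>Carta ao Author de Sachá</cell>
</row>
```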
4.1.4.4. Deletions
Deletions can be sections of text that are visibly struck through or typed over by Pessoa. The following facsimile contains an example of a deletion:
On the right page, below the heading "Um grande poeta materialista (Alberto Caeiro)", there is a phrase "A enthusiastica all" which is struck through. This is encoded as follows:
To mark deletions, the element <del> is used in combination with the attribute rend, which here has the value overstrike. Other possible values are overtyped (when the document is not handwritten but typed) and overwritten (when the text is overwritten with new text instead of using a line to strike it through). In the current example, the deletion is interpreted as already belonging to the first version of the document. It does therefore not have an attribute n with a value 2, which would mark that the deletion was made only for the final version of the document.
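A minimal sketch of this deletion, following the description:

```xml
<!-- no n attribute: the deletion already belongs to the first version -->
<del rend="overstrike">A enthusiastica all</del>
```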
4.1.4.4.1. Deletion of longer passages of text
In some cases not just a few characters or words are deleted, but longer passages of text, for example several list items or a whole list. This is the case in the list BNP 144A-37v, as can be seen in the following image:
Here, the whole list is deleted. This cannot be encoded with the element <del> because that element is not allowed to hold entire lists. The solution is shown in the following code example:
The element <delSpan> is used to mark the beginning of the passage that should be deleted. It is an empty element which closes directly. How the deletion should be rendered is indicated in the attribute rend, which has the value overstrike here. The element <delSpan> has another attribute spanTo which points to another element with the identifier "A2". That it is a pointer is marked with the sign "#", so the value of the attribute is #A2. This other element serves to mark the end of the deleted passage. It is encoded with the element <anchor> and has the attribute xml:id with the value A2. The <anchor> element is also empty.
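The pattern can be sketched as follows, by analogy with the <addSpan> example. The content of the deleted list is left out:

```xml
<!-- start of the deleted passage -->
<delSpan rend="overstrike" spanTo="#A2"/>
<list>
  <item><!-- text of a deleted list item --></item>
</list>
<!-- end of the deleted passage -->
<anchor xml:id="A2"/>
```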
4.1.4.5. Other modifications
In some cases, it is not text that is added, but graphical elements. For example, words can be modified by underlining them or drawing a circle around them. An example is shown in the following facsimile of the list BNP 133M-30r:
On this list, the text "(Advertise for Cipher Agency - America)." is highlighted by a frame which Pessoa added later.
The TEI element <add> is not suitable to encode such modifications. Instead, the more general element <mod> is used, as in the following code snippet:
The element <mod> surrounds the text to be framed. The attribute rend indicates how it should be modified, here by adding a frame (framed). The attribute n with the value 2 indicates that the modification was only done later and should be part of the second edited version of the document.
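Following this description, the framed text could be sketched as:

```xml
<!-- the frame was drawn later, so the modification belongs to version 2 -->
<mod rend="framed" n="2">(Advertise for Cipher Agency - America).</mod>
```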
4.1.5. Editorial interventions
For some aspects of the text on the documents, the editor may decide to give more information on the transcribed text, for example to indicate how abbreviations would be expanded, or that there is a gap in the text and how it could be filled. It should also be marked if the editor decides to transcribe only a part of the document rather than the whole.
4.1.5.1. Expansion of abbreviations
The following example shows how abbreviations are encoded in the documents and how they can be expanded:
<item> — 2
  idéas para o <rs type="title">
    <choice>
      <abbr>L<am>.</am> do
        Des<am>.</am>
      </abbr>
      <expan>L<ex>ivro</ex> do
        Des<ex>asocego</ex>
      </expan>
    </choice>
  </rs>
</item>
Here, there is a list item containing the title "Livro do Desasocego" in abbreviated form: "L. do Des.". To mark the difference between the abbreviated and expanded form, the element <choice> is used. Inside of it, the abbreviated text is given first, in an element <abbr>, where the text is transcribed as it appears on the document. The abbreviation signs, in this case dots, are marked up further with the element <am> ("abbreviation mark"). The expansion of the abbreviation is given in the element <expan>. Inside of this, the parts where abbreviation marks are replaced by text are given in elements <ex>. Otherwise the text of the abbreviation is repeated in the expansion, because the <choice> element says that just one of the versions will be displayed at a time.
4.1.5.1.1. Ditto
A special case of abbreviation expansion are subsequent items in a list that contain repetitions for which Pessoa used typographical marks as placeholders.
In the following facsimile, it can be seen that the sixth list item starts with a line:
This line indicates that the beginning of this list item corresponds with the beginning of the previous item number 5, i. e. "Trad. de...". The line is therefore interpreted as an abbreviation, which can be expanded to the text of the preceding list item. This is encoded as follows:
The line itself is encoded with the element <metamark>, using the attribute rend with the value line and the attribute function with the value ditto. The line is then marked as an abbreviation using <abbr>. It is expanded by using the element <choice> wrapped around the abbreviation and adding the element <expan> to include the text and mark-up that the line stands for: "Traducção de Alberto Caeiro".
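In outline, and as a reconstruction from the description, the ditto line could be encoded like this. The expansion is given as plain text here; the mark-up that it carries in the project's file (for example for the expanded abbreviation and the name reference) is omitted, as is the rest of the item:

```xml
<item>
  <choice>
    <abbr>
      <!-- the line drawn in place of the repeated words -->
      <metamark rend="line" function="ditto"/>
    </abbr>
    <expan>Traducção de Alberto Caeiro</expan>
  </choice>
  <!-- the remaining text of item 6 follows here -->
</item>
```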
Another example for "ditto" using quotation marks instead of lines is visible in the following page of the list BNP 133M-96-a-98:
In item 14, there are two subitems: 'Small book on Sh. - Bacon' and 'Larger " " " "', where the quotation marks are placeholders for the text of the previous item. This is encoded as follows:
The second item of the sublist contains an element <choice> with the child elements <abbr> and <expan>. The abbreviation holds the quotation marks with the function "ditto". Each quotation mark is encoded as a <metamark> with the attributes rend with the value quotes and function with the value ditto. The expansion in the element <expan> then contains the repeated text that the quotation marks stand for.
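A sketch of the second subitem, derived from the description. It assumes one <metamark> per quotation mark on the document and gives the expansion as plain text without further mark-up:

```xml
<item>Larger
  <choice>
    <abbr>
      <!-- one metamark per quotation mark -->
      <metamark rend="quotes" function="ditto"/>
      <metamark rend="quotes" function="ditto"/>
      <metamark rend="quotes" function="ditto"/>
      <metamark rend="quotes" function="ditto"/>
    </abbr>
    <!-- the text repeated from the previous subitem -->
    <expan>book on Sh. - Bacon</expan>
  </choice>
</item>
```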
4.1.5.2. Selections
In some cases not the whole content of a document is relevant for the edition, but only a certain part of it. Then the element <gap> can be used to mark such selections:
In the example, the list is transcribed up to the seventh item. On the document, there is more text below the list, but it was decided not to transcribe it. The element <gap> indicates that something was left out here. In the attribute reason, it is mentioned that the gap is due to selection (and not, for example, because the document is damaged or the text illegible). Also the extent of what was not transcribed should be indicated. This can be done using the attribute unit in combination with the attribute extent. The first one says what is counted and the latter how much of it was selected. In the example, the remaining lines were counted and three lines were not transcribed. Possible values for unit are character, word, and line. Possible values for extent are numbers.
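The selection described here could be sketched as follows. The list items are left out, and placing the <gap> directly after the list is an assumption:

```xml
<list>
  <!-- items 1 to 7 are transcribed here -->
</list>
<!-- three further lines on the document were not transcribed -->
<gap reason="selection" unit="line" extent="3"/>
```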
4.1.5.3. Conjectural readings
Sometimes the editor is not sure how a passage should be read, but still wants to make a suggestion. Such conjectural readings are marked with the element <unclear>, as in the following example:
Here the word "Books" could not be read with certainty. The attribute reason serves to explain why something was unclear, in this case because the word was illegible.
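A minimal sketch of this conjectural reading, following the description:

```xml
<!-- the editor's suggested reading of a word that is hard to read -->
<unclear reason="illegible">Books</unclear>
```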
4.1.5.4. Gaps
One sort of gap occurs when some text in a document is present but could not be read by the editor. See for example the following facsimile of the document 144D2 9r:
At the end of the fourth list item, there is some text in parentheses, beginning with "Fraça, Barrès.", but the third word, which was struck through, could not be read. It is therefore marked as a gap and encoded as follows:
The word that could not be read is not transcribed. Instead an element <gap> is added at the position of the illegible word. The <gap> element gets an attribute reason stating why there is a gap, in this case because the word is illegible. What and how much is illegible is indicated in the other two attributes: unit stating that it is a word and extent stating that just 1 word could not be read. In this specific example, the illegible word is also struck through. This is marked up by adding a <del> element around the <gap> element, with an attribute rend with the value overstrike.
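The struck-through illegible word could be sketched as follows; the legible words around it are not shown:

```xml
<!-- a deleted word that the editor could not read -->
<del rend="overstrike">
  <gap reason="illegible" unit="word" extent="1"/>
</del>
```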
Another kind of gap occurs when the editor wishes to indicate that some text is expected at a certain point in a document but is missing, because the document was not finished or because the text was left out on purpose. An example of such a case is shown in the following:
Here there is a list item containing the name of a periodical and a comment "ver se se obtem Santos-Vieira". An addition is made below this list item: "(pelo lado anti-clerical )". Because there is space between the last word of the addition and the closing parenthesis, the editor assumes that there should be more text. To mark this, the element <supplied> is used. It carries two attributes, the first one, resp, serves to indicate who made this intervention. As a value, it takes the initials of the responsible editor, in this case "PS" for "Pedro Sepúlveda". The second attribute is reason, explaining why something is supplied. Here it has the value omitted-in-original. In this example, the element <supplied> is empty because the editor does not know what the missing text is. In other cases, it is possible that the element <supplied> contains the text that is supposed to be there.
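The addition with the empty <supplied> element might look like this. The sketch shows only the addition; the attributes place and n on <add> are assumed from the conventions for additions described earlier:

```xml
<add place="below" n="2">(pelo lado anti-clerical
  <!-- empty: the editor expects more text but does not know what it is -->
  <supplied resp="PS" reason="omitted-in-original"/>)</add>
```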
In another case, there is a list starting with a heading "Italian:", but no list items are added on the document. This is encoded by using the element <supplied> inside of an otherwise empty list item, as in the following encoding example:
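A possible shape of this encoding is sketched below. Encoding the heading as <head> inside the list is an assumption, and the resp attribute, which would carry the editor's initials, is omitted:

```xml
<list>
  <head>Italian:</head>
  <item>
    <!-- no item text on the document; the editor expects some -->
    <supplied reason="omitted-in-original"/>
  </item>
</list>
```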
The transcription of publications is encoded within the TEI elements <text> and <body>, each appearing twice. The first <text> element has the attributes corresp and type. The second <text> element has only the attribute type:
The attribute corresp contains a unique key defined in an external central work register (see the section on Work index for more details). In the first <text> element, the attribute type always has the value "orig". This indicates that this element contains a transcription that follows the spelling and formatting of the source of the publication and has not been normalized or corrected. In the second <text> element, the attribute type has the value "reg". This signifies that it contains the published text in current spelling.
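The overall structure can be sketched as follows. WORK-KEY is a hypothetical placeholder for a key from the work register, not a real value, and the sketch does not show how the two <text> elements are nested in the file:

```xml
<!-- WORK-KEY is a placeholder, not an actual key -->
<text type="orig" corresp="WORK-KEY">
  <body>
    <!-- transcription following the spelling and formatting of the source -->
  </body>
</text>
<text type="reg">
  <body>
    <!-- the same text in current spelling -->
  </body>
</text>
```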
4.2.2. Structures inside of divisions: headings, paragraphs, ...
Similar to the encoding of documents, there are further structures within the main sections of a publication that are encoded, such as headings, paragraphs, or verse lines. The following two examples show the paragraphs of a prose text and the line groups of a poem:
<body>
  <p>A quadra é o vaso de flores que o povo põe á janela da sua
    Alma.</p>
  <p rend="indent-first">Da orbita triste do vaso obscuro a graça
    exilada das flôres atreve o seu olhar de alegria.</p>
  <p rend="indent-first">Quem faz quadras portuguezas comunga a
    alma do Povo, humildemente de nós todos e errante dentro de
    si propria.</p>
  <p rend="indent-first">Os autores d'este livro realizaram as
    suas quadras com destreza luzitana e fidelidade ao
    instinctivo e desatado da alma popular.</p>
  <p rend="indent-first">Elogial-os mais seria elogial-os
    menos.</p>
  <p>17-IV-1914</p> [...]
</body>
<lg>
  <l>E a orla branca foi de ilha em continente,</l>
  <l>Clareou, correndo, até ao fim do mundo,</l>
  <l>E viu-se a terra inteira, de repente,</l>
  <l>Surgir, redonda, do azul profundo.</l>
</lg>
<lg>
  <l>Quem te sagrou creou-te portuguez.</l>
  <l>Do mar e nós em ti nos deu signal.</l>
  <l>Cumpriu-se o Mar, e o Imperio se desfez.</l>
  <l>Senhor, falta cumprir-se Portugal!</l>
</lg>
4.3. Encoding of references (names, titles, periodicals, works, ...)
In the edition, references to several kinds of entities are encoded: to names, titles, periodicals, works, and collections. For all of these references, the element <rs> is used. In the attribute type, the kind of reference is given. This attribute can have the values name, title, periodical, work, or collection. The following example shows a heading of a document that is at the same time a reference to a title, which itself is the title of a collection of works:
<head>
  <hi rend="underline">
    <rs type="collection" key="C16">
      <rs type="title" key="T184">Na Casa
        de saude de Cascaes</rs>
    </rs>
  </hi>
</head> [...]
Therefore, the text "Na Casa de saude de Cascaes" is wrapped in two <rs> elements, one to say that it is a reference to a title and the other to state that the title is a reference to a collection of works. In both cases, the key attribute is also used. It serves to identify the entity which is referenced. Each type of entity has its own type of key. Titles, for example, have keys beginning with "T", followed by a number. Collections have keys beginning with "C". Names have keys beginning with "P" (for person name), periodicals with "J" (for journal) and works with "W". Possible values for the keys are to be found in external lists of the entities. A special case are references to the main heteronyms. Although these are references to names, the keys do not begin with "P" in these cases (as for all other person names), but specific keys for the heteronyms are used ("FP", "AC", "AdC", "RR", "BS").
In the next example, a reference to a work title is given:
Here, the title is a name of a heteronym ("Alberto Caeiro"), used as a placeholder for the work of this heteronym. First the reference to the name is marked using the element <rs> with the attribute type and the value name. The key for Alberto Caeiro is AC.
Also, it is important to note that references to the main heteronyms (Fernando Pessoa, Alberto Caeiro, Álvaro de Campos, Ricardo Reis, and Bernardo Soares) are further marked up by indicating the role that they have in the reference. In the example, Alberto Caeiro is mentioned as an author, so the attribute role is used on <rs> with the value author. The available values for role are: author, editor, translator, and topic.
Next, the reference to the title is marked with <rs>, with type set to the value title and a key giving the title identifier T48. Then another <rs> element is wrapped around the title reference to indicate that this is a work reference (with type="work") and to which work it refers (with key="W32").
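To make the nesting described above concrete, the following Python sketch parses a snippet that was reconstructed from the prose (a name reference inside a title reference inside a work reference). The snippet is an illustration built from the attribute values named in the text, not a transcription taken from the edition, and it omits the TEI namespace for brevity. The function name reference_chain is hypothetical.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the description: work W32 > title T48 > name AC (author).
SNIPPET = (
    '<rs type="work" key="W32">'
    '<rs type="title" key="T48">'
    '<rs type="name" key="AC" role="author">Alberto Caeiro</rs>'
    '</rs>'
    '</rs>'
)


def reference_chain(xml_text):
    """List (type, key, role) for every <rs> element, outermost first."""
    root = ET.fromstring(xml_text)
    # iter() walks the tree in document order and includes the root itself.
    return [(rs.get("type"), rs.get("key"), rs.get("role"))
            for rs in root.iter("rs")]


if __name__ == "__main__":
    for entry in reference_chain(SNIPPET):
        print(entry)
```

Reading the chain from the inside out gives the order in which the text introduces the references: name, then title, then work.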
4.3.1. Roles of name references
References to the main heteronyms (Fernando Pessoa, Alberto Caeiro, Álvaro de Campos, Ricardo Reis, and Bernardo Soares) are further marked up by indicating the role that they have in the reference. For this purpose, the attribute role is used on <rs>. The available values for role are: author, editor, translator, and topic. Most often, names are mentioned as authors, but there are also cases where a name is mentioned as part of a topic, as in the following example:
<head>
- <rs type="title">
- <hi rend="underline">Vida e obras do engenheiro<lb/>
- <rs type="name" key="AdC" role="topic">Alvaro de
- Campos</rs>
- </hi>
- </rs>.
-</head>
Here, Álvaro de Campos occurs inside of a title reference and is the topic of the work.
In the next example, a heteronym is mentioned in the role of editor:
<head>
- <hi rend="underline">
- <rs type="title">Livro do
- Desassocego</rs>.</hi>
-</head>
-<ab>escripto por <rs type="name" key="P62">Vicente Guedes,<lb/>
- </rs> publicado por <rs type="name" key="FP" role="editor"
- style="b">Fernando<lb/>
- Pessoa</rs>.</ab>
Here, Fernando Pessoa is mentioned as the editor of "Livro do Desassocego", and the reference to his name is therefore marked with role="editor". Vicente Guedes is mentioned as the author, but as he is not one of the main heteronyms, no role is indicated in that name reference.
4.3.2. Styles of name references
Another aspect of the encoding of references is the "style" of the reference. "Style" means that a certain way of spelling a reference is used. In the following example, the name "Antonio Mora" is given without any accent. This is marked as "style b", using the attribute style on the element <rs>, giving it the value b. In the case of Antonio Mora, the style a is "Antonio Móra" with an accent.
The external list of names defines which names have different styles available. If a name has different styles, the attribute style should always be used in the encoding to indicate which of the styles is used in the document that is transcribed.
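The rule that a name with several styles should always carry the style attribute lends itself to an automated check. The Python sketch below assumes a small in-memory excerpt of the external list of names. The key "P0" is a placeholder because the actual identifier of Antonio Mora is not given here, and the dictionary layout is an assumption about how such a list might be represented. Elements are written without the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of the external list of names: key -> {style: spelling}.
# "P0" is a placeholder key, not the project's actual identifier.
NAME_STYLES = {
    "P0": {"a": "Antonio Móra", "b": "Antonio Mora"},
}


def missing_style(xml_text):
    """Return the keys of name references that lack a valid style attribute.

    Only names that have several styles in NAME_STYLES are checked.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for rs in root.iter("rs"):
        key = rs.get("key")
        if rs.get("type") == "name" and key in NAME_STYLES:
            if rs.get("style") not in NAME_STYLES[key]:
                problems.append(key)
    return problems


if __name__ == "__main__":
    encoded = '<ab><rs type="name" key="P0" style="b">Antonio Mora</rs></ab>'
    print(missing_style(encoded))  # []
```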
4.4. Encoding of links
Besides references to different kinds of defined entities (such as persons, journals, etc.), general links can also be added to the transcriptions of documents and publications. Such links can serve to interconnect different parts of the edition, without the necessity to explicitly define the kind of relationship between the source and the target(s) of the link. They can also be used to point to external resources. That way, the links are a means of interpretation and comment on the transcriptions made by the editors.
Examples of the encoding of links are given below, taken from the editorial list BNP/E3 144X-48v:
In the seventh item of the list, an article to be published in the journal "A Galera" is mentioned. The mention of this article is linked to the text "Para a memoria de Antonio Nobre", which Pessoa published in "A Galera" in 1915, and which is also part of the digital edition. That way, the link implies that the published text is a realization of the planned article that was mentioned in the editorial list. A link is encoded using the element <ref>, which surrounds the text that carries the link and the attribute target on the <ref> element, which contains the target of the link in the form of a URI. In the above example the link has only one target, but it is also possible that several targets are defined at the same time, as the following example shows:
Here, the sixth item of the editorial list mentions articles ("artigos") to be published in "O Jornal". The link surrounding this mention has several targets that correspond to various articles published in the journal in question, which are included in the digital edition. Several targets are given as several URIs in the target attribute, separated by a space.
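Because several targets share one target attribute and are separated by spaces, software that processes the links has to split the attribute value. The Python sketch below shows one way to do this. The URIs under example.org are invented for illustration and do not point to the edition, and the snippet omits the TEI namespace. The function name link_targets is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical item with one link that has two targets (URIs are invented).
SNIPPET = (
    '<item>'
    '<ref target="https://example.org/pub/a https://example.org/pub/b">'
    'artigos</ref>'
    '</item>'
)


def link_targets(xml_text):
    """Map the text of each <ref> to the list of URIs in its target attribute."""
    root = ET.fromstring(xml_text)
    return {
        "".join(ref.itertext()): ref.get("target", "").split()
        for ref in root.iter("ref")
    }


if __name__ == "__main__":
    for text, targets in link_targets(SNIPPET).items():
        print(text, targets)
```

A link with a single target yields a list with one entry, so both cases described above can be handled by the same code.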
Appendix A TEI Specifications
This TEI Customization uses the modules core, tei, header, textstructure, msdescription, transcr, analysis, linking, figures and certainty.
Appendix A.1 Elements
Appendix A.1.1 <TEI>
<TEI> (TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resource class. Multiple <TEI> elements may be combined within a <TEI> (or <teiCorpus>) element. [4. Default Text Structure; 15.1. Varieties of Composite Text]
This element is required. It is customary to specify the TEI namespace http://www.tei-c.org/ns/1.0 on it, for example: <TEI version="4.4.0" xml:lang="it" xmlns="http://www.tei-c.org/ns/1.0">.
Example
<TEI version="3.3.0" xmlns="http://www.tei-c.org/ns/1.0">
- <teiHeader>
- <fileDesc>
- <titleStmt>
- <title>The shortest TEI Document Imaginable</title>
- </titleStmt>
- <publicationStmt>
- <p>First published as part of TEI P2, this is the P5
- version using a namespace.</p>
- </publicationStmt>
- <sourceDesc>
- <p>No source: this is an original work.</p>
- </sourceDesc>
- </fileDesc>
- </teiHeader>
- <text>
- <body>
- <p>This is about the shortest TEI document imaginable.</p>
- </body>
- </text>
-</TEI>
<ab> (anonymous block) contains any component-level unit of text, acting as a container for phrase or inter level elements analogous to, but without the same constraints as, a paragraph. [16.3. Blocks, Segments, and Anchors]
The <ab> element may be used at the encoder's discretion to mark any component-level elements in a text for which no other more specific appropriate markup is defined. Unlike paragraphs, <ab> may nest and may use the type and subtype attributes.
Example
<div type="book" n="Genesis">
- <div type="chapter" n="1">
- <ab>In the beginning God created the heaven and the earth.</ab>
- <ab>And the earth was without form, and void; and
- darkness was upon the face of the deep. And the
- spirit of God moved upon the face of the waters.</ab>
- <ab>And God said, Let there be light: and there was light.</ab>
-<!-- ...-->
- </div>
-</div>
Schematron
-<sch:report test="(ancestor::tei:l or ancestor::tei:lg) and not( ancestor::tei:floatingText
- |parent::tei:figure |parent::tei:note )"> Abstract model violation: Lines may not contain higher-level divisions such as p or ab, unless ab is a child of figure or note, or is a descendant of floatingText.
-</sch:report>
<add> (addition) contains letters, words, or phrases inserted in the source text by an author, scribe, or a previous annotator or corrector. [3.5.3. Additions, Deletions, and Omissions]
Module
core
Attributes
n
(number) gives a number (or other label) for an element, which is not necessarily unique within the document.
In a diplomatic edition attempting to represent an original source, the <add> element should not be used for additions to the current TEI electronic edition made by editors or encoders. In these cases, either the <corr> or the <supplied> element is recommended.
In a TEI edition of a historical text with previous editorial emendations in which such additions or reconstructions are considered part of the source text, the use of <add> may be appropriate, dependent on the editorial philosophy of the project.
Example
The story I am
- going to relate is true as to its main facts, and as to the
- consequences <add place="above">of these facts</add> from which
- this tale takes its title.
<addSpan> (added span of text) marks the beginning of a longer sequence of text added by an author, scribe, annotator or corrector (see also <add>). [11.3.1.4. Additions and Deletions]
Both the beginning and the end of the added material must be marked; the beginning by the <addSpan> element itself, the end by the spanTo attribute.
Example
<handNote xml:id="HEOL"
- scribe="HelgiÓlafsson"/>
-<!-- ... -->
-<body>
- <div>
-<!-- text here -->
- </div>
- <addSpan n="added_gathering" hand="#HEOL"
- spanTo="#P025"/>
- <div>
-<!-- text of first added poem here -->
- </div>
- <div>
-<!-- text of second added poem here -->
- </div>
- <div>
-<!-- text of third added poem here -->
- </div>
- <div>
-<!-- text of fourth added poem here -->
- </div>
- <anchor xml:id="P025"/>
- <div>
-<!-- more text here -->
- </div>
-</body>
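The pairing of a span element with its end point can be verified by resolving each spanTo value against the xml:id values in the document. The following Python sketch does this for both <addSpan> and <delSpan>. It assumes elements without the TEI namespace and pointers of the form "#id" within the same document; the function name unresolved_spans is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def unresolved_spans(xml_text):
    """Return the spanTo values that are missing or match no xml:id."""
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    problems = []
    for el in root.iter():
        if el.tag in ("addSpan", "delSpan"):
            target = el.get("spanTo")
            # A missing attribute is reported as None.
            if target is None or target.lstrip("#") not in ids:
                problems.append(target)
    return problems


if __name__ == "__main__":
    doc = ('<body><addSpan spanTo="#P025"/><div/>'
           '<anchor xml:id="P025"/></body>')
    print(unresolved_spans(doc))  # []
```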
Schematron
-<sch:assert test="@spanTo">The @spanTo attribute of <sch:name/> is required.</sch:assert>
<am> (abbreviation marker) contains a sequence of letters or signs present in an abbreviation which are omitted or replaced in the expanded form of the abbreviation. [11.3.1.2. Abbreviation and Expansion]
-element am { ( text | tei_model.gLike | tei_model.pPart.transcriptional )* }
Appendix A.1.7 <analytic>
<analytic> (analytic level) contains bibliographic elements describing an item (e.g. an article or poem) published within a monograph or journal and not as an independent publication. [3.12.2.1. Analytic, Monographic, and Series Levels]
On the <anchor> element, the global xml:id attribute must be supplied to specify an identifier for the point at which this element occurs within a document. The value used may be chosen freely provided that it is unique within the document and is a syntactically valid name. There is no requirement for values containing numbers to be in sequence.
Example
<s>The anchor is he<anchor xml:id="A234"/>re somewhere.</s>
-<s>Help me find it.<ptr target="#A234"/>
-</s>
assessing
intent is to assess the target resource in some way, rather than simply make a comment about it
bookmarking
intent is to create a bookmark to the target or part thereof
classifying
intent is to classify the target in some way
commenting
intent is to comment about the target
describing
intent is to describe the target, rather than (for example) comment on it
editing
intent is to request an edit or a change to the target resource
highlighting
intent is to highlight the target resource or a segment thereof
identifying
intent is to assign an identity to the target
linking
intent is to link to a resource related to the target
moderating
intent is to assign some value or quality to the target
questioning
intent is to ask a question about the target
replying
intent is to reply to a previous statement, either an annotation or another resource
tagging
intent is to associate a tag with the target
Note
For further detailed explanation of the suggested values, see the Web Annotation Vocabulary (WAV). The motivations described here map to URIs defined by the WAV and when exported to RDF or JSON-LD must have the URI http://www.w3.org/ns/oa# prepended.
As an RDF vocabulary, WADM permits the definition of new motivations (see Appendix C of the WAV). In TEI, new motivations may be defined in a custom ODD (see section 23.3.1.3). New motivations must also map to URIs defined by an RDF ontology extending the WAV.
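The mapping from a motivation value to its URI, as described in the note, amounts to prepending the WAV namespace. The Python sketch below illustrates this for the suggested values only. Motivations defined in a custom ODD would need their own mapping, which this sketch does not attempt; the function name motivation_uri is hypothetical.

```python
WAV_NAMESPACE = "http://www.w3.org/ns/oa#"

# The suggested values listed for the motivation attribute.
SUGGESTED_MOTIVATIONS = {
    "assessing", "bookmarking", "classifying", "commenting", "describing",
    "editing", "highlighting", "identifying", "linking", "moderating",
    "questioning", "replying", "tagging",
}


def motivation_uri(value):
    """Return the WAV URI for one suggested motivation value."""
    if value not in SUGGESTED_MOTIVATIONS:
        # Custom motivations must map to URIs of their own ontology.
        raise ValueError(f"not a suggested motivation: {value}")
    return WAV_NAMESPACE + value


if __name__ == "__main__":
    print(motivation_uri("linking"))  # http://www.w3.org/ns/oa#linking
```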
<annotation xml:id="ann1"
- motivation="linking" target="#Gallia">
-<!-- See https://www.w3.org/TR/annotation-model/#lifecycle-information and
- https://www.w3.org/TR/annotation-model/#agents -->
- <respStmt xml:id="fred">
- <resp>creator</resp>
- <persName>Fred Editor</persName>
- </respStmt>
- <revisionDesc>
- <change status="created"
- when="2020-05-21T13:59:00Z" who="#fred"/>
- <change status="modified"
- when="2020-05-21T19:48:00Z" who="#fred"/>
- </revisionDesc>
-<!-- See https://www.w3.org/TR/annotation-model/#rights-information -->
- <licence target="http://creativecommons.org/licenses/by/4.0/"/>
-<!-- Multiple bodies -->
-<!-- Pointers to sections of text in the same document -->
- <ptr target="#string-range(c1p1s1,0,6)"/>
- <ptr target="#string-range(c1p1s6,19,7)"/>
-</annotation>
Example
<annotation xml:id="TheCorrectTitle"
- motivation="commenting" target="#line1">
- <note>The correct title of this specification, and the correct full name of XML, is
- "Extensible Markup Language". "eXtensible Markup Language" is just a spelling error.
- However, the abbreviation "XML" is not only correct but, appearing as it does in the title
- of the specification, an official name of the Extensible Markup Language. </note>
-</annotation>
<app> (apparatus entry) contains one entry in a critical apparatus, with an optional lemma and usually one or more readings or notes on the relevant passage. [12.1.1. The Apparatus Entry]
<author> (author) in a bibliographic reference, contains the name(s) of an author, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority. [3.12.2.2. Titles, Authors, and Editors; 2.2.1. The Title Statement]
Particularly where cataloguing is likely to be based on the content of the header, it is advisable to use a generally recognized name authority file to supply the content for this element. The attributes key or ref may also be used to reference canonical information about the author(s) intended from any appropriate authority, such as a library catalogue or online resource.
In the case of a broadcast, use this element for the name of the company or network responsible for making the broadcast.
Where an author is unknown or unspecified, this element may contain text such as Unknown or Anonymous. When the appropriate TEI modules are in use, it may also contain detailed tagging of the names used for people, organizations or places, in particular where multiple names are given.
Example
<author>British Broadcasting Corporation</author>
-<author>La Fayette, Marie Madeleine Pioche de la Vergne, comtesse de (1634–1693)</author>
-<author>Anonymous</author>
-<author>Bill and Melinda Gates Foundation</author>
-<author>
- <persName>Beaumont, Francis</persName> and
-<persName>John Fletcher</persName>
-</author>
-<author>
- <orgName key="BBC">British Broadcasting
- Corporation</orgName>: Radio 3 Network
-</author>
<availability> (availability) supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc. [2.2.4. Publication, Distribution, Licensing, etc.]
<availability status="restricted">
- <p>Available for academic research purposes only.</p>
-</availability>
-<availability status="free">
- <p>In the public domain</p>
-</availability>
-<availability status="restricted">
- <p>Available under licence from the publishers.</p>
-</availability>
Example
<availability>
- <licence target="http://opensource.org/licenses/MIT">
- <p>The MIT License
- applies to this document.</p>
- <p>Copyright (C) 2011 by The University of Victoria</p>
- <p>Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:</p>
- <p>The above copyright notice and this permission notice shall be included in
- all copies or substantial portions of the Software.</p>
- <p>THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- THE SOFTWARE.</p>
- </licence>
-</availability>
Because cultural conventions differ as to which elements are grouped as back matter and which as front matter, the content models for the <back> and <front> elements are identical.
Example
<back>
- <div type="appendix">
- <head>The Golden Dream or, the Ingenuous Confession</head>
- <p>TO shew the Depravity of human Nature, and how apt the Mind is to be misled by Trinkets
- and false Appearances, Mrs. Two-Shoes does acknowledge, that after she became rich, she
- had like to have been, too fond of Money
-<!-- .... -->
- </p>
- </div>
-<!-- ... -->
- <div type="epistle">
- <head>A letter from the Printer, which he desires may be inserted</head>
- <salute>Sir.</salute>
- <p>I have done with your Copy, so you may return it to the Vatican, if you please;
-
-<!-- ... -->
- </p>
- </div>
- <div type="advert">
- <head>The Books usually read by the Scholars of Mrs Two-Shoes are these and are sold at Mr
- Newbery's at the Bible and Sun in St Paul's Church-yard.</head>
- <list>
- <item n="1">The Christmas Box, Price 1d.</item>
- <item n="2">The History of Giles Gingerbread, 1d.</item>
-<!-- ... -->
- <item n="42">A Curious Collection of Travels, selected from the Writers of all Nations,
- 10 Vol, Pr. bound 1l.</item>
- </list>
- </div>
- <div type="advert">
- <head>By the KING's Royal Patent, Are sold by J. NEWBERY, at the Bible and Sun in St.
- Paul's Church-Yard.</head>
- <list>
- <item n="1">Dr. James's Powders for Fevers, the Small-Pox, Measles, Colds, &amp;c. 2s.
- 6d</item>
- <item n="2">Dr. Hooper's Female Pills, 1s.</item>
-<!-- ... -->
- </list>
- </div>
-</back>
Contains phrase-level elements, together with any combination of elements from the model.biblPart class
Example
<bibl>Blain, Clements and Grundy: Feminist Companion to Literature in English (Yale,
- 1990)</bibl>
Example
<bibl>
- <title level="a">The Interesting story of the Children in the Wood</title>. In
-<author>Victor E Neuberg</author>, <title>The Penny Histories</title>.
-<publisher>OUP</publisher>
- <date>1968</date>.
-</bibl>
Example
<bibl type="article" subtype="book_chapter"
- xml:id="carlin_2003">
- <author>
- <name>
- <surname>Carlin</surname>
- (<forename>Claire</forename>)</name>
- </author>,
-<title level="a">The Staging of Impotence : France’s last
- congrès</title> dans
-<bibl type="monogr">
- <title level="m">Theatrum mundi : studies in honor of Ronald W.
- Tobin</title>, éd.
- <editor>
- <name>
- <forename>Claire</forename>
- <surname>Carlin</surname>
- </name>
- </editor> et
- <editor>
- <name>
- <forename>Kathleen</forename>
- <surname>Wine</surname>
- </name>
- </editor>,
- <pubPlace>Charlottesville, Va.</pubPlace>,
- <publisher>Rookwood Press</publisher>,
- <date when="2003">2003</date>.
- </bibl>
-</bibl>
<biblScope> (scope of bibliographic reference) defines the scope of a bibliographic reference, for example as a list of page numbers, or a named subdivision of a larger work. [3.12.2.5. Scopes and Ranges in Bibliographic Citations]
When a single page is being cited, use the from and to attributes with an identical value. When no clear endpoint is provided, the from attribute may be used without to; for example a citation such as ‘p. 3ff’ might be encoded <biblScope from="3">p. 3ff</biblScope>.
It is now considered good practice to supply this element as a sibling (rather than a child) of <imprint>, since it supplies information which does not constitute part of the imprint.
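The rule for the from and to attributes (identical values for a single page, from alone when no clear endpoint is given) can be sketched as a small Python function. The citation formats it accepts ("p. 3", "p. 3ff", "pp. 12-34") are an assumption chosen for illustration, and the function name biblscope_attrs is hypothetical.

```python
import re

# Assumed citation shapes: "p. 3", "p. 3ff", "pp. 12-34" (hyphen or en dash).
PAGE_CITATION = re.compile(r"pp?\.\s*(\d+)(?:\s*[-–]\s*(\d+)|(ff?\.?))?")


def biblscope_attrs(citation):
    """Derive from/to attribute values from a simple page citation."""
    match = PAGE_CITATION.fullmatch(citation.strip())
    if match is None:
        raise ValueError(f"unsupported citation format: {citation!r}")
    start, end, open_ended = match.groups()
    if open_ended:
        # No clear endpoint: use from without to.
        return {"from": start}
    # Single page: from and to carry the same value.
    return {"from": start, "to": end or start}


if __name__ == "__main__":
    print(biblscope_attrs("p. 3ff"))     # {'from': '3'}
    print(biblscope_attrs("p. 3"))       # {'from': '3', 'to': '3'}
    print(biblscope_attrs("pp. 12-34"))  # {'from': '12', 'to': '34'}
```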
<byline> (byline) contains the primary statement of responsibility given for a work on its title page or at the head or end of the work. [4.2.2. Openers and Closers; 4.5. Front Matter]
The byline on a title page may include either the name or a description for the document's author. Where the name is included, it may optionally be tagged using the <docAuthor> element.
Example
<byline>Written by a CITIZEN who continued all the
- while in London. Never made publick before.</byline>
Example
<byline>Written from her own MEMORANDUMS</byline>
Example
<byline>By George Jones, Political Editor, in Washington</byline>
Example
<byline>BY
-<docAuthor>THOMAS PHILIPOTT,</docAuthor>
- Master of Arts,
- (Somtimes)
- Of Clare-Hall in Cambridge.</byline>
<row>
- <cell role="label">General conduct</cell>
- <cell role="data">Not satisfactory, on account of his great unpunctuality
- and inattention to duties</cell>
-</row>
characterizes the element in some sense, using any convenient classification scheme or typology; sample categorization of annotations of uncertainty might use following values:
indicates more exactly the aspect concerning which certainty is being expressed: specifically, whether the markup is correctly located, whether the correct element or attribute name has been used, or whether the content of the element or attribute is correct, etc.
Because the children of a <choice> element all represent alternative ways of encoding the same sequence, it is natural to think of them as mutually exclusive. However, there may be cases where a full representation of a text requires the alternative encodings to be considered as parallel.
Where the purpose of an encoding is to record multiple witnesses of a single work, rather than to identify multiple possible encoding decisions at a given point, the <app> element and associated elements discussed in section 12.1. The Apparatus Entry, Readings, and Witnesses should be preferred.
Example
An American encoding of Gulliver's Travels which retains the British spelling but also provides a version regularized to American spelling might be encoded as follows.
<p>Lastly, That, upon his solemn oath to observe all the above
- articles, the said man-mountain shall have a daily allowance of
- meat and drink sufficient for the support of <choice>
- <sic>1724</sic>
- <corr>1728</corr>
- </choice> of our subjects,
- with free access to our royal person, and other marks of our
-<choice>
- <orig>favour</orig>
- <reg>favor</reg>
- </choice>.</p>
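A processor can derive different readings from such an encoding by selecting one alternative from each <choice>. The Python sketch below produces either the text as found in the source (keeping <sic> and <orig>) or an edited text (keeping <corr> and <reg>). It works on a shortened version of the example above, written without the TEI namespace; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened from the Gulliver's Travels example.
SAMPLE = (
    '<p>support of <choice><sic>1724</sic><corr>1728</corr></choice> '
    'of our subjects, and other marks of our '
    '<choice><orig>favour</orig><reg>favor</reg></choice>.</p>'
)


def reading(elem, prefer):
    """Flatten `elem` to text, keeping only the <choice> children in `prefer`."""
    if elem.tag == "choice":
        # Keep the preferred alternative; drop whitespace between children.
        return "".join(reading(c, prefer) for c in elem if c.tag in prefer)
    parts = [elem.text or ""]
    for child in elem:
        parts.append(reading(child, prefer))
        parts.append(child.tail or "")
    return "".join(parts)


def reading_of(xml_text, prefer):
    """Parse `xml_text` and return the reading selected by `prefer`."""
    return reading(ET.fromstring(xml_text), prefer)


if __name__ == "__main__":
    print(reading_of(SAMPLE, {"sic", "orig"}))
    print(reading_of(SAMPLE, {"corr", "reg"}))
```

Treating the alternatives as mutually exclusive is a design choice of this sketch. A representation that keeps them in parallel would need a different output structure.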
<cit> (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example. [3.3.3. Quotation; 4.3.1. Grouped Texts; 9.3.5.1. Examples]
<cit>
- <quote>and the breath of the whale is frequently attended with such an insupportable smell,
- as to bring on disorder of the brain.</quote>
- <bibl>Ulloa's South America</bibl>
-</cit>
Example
<entry>
- <form>
- <orth>horrifier</orth>
- </form>
- <cit type="translation" xml:lang="en">
- <quote>to horrify</quote>
- </cit>
- <cit type="example">
- <quote>elle était horrifiée par la dépense</quote>
- <cit type="translation" xml:lang="en">
- <quote>she was horrified at the expense.</quote>
- </cit>
- </cit>
-</entry>
Example
<cit type="example">
- <quote xml:lang="mix">Ka'an yu tsa'a Pedro.</quote>
- <media url="soundfiles-gen:S_speak_1s_on_behalf_of_Pedro_01_02_03_TS.wav"
- mimeType="audio/wav"/>
- <cit type="translation">
- <quote xml:lang="en">I'm speaking on behalf of Pedro.</quote>
- </cit>
- <cit type="translation">
- <quote xml:lang="es">Estoy hablando de parte de Pedro.</quote>
- </cit>
-</cit>
(match) supplies an XPath selection pattern, using the syntax defined in the XSLT 3.0 specification, which identifies a set of nodes which are citable structural components. The expression may be absolute (beginning with /) or relative. The match attribute on a <citeStructure> without a <citeStructure> parent must be an absolute XPath. If it is relative, its context is set by the match of the parent <citeStructure>.
Schematron
-<sch:rule context="tei:citeStructure[not(parent::tei:citeStructure)]">
-<sch:assert test="starts-with(@match,'/')">An XPath in @match on the outer <sch:name/> must start with '/'.</sch:assert>
-</sch:rule>
Schematron
-<sch:rule context="tei:citeStructure[parent::tei:citeStructure]">
-<sch:assert test="not(starts-with(@match,'/'))">An XPath in @match must not start with '/' except on the outer <sch:name/>.</sch:assert>
-</sch:rule>
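The two rules for the match attribute can also be expressed outside Schematron. The Python sketch below walks a declaration and reports violations of either rule. It assumes elements without the TEI namespace and is meant as an illustration of the logic, not as a replacement for schema validation; the function name check_cite_structures is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_cite_structures(xml_text):
    """Return messages for match values that break the absolute/relative rules."""
    root = ET.fromstring(xml_text)
    errors = []

    def visit(elem, parent_is_cite_structure):
        if elem.tag == "citeStructure":
            match = elem.get("match", "")
            if not parent_is_cite_structure and not match.startswith("/"):
                errors.append(f"outer match must start with '/': {match!r}")
            if parent_is_cite_structure and match.startswith("/"):
                errors.append(f"nested match must not start with '/': {match!r}")
        for child in elem:
            visit(child, elem.tag == "citeStructure")

    visit(root, False)
    return errors


if __name__ == "__main__":
    decl = (
        '<refsDecl>'
        '<citeStructure match="/TEI/text/body/div" unit="book">'
        '<citeStructure match="div" unit="chapter"/>'
        '</citeStructure>'
        '</refsDecl>'
    )
    print(check_cite_structures(decl))  # []
```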
unit
(unit) describes the structural unit indicated by the <citeStructure>.
<div type="letter">
- <p> perhaps you will favour me with a sight of it when convenient.</p>
- <closer>
- <salute>I remain, &amp;c. &amp;c.</salute>
- <signed>H. Colburn</signed>
- </closer>
-</div>
Example
<div type="chapter">
- <p>
-<!-- ... --> and his heart was going like mad and yes I said yes I will Yes.</p>
- <closer>
- <dateline>
- <name type="place">Trieste-Zürich-Paris,</name>
- <date>1914–1921</date>
- </dateline>
- </closer>
-</div>
The conversion element is designed to store information about converting from one unit of measurement to another. The formula attribute holds an XPath expression that indicates how the measurement system in fromUnit is converted to the system in toUnit. Do not confuse the usage of the dating attributes (from and to) in the examples with the attributes (fromUnit and toUnit) designed to reference units of measure.
If all that is desired is to call attention to the fact that the copy text has been corrected, <corr> may be used alone:
I don't know,
- Juan. It's so far in the past now — how <corr>can we</corr> prove
- or disprove anyone's theories?
Example
It is also possible, using the <choice> and <sic> elements, to provide an uncorrected reading:
I don't know, Juan. It's so far in the past now —
- how <choice>
- <sic>we can</sic>
- <corr>can we</corr>
-</choice> prove or
- disprove anyone's theories?
Given on the <date when="1977-06-12">Twelfth Day
- of June in the Year of Our Lord One Thousand Nine Hundred and Seventy-seven of the Republic
- the Two Hundredth and first and of the University the Eighty-Sixth.</date>
<dateline> (dateline) contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer. [4.2.2. Openers and Closers]
<dateline>Walden, this 29. of August 1592</dateline>
Example
<div type="chapter">
- <p>
-<!-- ... --> and his heart was going like mad and yes I said yes I will Yes.</p>
- <closer>
- <dateline>
- <name type="place">Trieste-Zürich-Paris,</name>
- <date>1914–1921</date>
- </dateline>
- </closer>
-</div>
<del> (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, or a previous annotator or corrector. [3.5.3. Additions, Deletions, and Omissions]
Module
core
Attributes
n
(number) gives a number (or other label) for an element, which is not necessarily unique within the document.
This element should be used for deletion of shorter sequences of text, typically single words or phrases. The <delSpan> element should be used for longer sequences of text, for those containing structural subdivisions, and for those containing overlapping additions and deletions.
The text deleted must be at least partially legible in order for the encoder to be able to transcribe it (unless it is restored in a <supplied> tag). Illegible or lost text within a deletion may be marked using the <gap> tag to signal that text is present but has not been transcribed, or is no longer visible. Attributes on the <gap> element may be used to indicate how much text is omitted, the reason for omitting it, etc. If text is not fully legible, the <unclear> element (available when using the additional tagset for transcription of primary sources) should be used to signal the areas of text which cannot be read with confidence in a similar way.
There is a clear distinction in the TEI between <del> and <surplus> on the one hand and <gap> or <unclear> on the other. <del> indicates a deletion present in the source being transcribed, which states the author's or a later scribe's intent to cancel or remove text. <surplus> indicates material present in the source being transcribed which should have been so deleted, but which is not in fact. <gap> or <unclear>, by contrast, signal an editor's or encoder's decision to omit something or their inability to read the source text. See sections 11.3.1.7. Text Omitted from or Supplied in the Transcription and 11.3.3.2. Use of the gap, del, damage, unclear, and supplied Elements in Combination for the relationship between these and other related elements used in detailed transcription.
-element del
-{
- attribute n { "1" | "2" }?,
- attribute rend { list { ( "overstrike" | "overtyped" | "overwritten" )+ } }?,
- tei_macro.paraContent
-}
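The attribute constraints in the content model above (an optional n with the value 1 or 2, and an optional rend holding one or more of three values) can be mirrored in a few lines of Python, for example for quick checks outside a schema-aware editor. This is a sketch of the constraints as declared and says nothing about how the project assigns the values; the function name del_attr_errors is hypothetical.

```python
ALLOWED_N = {"1", "2"}
ALLOWED_REND = {"overstrike", "overtyped", "overwritten"}


def del_attr_errors(attrs):
    """Return messages for <del> attributes that the content model rejects.

    `attrs` is a dict of attribute names to string values; both attributes
    are optional.
    """
    errors = []
    if "n" in attrs and attrs["n"] not in ALLOWED_N:
        errors.append(f"n must be 1 or 2, found {attrs['n']!r}")
    if "rend" in attrs:
        tokens = attrs["rend"].split()
        # The list must hold at least one token, all from the allowed set.
        if not tokens or any(t not in ALLOWED_REND for t in tokens):
            errors.append(f"rend has an invalid value: {attrs['rend']!r}")
    return errors


if __name__ == "__main__":
    print(del_attr_errors({"n": "1", "rend": "overstrike overtyped"}))  # []
```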
Appendix A.1.32 <delSpan>
<delSpan> (deleted span of text) marks the beginning of a longer sequence of text deleted, marked as deleted, or otherwise signaled as superfluous or spurious by an author, scribe, annotator, or corrector. [11.3.1.4. Additions and Deletions]
Both the beginning and ending of the deleted sequence must be marked: the beginning by the <delSpan> element, the ending by the target of the spanTo attribute.
The <delSpan> tag should not be used for deletions made by editors or encoders. In these cases, either the <corr> tag or the <gap> tag should be used.
Example
<p>Paragraph partially deleted. This is the undeleted
- portion <delSpan spanTo="#a23"/>and this the deleted
- portion of the paragraph.</p>
-<p>Paragraph deleted together with adjacent material.</p>
-<p>Second fully deleted paragraph.</p>
-<p>Paragraph partially deleted; in the middle of this
- paragraph the deletion ends and the anchor point marks
- the resumption <anchor xml:id="a23"/> of the text. ...</p>
Schematron
-<sch:assert test="@spanTo">The @spanTo attribute of <sch:name/> is required.</sch:assert>
<body>
- <div type="part">
- <head>Fallacies of Authority</head>
- <p>The subject of which is Authority in various shapes, and the object, to repress all
- exercise of the reasoning faculty.</p>
- <div n="1" type="chapter">
- <head>The Nature of Authority</head>
- <p>With reference to any proposed measures having for their object the greatest
- happiness of the greatest number [...]</p>
- <div n="1.1" type="section">
- <head>Analysis of Authority</head>
- <p>What on any given occasion is the legitimate weight or influence to be attached to
- authority [...] </p>
- </div>
- <div n="1.2" type="section">
- <head>Appeal to Authority, in What Cases Fallacious.</head>
- <p>Reference to authority is open to the charge of fallacy when [...] </p>
- </div>
- </div>
- </div>
-</body>
Schematron
-<sch:report test="(ancestor::tei:l or ancestor::tei:lg) and not(ancestor::tei:floatingText)"> Abstract model violation: Lines may not contain higher-level structural elements such as div, unless div is a descendant of floatingText.
-</sch:report>
Schematron
-<sch:report test="(ancestor::tei:p or ancestor::tei:ab) and not(ancestor::tei:floatingText)"> Abstract model violation: p and ab may not contain higher-level structural elements such as div, unless div is a descendant of floatingText.
-</sch:report>
<docAuthor> (document author) contains the name of the author of the document, as given on the title page (often but not always contained in a byline). [4.6. Title Pages]
The document author's name often occurs within a byline, but the <docAuthor> element may be used whether the <byline> element is used or not. It should be used only for the author(s) of the entire document, not for author(s) of any subset or part of it. (Attributions of authorship of a subset or part of the document, for example of a chapter in a textbook or an article in a newspaper, may be encoded with <byline> without <docAuthor>.)
Example
<titlePage>
- <docTitle>
- <titlePart>Travels into Several Remote Nations of the World, in Four
- Parts.</titlePart>
- </docTitle>
- <byline> By <docAuthor>Lemuel Gulliver</docAuthor>, First a Surgeon,
- and then a Captain of several Ships</byline>
-</titlePage>
<docImprint> (document imprint) contains the imprint statement (place and date of publication, publisher name), as given (usually) at the foot of a title page. [4.6. Title Pages]
Cf. the <imprint> element of bibliographic citations. As with title, author, and editions, the shorter name is reserved for the element likely to be used more often.
<docImprint>
- <pubPlace>London</pubPlace>
- Printed for <name>E. Nutt</name>,
- at
-<pubPlace>Royal Exchange</pubPlace>;
-<name>J. Roberts</name> in
-<pubPlace>wick-Lane</pubPlace>;
-<name>A. Dodd</name> without
-<pubPlace>Temple-Bar</pubPlace>;
- and <name>J. Graves</name> in
-<pubPlace>St. James's-street.</pubPlace>
- <date>1722.</date>
-</docImprint>
<ellipsis> (deliberately marked omission) indicates a purposeful marking in the source document signalling that content has been omitted, and may also supply or describe the omitted content. [3.5.3. Additions, Deletions, and Omissions]
Unlike <gap>, which indicates content that the encoder cannot or chooses not to represent, <ellipsis> indicates a passage explicitly signalled in the source document as absent. The <ellipsis> element is not appropriate for every use of ellipsis points, such as when they indicate that a speaker is pausing.
Example
<lg>
- <l>What projects men make—what queer turns they take,</l>
- <l>Since <emph>steam</emph> has improved our condition;</l>
- <l>They never are still, but must cure or must kill</l>
- <l>With steam physic or steam ammunition.</l>
- <l>But a short time ago, to a quack you would go,</l>
- <l>To steam a fat man to a thinner;</l>
- <l>Now changed from all that, if you wish to get <emph>fat</emph>,</l>
- <l>Come to Barton’s and eat a <emph>steam dinner!</emph>
- </l>
- <l>Oh dear! think of a scheme, odd though it seem—</l>
- <l>I’m sure ’twill succeed if you make it by steam.</l>
-</lg>
-<lg>
- <l>You may sleep, you may dream, you may travel by steam,</l>
- <l>For the outcry is still to go faster;</l>
- <l>And what does it reck, should you e’en break your neck,</l>
- <l>If ’tis <emph>steam</emph> that brings on the disaster?</l>
- <ellipsis resp="#ChambersEdnbrghJrnl1880">
- <metamark function="multilineEllipsis"> * * * * </metamark>
- <desc resp="#teiProjectEditor2021">The printer omits four lines here,
- skipping the second half of the second octave, before the refrain.</desc>
- </ellipsis>
- <l>Oh dear! think of a scheme, odd though it seem—</l>
- <l>I’m sure ’twill succeed if you make it by steam.</l>
-</lg>
Example
<lg>
- <l>You think you’ve lost your love </l>
- <l>Well, I saw her yesterday </l>
- <l>It’s you she's thinking of </l>
- <l>And she told me what to say</l>
-</lg>
-<lg xml:id="chorus">
- <label>[Refrain]</label>
- <l>She says she loves you </l>
- <l>And you know that can’t be bad </l>
- <l>Yes, she loves you </l>
- <l>And you know you should be glad</l>
-</lg>
-<lg>
- <l>She said you hurt her so </l>
- <l>She almost lost her mind </l>
- <l>But now she said she knows </l>
- <l>You’re not the hurting kind</l>
-</lg>
-<ellipsis>
- <metamark>******</metamark>
- <supplied copyOf="#chorus"/>
-</ellipsis>
<encodingDesc>
- <p>Basic encoding, capturing lexical information only. All
- hyphenation, punctuation, and variant spellings normalized. No
- formatting or layout information preserved.</p>
-</encodingDesc>
<ex> (editorial expansion) contains a sequence of letters added by an editor or transcriber when expanding an abbreviation. [11.3.1.2. Abbreviation and Expansion]
The content of this element should be the expanded abbreviation, usually (but not always) a complete word or phrase. The <ex> element provided by the transcr module may be used to mark up sequences of letters supplied within such an expansion.
If abbreviations are expanded silently, this practice should be documented in the <editorialDecl>, either with a <normalization> element or a <p>.
Example
The address is Southmoor
-<choice>
- <expan>Road</expan>
- <abbr>Rd</abbr>
-</choice>
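The note above mentions that <ex> can mark the letters supplied inside an expansion. A minimal sketch of that usage, reusing the same abbreviation as the example above (the split of "Road" into supplied and original letters is an illustrative assumption, not taken from a project source):

```xml
<!-- Assumed illustration: "oa" is the part the editor supplied -->
<choice>
  <expan>R<ex>oa</ex>d</expan>
  <abbr>Rd</abbr>
</choice>
```

Encoded this way, a processor can show either the abbreviated form or the expansion, and can still distinguish the editor's letters from those present in the source.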
<facsimile> contains a representation of some written source in the form of a set of images rather than as transcribed or encoded text. [11.1. Digital Facsimiles]
-<sch:rule context="tei:facsimile//tei:line | tei:facsimile//tei:zone">
-<sch:report test="child::text()[ normalize-space(.) ne '']"> A facsimile element represents a text with images, thus
- transcribed text should not be present within it.
-</sch:report>
-</sch:rule>
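As an illustration of the constraint above, a <facsimile> that passes the rule might hold only image references and empty zones. This is a sketch under assumptions: the file name, identifiers, and coordinates are hypothetical and do not come from the project.

```xml
<!-- Hypothetical image file and coordinates, for illustration only -->
<facsimile>
  <surface xml:id="surf1" ulx="0" uly="0" lrx="2000" lry="3000">
    <graphic url="page1.jpg" mimeType="image/jpeg"/>
    <!-- The zone carries coordinates only; no transcribed text inside -->
    <zone xml:id="surf1-z1" ulx="150" uly="200" lrx="1850" lry="900"/>
  </surface>
</facsimile>
```

Any transcription of the area would then live in the text body and point to the zone, rather than being placed inside the zone itself.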
The major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived.
Example
<fileDesc>
- <titleStmt>
- <title>The shortest possible TEI document</title>
- </titleStmt>
- <publicationStmt>
- <p>Distributed as part of TEI P5</p>
- </publicationStmt>
- <sourceDesc>
- <p>No print source exists: this is an original digital text</p>
- </sourceDesc>
-</fileDesc>
<floatingText> (floating text) contains a single text of any kind, whether unitary or composite, which interrupts the text containing it at any point and after which the surrounding text resumes. [4.3.2. Floating Texts]
A floating text has the same content as any other <text> and may thus be interrupted by another floating text, or contain a <group> of tessellated texts.
Example
<body>
- <div type="scene">
- <sp>
- <p>Hush, the players begin...</p>
- </sp>
- <floatingText type="pwp">
- <body>
- <div type="act">
- <sp>
- <l>In Athens our tale takes place [...]</l>
- </sp>
-<!-- ... rest of nested act here -->
- </div>
- </body>
- </floatingText>
- <sp>
- <p>Now that the play is finished ...</p>
- </sp>
- </div>
-</body>
<front> (front matter) contains any prefatory matter (headers, abstracts, title page, prefaces, dedications, etc.) found at the start of a document, before the main body. [4.6. Title Pages 4. Default Text Structure]
Because cultural conventions differ as to which elements are grouped as front matter and which as back matter, the content models for the <front> and <back> elements are identical.
Example
<front>
- <epigraph>
- <quote>Nam Sibyllam quidem Cumis ego ipse oculis meis vidi in ampulla
- pendere, et cum illi pueri dicerent: <q xml:lang="grc">Σίβυλλα τί
- θέλεις</q>; respondebat illa: <q xml:lang="grc">ὰποθανεῖν θέλω.</q>
- </quote>
- </epigraph>
- <div type="dedication">
- <p>For Ezra Pound <q xml:lang="it">il miglior fabbro.</q>
- </p>
- </div>
-</front>
Example
<front>
- <div type="dedication">
- <p>To our three selves</p>
- </div>
- <div type="preface">
- <head>Author's Note</head>
- <p>All the characters in this book are purely imaginary, and if the
- author has used names that may suggest a reference to living persons
- she has done so inadvertently. ...</p>
- </div>
-</front>
Example
<front>
- <div type="abstract">
- <div>
- <head> BACKGROUND:</head>
- <p>Food insecurity can put children at greater risk of obesity because
- of altered food choices and nonuniform consumption patterns.</p>
- </div>
- <div>
- <head> OBJECTIVE:</head>
- <p>We examined the association between obesity and both child-level
- food insecurity and personal food insecurity in US children.</p>
- </div>
- <div>
- <head> DESIGN:</head>
- <p>Data from 9,701 participants in the National Health and Nutrition
- Examination Survey, 2001-2010, aged 2 to 11 years were analyzed.
- Child-level food insecurity was assessed with the US Department of
- Agriculture's Food Security Survey Module based on eight
- child-specific questions. Personal food insecurity was assessed with
- five additional questions. Obesity was defined, using physical
- measurements, as body mass index (calculated as kg/m2) greater than
- or equal to the age- and sex-specific 95th percentile of the Centers
- for Disease Control and Prevention growth charts. Logistic
- regressions adjusted for sex, race/ethnic group, poverty level, and
- survey year were conducted to describe associations between obesity
- and food insecurity.</p>
- </div>
- <div>
- <head> RESULTS:</head>
- <p>Obesity was significantly associated with personal food insecurity
- for children aged 6 to 11 years (odds ratio=1.81; 95% CI 1.33 to
- 2.48), but not in children aged 2 to 5 years (odds ratio=0.88; 95%
- CI 0.51 to 1.51). Child-level food insecurity was not associated
- with obesity among 2- to 5-year-olds or 6- to 11-year-olds.</p>
- </div>
- <div>
- <head> CONCLUSIONS:</head>
- <p>Personal food insecurity is associated with an increased risk of
- obesity only in children aged 6 to 11 years. Personal
- food-insecurity measures may give different results than aggregate
- food-insecurity measures in children.</p>
- </div>
- </div>
-</front>
<gap> (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible. [3.5.3. Additions, Deletions, and Omissions]
The <gap> tag simply signals the editor's decision to omit or inability to transcribe a span of text. Other information, such as the interpretation that text was deliberately erased or covered, should be indicated using the relevant tags, such as <del> in the case of deliberate deletion.
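The distinction can be sketched as follows. The sentence is invented for illustration, and the attributes reason, quantity, and unit are those of unmodified TEI P5; whether this customization keeps all of them is an assumption.

```xml
<!-- Invented sentence; attribute availability in this customization is assumed -->
<p>The volume was announced for
  <gap reason="illegible" quantity="2" unit="word"/>
  but <del>never</del> appeared in print.</p>
```

Here <gap> records only that two words could not be transcribed, while <del> records an interpretation: the author deliberately struck out a word that remains legible.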
The mimeType attribute should be used to supply the MIME media type of the image specified by the url attribute.
Within the body of a text, a <graphic> element indicates the presence of a graphic component in the source itself. Within the context of a <facsimile> or <sourceDoc> element, however, a <graphic> element provides an additional digital representation of some part of the source being encoded.
Example
<figure>
- <graphic url="fig1.png"/>
- <head>Figure One: The View from the Bridge</head>
- <figDesc>A Whistleresque view showing four or five sailing boats in the foreground, and a
- series of buoys strung out between them.</figDesc>
-</figure>
<handNote scope="sole">
- <p>Written in insular
- phase II half-uncial with interlinear Old English gloss in an Anglo-Saxon pointed
- minuscule.</p>
-</handNote>
<handShift> (handwriting shift) marks the beginning of a sequence of text written in a new hand, or the beginning of a scribal stint. [11.3.2.1. Document Hands]
Module
transcr
Attributes
new
indicates a <handNote> element describing the hand concerned.
This attribute serves the same function as the hand attribute provided for those elements which are members of the att.transcriptional class. It may be renamed at a subsequent major release.
The <handShift> element may be used either to denote a shift in the document hand (as from one scribe to another, or from one writing style to another) or to indicate a shift within a document hand, such as a change of writing style, character, or ink. Like other milestone elements, it should appear at the point of transition from some other state to the state which it describes.
Example
<l>When wolde the cat dwelle in his ynne</l>
-<handShift medium="greenish-ink"/>
-<l>And if the cattes skynne be slyk <handShift medium="black-ink"/> and gaye</l>
element handShift { attribute new { text }?, empty }
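Since the declaration above allows only the new attribute, a shift of hand in this customization would presumably point to a <handNote> rather than describe the medium inline. A minimal sketch, in which the identifier, the note text, and the transcribed line are all hypothetical:

```xml
<!-- In the TEI header: hypothetical hand description -->
<handNotes>
  <handNote xml:id="hand-pencil" scope="minor">
    <p>Later annotations in pencil.</p>
  </handNote>
</handNotes>

<!-- In the transcription: the pointer marks where the new hand begins -->
<p>Typed list of planned titles <handShift new="#hand-pencil"/>with a pencilled remark</p>
```

Everything after the <handShift> is understood to be in the referenced hand until another <handShift> occurs.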
Appendix A.1.51 <head>
<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. [4.2.1. Headings and Trailers]
The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section.
Example
The most common use for the <head> element is to mark the headings of sections. In older writings, the headings or incipits may be rather longer than usual in modern works. If a section has an explicit ending as well as a heading, it should be marked as a <trailer>, as in this example:
<div1 n="I" type="book">
- <head>In the name of Christ here begins the first book of the ecclesiastical history of
- Georgius Florentinus, known as Gregory, Bishop of Tours.</head>
- <div2 type="section">
- <head>In the name of Christ here begins Book I of the history.</head>
- <p>Proposing as I do ...</p>
- <p>From the Passion of our Lord until the death of Saint Martin four hundred and twelve
- years passed.</p>
- <trailer>Here ends the first Book, which covers five thousand, five hundred and ninety-six
- years from the beginning of the world down to the death of Saint Martin.</trailer>
- </div2>
-</div1>
Example
When headings are not inline with the running text (see e.g. the heading "Secunda conclusio"), they may nevertheless be encoded as if they were. The actual placement in the source document can be captured with the place attribute.
<hi rend="gothic">And this Indenture further witnesseth</hi>
- that the said <hi rend="italic">Walter Shandy</hi>, merchant,
- in consideration of the said intended marriage ...
<history>
- <origin>
- <p>Written in Durham during the mid twelfth
- century.</p>
- </origin>
- <provenance>
- <p>Recorded in two medieval
- catalogues of the books belonging to Durham Priory, made in 1391 and
- 1405.</p>
- </provenance>
- <provenance>
- <p>Given to W. Olleyf by William Ebchester, Prior (1446-56)
- and later belonged to Henry Dalton, Prior of Holy Island (Lindisfarne)
- according to inscriptions on ff. 4v and 5.</p>
- </provenance>
- <acquisition>
- <p>Presented to Trinity College in 1738 by
- Thomas Gale and his son Roger.</p>
- </acquisition>
-</history>
<idno> should be used for labels which identify an object or concept in a formal cataloguing system such as a database or an RDF store, or in a distributed system such as the World Wide Web. Some suggested values for type on <idno> are ISBN, ISSN, DOI, and URI.
In the last case, the identifier includes a non-Unicode character which is defined elsewhere by means of a <glyph> or <char> element referenced here as #sym.
<institution> (institution) contains the name of an organization such as a university or library, with which a manuscript or other object is identified, generally its holding institution. [10.4. The Manuscript Identifier]
Whatever string of characters is used to label a list item in the copy text may be used as the value of the global n attribute, but it is not required that numbering be recorded explicitly. In ordered lists, the n attribute on the <item> element is by definition synonymous with the use of the <label> element to record the enumerator of the list item. In glossary lists, however, the term being defined should be given with the <label> element, not n.
Example
<list rend="numbered">
- <head>Here begin the chapter headings of Book IV</head>
- <item n="4.1">The death of Queen Clotild.</item>
- <item n="4.2">How King Lothar wanted to appropriate one third of the Church revenues.</item>
- <item n="4.3">The wives and children of Lothar.</item>
- <item n="4.4">The Counts of the Bretons.</item>
- <item n="4.5">Saint Gall the Bishop.</item>
- <item n="4.6">The priest Cato.</item>
- <item> ...</item>
-</list>
<l met="x/x/x/x/x/" real="/xx/x/x/x/">Shall I compare thee to a summer's day?</l>
Schematron
-<sch:report test="ancestor::tei:l[not(.//tei:note//tei:l[. = current()])]"> Abstract model violation: Lines may not contain lines or lg elements.
-</sch:report>
Labels are commonly used for the headwords in glossary lists; note the use of the global xml:lang attribute to set the default language of the glossary list to Middle English, and identify the glosses and headings as modern English or Latin:
Labels may also be used to record explicitly the numbers or letters which mark list items in ordered lists, as in this extract from Gibbon's Autobiography. In this usage the <label> element is synonymous with the n attribute on the <item> element:
I will add two facts, which have seldom occurred
- in the composition of six, or at least of five quartos. <list rend="runon" type="ordered">
- <label>(1)</label>
- <item>My first rough manuscript, without any intermediate copy, has been sent to the press.</item>
- <label>(2) </label>
- <item>Not a sheet has been seen by any human eyes, excepting those of the author and the
- printer: the faults and the merits are exclusively my own.</item>
-</list>
Example
Labels may also be used for other structured list items, as in this extract from the journal of Edward Gibbon:
<list type="gloss">
- <label>March 1757.</label>
- <item>I wrote some critical observations upon Plautus.</item>
- <label>March 8th.</label>
- <item>I wrote a long dissertation upon some lines of Virgil.</item>
- <label>June.</label>
- <item>I saw Mademoiselle Curchod — <quote xml:lang="la">Omnia vincit amor, et nos cedamus
- amori.</quote>
- </item>
- <label>August.</label>
- <item>I went to Crassy, and staid two days.</item>
-</list>
Note that the <label> might also appear within the <item> rather than as its sibling. Though syntactically valid, this usage is not recommended TEI practice.
Example
Labels may also be used to represent a label or heading attached to a paragraph or sequence of paragraphs not treated as a structural division, or to a group of verse lines. Note that, in this case, the <label> element appears within the <p> or <lg> element, rather than as a preceding sibling of it.
<p>[...]
<lb/>&amp; n’entrer en mauuais &amp; mal-heu-
<lb/>ré meſnage. Or des que le conſente-
<lb/>ment des parties y eſt le mariage eſt
<lb/> arreſté, quoy que de faict il ne ſoit
<label place="margin">Puiſſance maritale
 entre les Romains.</label>
<lb/> conſommé. Depuis la conſomma-
<lb/>tion du mariage la femme eſt ſoubs
<lb/> la puiſſance du mary, s’il n’eſt eſcla-
<lb/>ue ou enfant de famille : car en ce
<lb/> cas, la femme, qui a eſpouſé vn en-
<lb/>fant de famille, eſt ſous la puiſſance
 [...]</p>
In this example the text of the label appears in the right-hand margin of the original source, next to the paragraph it describes, but approximately in the middle of it. If so desired, the type attribute may be used to distinguish different categories of label.
By convention, <lb> elements should appear at the point in the text where a new line starts. The n attribute, if used, indicates the number or other value associated with the text between this point and the next <lb> element, typically the sequence number of the line within the page, or other appropriate unit. This element is intended to be used for marking actual line breaks on a manuscript or printed page, at the point where they occur; it should not be used to tag structural units such as lines of verse (for which the <l> element is available) except in circumstances where structural units cannot otherwise be marked.
The type attribute may be used to characterize the line break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the line break is word-breaking, or to note the source from which it derives.
Example
This example shows typographical line breaks within metrical lines, where they occur at different places in different editions:
<l>Of Mans First Disobedience,<lb ed="1674"/> and<lb ed="1667"/> the Fruit</l>
-<l>Of that Forbidden Tree, whose<lb ed="1667 1674"/> mortal tast</l>
-<l>Brought Death into the World,<lb ed="1667"/> and all<lb ed="1674"/> our woe,</l>
Example
This example encodes typographical line breaks as a means of preserving the visual appearance of a title page. The break attribute is used to show that the line break does not (as elsewhere) mark the start of a new word.
<titlePart>
- <lb/>With Additions, ne-<lb break="no"/>ver before Printed.
-</titlePart>
The term lemma is used in text criticism to describe the reading given in the main text, which may be used as a heading in the apparatus itself. This usage connects it to mathematics (where a lemma is a proven proposition used as a step in a proof, a "given") and natural-language processing (where a lemma is the dictionary headword associated with an inflected form in the running text).
contains verse lines or nested line groups only, possibly prefixed by a heading.
Example
<lg type="free">
- <l>Let me be my own fool</l>
- <l>of my own making, the sum of it</l>
-</lg>
-<lg type="free">
- <l>is equivocal.</l>
- <l>One says of the drunken farmer:</l>
-</lg>
-<lg type="free">
- <l>leave him lay off it. And this is</l>
- <l>the explanation.</l>
-</lg>
Schematron
<sch:assert test="count(descendant::tei:lg|descendant::tei:l|descendant::tei:gap) > 0">An lg element must contain at least one child l, lg, or gap element.</sch:assert>
Schematron
-<sch:report test="ancestor::tei:l[not(.//tei:note//tei:lg[. = current()])]"> Abstract model violation: Lines may not contain line groups.
-</sch:report>
A <licence> element should be supplied for each licence agreement applicable to the text in question. The target attribute may be used to reference a full version of the licence. The when, notBefore, notAfter, from or to attributes may be used in combination to indicate the date or dates of applicability of the licence.
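A sketch of how these attributes might combine is given below. The choice of licence and the date are assumptions made for illustration; this section does not state which licence the project actually applies.

```xml
<!-- Hypothetical licence choice and start date, for illustration only -->
<availability status="free">
  <licence target="https://creativecommons.org/licenses/by/4.0/"
           notBefore="2017-01-01">
    <p>Distributed under a Creative Commons Attribution 4.0 International licence.</p>
  </licence>
</availability>
```

The target attribute points to the full licence text, and notBefore gives the earliest date from which the licence applies.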
Previous versions of these Guidelines recommended the use of type on <list> to encode the rendering or appearance of a list (whether it was bulleted, numbered, etc.). The current recommendation is to use the rend or style attributes for these aspects of a list, while using type for the more appropriate task of characterizing the nature of the content of a list.
The formal syntax of the element declarations allows <label> tags to be omitted from lists tagged <list type="gloss">; this is however a semantic error.
May contain an optional heading followed by a series of items, or a series of label and item pairs, the latter being optionally preceded by one or two specialized headings.
Example
<list rend="numbered">
- <item>a butcher</item>
- <item>a baker</item>
- <item>a candlestick maker, with
- <list rend="bulleted">
- <item>rings on his fingers</item>
- <item>bells on his toes</item>
- </list>
- </item>
-</list>
Example
<list type="syllogism" rend="bulleted">
- <item>All Cretans are liars.</item>
- <item>Epimenides is a Cretan.</item>
- <item>ERGO Epimenides is a liar.</item>
-</list>
Example
<list type="litany" rend="simple">
- <item>God save us from drought.</item>
- <item>God save us from pestilence.</item>
- <item>God save us from wickedness in high places.</item>
- <item>Praise be to God.</item>
-</list>
Example
The following example treats the short numbered clauses of Anglo-Saxon legal codes as lists of items. The text is from an ordinance of King Athelstan (924–939):
<div1 type="section">
- <head>Athelstan's Ordinance</head>
- <list rend="numbered">
- <item n="1">Concerning thieves. First, that no thief is to be spared who is caught with
- the stolen goods, [if he is] over twelve years and [if the value of the goods is] over
- eightpence.
- <list rend="numbered">
- <item n="1.1">And if anyone does spare one, he is to pay for the thief with his
- wergild — and the thief is to be no nearer a settlement on that account — or to
- clear himself by an oath of that amount.</item>
- <item n="1.2">If, however, he [the thief] wishes to defend himself or to escape, he is
- not to be spared [whether younger or older than twelve].</item>
- <item n="1.3">If a thief is put into prison, he is to be in prison 40 days, and he may
- then be redeemed with 120 shillings; and the kindred are to stand surety for him
- that he will desist for ever.</item>
- <item n="1.4">And if he steals after that, they are to pay for him with his wergild,
- or to bring him back there.</item>
- <item n="1.5">And if he steals after that, they are to pay for him with his wergild,
- whether to the king or to him to whom it rightly belongs; and everyone of those who
- supported him is to pay 120 shillings to the king as a fine.</item>
- </list>
- </item>
- <item n="2">Concerning lordless men. And we pronounced about these lordless men, from whom
- no justice can be obtained, that one should order their kindred to fetch back such a
- person to justice and to find him a lord in public meeting.
- <list rend="numbered">
- <item n="2.1">And if they then will not, or cannot, produce him on that appointed day,
- he is then to be a fugitive afterwards, and he who encounters him is to strike him
- down as a thief.</item>
- <item n="2.2">And he who harbours him after that, is to pay for him with his wergild
- or to clear himself by an oath of that amount.</item>
- </list>
- </item>
- <item n="3">Concerning the refusal of justice. The lord who refuses justice and upholds
- his guilty man, so that the king is appealed to, is to repay the value of the goods and
- 120 shillings to the king; and he who appeals to the king before he demands justice as
- often as he ought, is to pay the same fine as the other would have done, if he had
- refused him justice.
- <list rend="numbered">
- <item n="3.1">And the lord who is an accessory to a theft by his slave, and it becomes
- known about him, is to forfeit the slave and be liable to his wergild on the first
- occasion; if he does it more often, he is to be liable to pay all that he owns.</item>
- <item n="3.2">And likewise any of the king's treasurers or of our reeves, who has been
- an accessory of thieves who have committed theft, is to be liable to the same.</item>
- </list>
- </item>
- <item n="4">Concerning treachery to a lord. And we have pronounced concerning treachery to
- a lord, that he [who is accused] is to forfeit his life if he cannot deny it or is
- afterwards convicted at the three-fold ordeal.</item>
- </list>
-</div1>
Note that nested lists have been used so the tagging mirrors the structure indicated by the two-level numbering of the clauses. The clauses could have been treated as a one-level list with irregular numbering, if desired.
Example
<p>These decrees, most blessed Pope Hadrian, we propounded in the public council ... and they
- confirmed them in our hand in your stead with the sign of the Holy Cross, and afterwards
- inscribed with a careful pen on the paper of this page, affixing thus the sign of the Holy
- Cross.
-<list rend="simple">
- <item>I, Eanbald, by the grace of God archbishop of the holy church of York, have
- subscribed to the pious and catholic validity of this document with the sign of the Holy
- Cross.</item>
- <item>I, Ælfwold, king of the people across the Humber, consenting have subscribed with
- the sign of the Holy Cross.</item>
- <item>I, Tilberht, prelate of the church of Hexham, rejoicing have subscribed with the
- sign of the Holy Cross.</item>
- <item>I, Higbald, bishop of the church of Lindisfarne, obeying have subscribed with the
- sign of the Holy Cross.</item>
- <item>I, Ethelbert, bishop of Candida Casa, suppliant, have subscribed with the sign of
- the Holy Cross.</item>
- <item>I, Ealdwulf, bishop of the church of Mayo, have subscribed with devout will.</item>
- <item>I, Æthelwine, bishop, have subscribed through delegates.</item>
- <item>I, Sicga, patrician, have subscribed with serene mind with the sign of the Holy
- Cross.</item>
- </list>
-</p>
Schematron
-<sch:rule context="tei:list[@type='gloss']">
-<sch:assert test="tei:label">The content of a "gloss" list should include a sequence of one or more pairs of a label element followed by an item element</sch:assert>
-</sch:rule>
The enclosed annotations may use the general-purpose <note> element; or, for annotations pertaining to transcriptions of speech, the special-purpose <annotationBlock> element; or the <annotation> element, which is intended to map cleanly onto the Web Annotation Data Model.
Example
<standOff>
- <listAnnotation>
- <note target="#RotAM.4.15" place="margin"
- resp="#STC" type="gloss"> The spell begins to
- break </note>
- <note target="#RotAM.4.15" place="bottom"
- resp="#JLL"> The turning point of the poem...
- </note>
- </listAnnotation>
-</standOff>
-<!-- ... -->
-<lg xml:id="RotAM.4.15" rhyme="ABCB">
- <l>The self-same moment I could pray;</l>
- <l>And from my neck so free</l>
- <l>The albatross fell off, and sank</l>
- <l>Like lead into the sea.</l>
-</lg>
<listAnnotation>
- <annotation xml:id="bgann1"
- motivation="commenting"
- target="#match(bg-c1p1s1,'Gallia.*omnis')">
- <respStmt>
- <resp>creator</resp>
- <persName>Francis Kelsey</persName>
- </respStmt>
- <note>‘Gaul as a whole,’ contrasted with Gaul in the narrower sense, or Celtic Gaul; Celtic Gaul also is often called Gallia.</note>
- </annotation>
- <annotation xml:id="bgann2"
- motivation="commenting"
- target="#match(bg-c1p1s1,'Gallia.*divisa')">
- <respStmt>
- <resp>creator</resp>
- <persName>Rice Holmes</persName>
- </respStmt>
- <note>Gallia...divisa: Notice the order of the words. ‘Gaul, taken as a whole, is divided’.</note>
- </annotation>
- <annotation xml:id="bgann3"
- motivation="commenting" target="#match(bg-c1p1s1,'Belgae')">
- <respStmt>
- <resp>creator</resp>
- <persName>Arthur Tappan Walker</persName>
- </respStmt>
- <note>Belgae -arum m., the Belgae or Belgians</note>
- </annotation>
- <annotation xml:id="bgann4"
- motivation="commenting" target="#match(bg-c1p1s1,'Aquitani')">
- <respStmt>
- <resp>creator</resp>
- <persName>Arthur Tappan Walker</persName>
- </respStmt>
- <note>Aquitani, -orum m.: the Aquitani, inhabiting southwestern Gaul</note>
- </annotation>
- <annotation xml:id="bgann5"
- motivation="commenting" target="#match(bg-c1p1s1,'Celtae')">
- <respStmt>
- <resp>creator</resp>
- <persName>Arthur Tappan Walker</persName>
- </respStmt>
- <note>Celtae, -arum m: the Celtae or Celts</note>
- </annotation>
- <annotation xml:id="bgann6"
- motivation="commenting"
- target="#match(bg-c1p1s2,'Gallos(.|\n)*dividit')">
- <respStmt>
- <resp>creator</resp>
- <persName>William Francis Allen</persName>
- <persName>Joseph Henry Allen</persName>
- <persName>Harry Pratt Judson</persName>
- </respStmt>
- <note>the verb is singular, because the two rivers make one boundary; as we should say,
- ‘is divided by the line of the Seine and Marne.’</note>
- </annotation>
-</listAnnotation>
-<!-- Elsewhere in the document -->
-<text>
- <body>
- <div type="edition">
- <div type="textpart" subtype="chapter"
- n="1" xml:id="bg-c1">
- <p n="1" xml:id="bg-c1p1">
- <seg n="1" xml:id="bg-c1p1s1">Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam
- Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur.</seg>
- <seg n="2" xml:id="bg-c1p1s2">Hi omnes lingua, institutis, legibus inter se differunt. Gallos ab Aquitanis
- Garumna flumen, a Belgis Matrona et Sequana dividit.</seg>
-<!-- ... -->
- </p>
- </div>
- </div>
- </body>
-</text>
<metamark> contains or describes any kind of graphic or written signal within a document the function of which is to determine how it should be read rather than forming part of the actual content of the document. [11.3.4.2. Metamarks]
<surface>
- <metamark function="used" rend="line"
- target="#zone-X2"/>
- <zone xml:id="zone-X2">
- <line>I am that halfgrown <add>angry</add> boy, fallen asleep</line>
- <line>The tears of foolish passion yet undried</line>
- <line>upon my cheeks.</line>
-<!-- ... -->
- <line>I pass through <add>the</add> travels and <del>fortunes</del> of
- <retrace>thirty</retrace>
- </line>
- <line>years and become old,</line>
- <line>Each in its due order comes and goes,</line>
- <line>And thus a message for me comes.</line>
- <line>The</line>
- </zone>
- <metamark function="used"
- target="#zone-X2">Entered - Yes</metamark>
-</surface>
<milestone> (milestone) marks a boundary point separating any kind of section of a text, typically but not necessarily indicating a point at which some part of a standard reference system changes, where the change is not represented by a structural element. [3.11.3. Milestone Elements]
Module
core
Attributes
rend
(rendition) indicates how the element in question was rendered or presented in the source text.
For this element, the global n attribute indicates the new number or other value for the unit which changes at this milestone. The special value unnumbered should be used in passages which fall outside the normal numbering scheme, such as chapter or other headings, poem numbers or titles, etc.
The order in which <milestone> elements are given at a given point is not normally significant.
-element mod
-{
- tei_att.global.attribute.xmlid,
- attribute n { "1" | "2" }?,
- attribute rend { list { ( "circled" | "framed" )+ } }?,
- tei_macro.paraContent
-}
Appendix A.1.69 <monogr>
<monogr> (monographic level) contains bibliographic elements describing an item (e.g. a book or journal) published as an independent item (i.e. as a separate physical object). [3.12.2.1. Analytic, Monographic, and Series Levels]
May contain specialized bibliographic elements, in a prescribed order.
The <monogr> element may occur only within a <biblStruct>, where its use is mandatory for the description of a monographic-level bibliographic item.
Example
<biblStruct>
- <analytic>
- <author>Chesnutt, David</author>
- <title>Historical Editions in the States</title>
- </analytic>
- <monogr>
- <title level="j">Computers and the Humanities</title>
- <imprint>
- <date when="1991-12">(December, 1991):</date>
- </imprint>
- <biblScope>25.6</biblScope>
- <biblScope unit="page" from="377" to="380">377–380</biblScope>
- </monogr>
-</biblStruct>
<msContents> (manuscript contents) describes the intellectual content of a manuscript, manuscript part, or other object either as a series of paragraphs or as a series of structured manuscript items. [10.6. Intellectual Content]
Unless it contains a simple prose description, this element should contain at least one of the elements <summary>, <msItem>, or <msItemStruct>. This constraint is not currently enforced by the schema.
Example
<msContents class="#sermons">
- <p>A collection of Lollard sermons</p>
-</msContents>
<msDesc> (manuscript description) contains a description of a single identifiable manuscript or other text-bearing object such as an early printed book. [10.1. Overview]
Although the <msDesc> has primarily been designed with a view to encoding manuscript descriptions, it may also be used for other objects such as early printed books, fascicles, epigraphs, or any text-bearing objects that require substantial description. If an object is not text-bearing, or the reason for describing the object is not primarily its textual content, the more general <object> may be more suitable.
Example
<msDesc>
- <msIdentifier>
- <settlement>Oxford</settlement>
- <repository>Bodleian Library</repository>
- <idno type="Bod">MS Poet. Rawl. D. 169.</idno>
- </msIdentifier>
- <msContents>
- <msItem>
- <author>Geoffrey Chaucer</author>
- <title>The Canterbury Tales</title>
- </msItem>
- </msContents>
- <physDesc>
- <objectDesc>
- <p>A parchment codex of 136 folios, measuring approx
- 28 by 19 inches, and containing 24 quires.</p>
- <p>The pages are margined and ruled throughout.</p>
- <p>Four hands have been identified in the manuscript: the first 44
- folios being written in two cursive anglicana scripts, while the
- remainder is for the most part in a mixed secretary hand.</p>
- </objectDesc>
- </physDesc>
-</msDesc>
<msIdentifier> (manuscript identifier) contains the information required to identify the manuscript or similar object being described. [10.4. The Manuscript Identifier]
-<sch:report test="not(parent::tei:msPart) and (local-name(*[1])='idno' or local-name(*[1])='altIdentifier'
- or normalize-space(.)='')">An msIdentifier must contain either a repository or location.</sch:report>
<msItemStruct> (structured manuscript item) contains a structured description for an individual work or item within the intellectual content of a manuscript, manuscript part, or other object. [10.6.1. The msItem and msItemStruct Elements]
Proper nouns referring to people, places, and organizations may be tagged instead with <persName>, <placeName>, or <orgName>, when the TEI module for names and dates is included.
In the following example, the translator has supplied a footnote containing an explanation of the term translated as "painterly":
And yet it is not only
- in the great line of Italian renaissance art, but even in the
- painterly <note place="bottom" type="gloss"
- resp="#MDMH">
- <term xml:lang="de">Malerisch</term>. This word has, in the German, two
- distinct meanings, one objective, a quality residing in the object,
- the other subjective, a mode of apprehension and creation. To avoid
- confusion, they have been distinguished in English as
-<mentioned>picturesque</mentioned> and
-<mentioned>painterly</mentioned> respectively.
-</note> style of the
- Dutch genre painters of the seventeenth century that drapery has this
- psychological significance.
-
-<!-- elsewhere in the document -->
-<respStmt xml:id="MDMH">
- <resp>translation from German to English</resp>
- <name>Hottinger, Marie Donald Mackie</name>
-</respStmt>
For this example to be valid, the code MDMH must be defined elsewhere, for example by means of a responsibility statement in the associated TEI header.
Example
The global n attribute may be used to supply the symbol or number used to mark the note's point of attachment in the source text, as in the following example:
Mevorakh b. Saadya's mother, the matriarch of the
- family during the second half of the eleventh century, <note n="126" anchored="true"> The
- alleged mention of Judah Nagid's mother in a letter from 1071 is, in fact, a reference to
- Judah's children; cf. above, nn. 111 and 54. </note> is well known from Geniza documents
- published by Jacob Mann.
However, if notes are numbered in sequence and their numbering can be reconstructed automatically by processing software, it may well be considered unnecessary to record the note numbers.
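A minimal sketch of such automatic reconstruction, assuming that notes are simply numbered in document order and that an explicit n value, where present, takes precedence. The function name number_notes is hypothetical and the sample is an abbreviated version of the example above.

```python
import xml.etree.ElementTree as ET


def number_notes(root: ET.Element, start: int = 1) -> list:
    """Give every <note> without an n attribute its position in document order."""
    numbers = []
    counter = start
    for elem in root.iter():
        # Compare local names so the sketch works with or without the TEI namespace.
        if elem.tag.rsplit("}", 1)[-1] != "note":
            continue
        if elem.get("n") is None:
            elem.set("n", str(counter))
        numbers.append(elem.get("n"))
        counter += 1
    return numbers


sample = ET.fromstring(
    "<p>Mevorakh b. Saadya's mother <note anchored='true'>The alleged mention "
    "of Judah Nagid's mother ...</note> is well known from Geniza documents "
    "<note anchored='true'>A second, illustrative note.</note></p>"
)

print(number_notes(sample, start=126))
```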
In the following example, there are two notes in different languages, each specifying the content of the annotation relating to the same fragment of text:
<p>(...) tamen reuerendos dominos archiepiscopum et canonicos Leopolienses
- necnon episcopum in duplicibus Quatuortemporibus
-<noteGrp>
- <note xml:lang="en">Quatuor Tempora, so called dry fast days (Wednesday, Friday, and Saturday)
- falling on each of the quarters of the year. In the first quarter they were called Cinerum
- (following Ash Wednesday), second Spiritus (following Pentecost), third Crucis
- (after the Exaltation of the Holy Cross, September 14th), and Luciae
- in the fourth (after the feast of St. Lucia, December 13th).
- </note>
- <note xml:lang="pl">Quatuor Tempora, tzw. Suche dni postne (środa, piątek i sobota)
- przypadające cztery razy w roku. W pierwszym kwartale zwały się Cinerum
- (po Popielcu), w drugim Spiritus (po Zielonych Świętach), w trzecim Crucis
- (po święcie Podwyższenia Krzyża 14 września), w czwartym Luciae
- (po dniu św. Łucji 13 grudnia).
- </note>
- </noteGrp>
- totaliter expediui.
-</p>
<notesStmt> (notes statement) collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description. [2.2.6. The Notes Statement; 2.2. The File Description]
<opener> (opener) groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter. [4.2. Elements Common to All Divisions]
<opener>
- <dateline>Walden, this 29. of August 1592</dateline>
-</opener>
Example
<opener>
- <dateline>
- <name type="place">Great Marlborough Street</name>
- <date>November 11, 1848</date>
- </dateline>
- <salute>My dear Sir,</salute>
-</opener>
-<p>I am sorry to say that absence from town and other circumstances have prevented me from
- earlier enquiring...</p>
<origDate> (origin date) contains any form of date, used to identify the date of origin for a manuscript, manuscript part, or other object. [10.3.1. Origination]
<origin> (origin) contains any descriptive or other information concerning the origin of a manuscript, manuscript part, or other object. [10.8. History]
<origin notBefore="1802" notAfter="1845"
- evidence="internal" resp="#AMH">Copied in <name type="origPlace">Derby</name>, probably from an
- old Flemish original, between 1802 and 1845, according to <persName xml:id="AMH">Anne-Mette Hansen</persName>.
-</origin>
<p>Hallgerd was outside. <q>There is blood on your axe,</q> she said. <q>What have you
- done?</q>
-</p>
-<p>
- <q>I have now arranged that you can be married a second time,</q> replied Thjostolf.
-</p>
-<p>
- <q>Then you must mean that Thorvald is dead,</q> she said.
-</p>
-<p>
- <q>Yes,</q> said Thjostolf. <q>And now you must think up some plan for me.</q>
-</p>
Schematron
-<sch:report test="(ancestor::tei:ab or ancestor::tei:p) and not( ancestor::tei:floatingText
- |parent::tei:exemplum |parent::tei:item |parent::tei:note |parent::tei:q
- |parent::tei:quote |parent::tei:remarks |parent::tei:said |parent::tei:sp
- |parent::tei:stage |parent::tei:cell |parent::tei:figure )"> Abstract model violation: Paragraphs may not occur inside other paragraphs or ab elements.
-</sch:report>
Schematron
-<sch:report test="(ancestor::tei:l or ancestor::tei:lg) and not( ancestor::tei:floatingText
- |parent::tei:figure |parent::tei:note )"> Abstract model violation: Lines may not contain higher-level structural elements such as div, p, or ab, unless p is a child of figure or note, or is a descendant of floatingText.
-</sch:report>
identifies a line within the container or bounding box specified by the parent element by means of a series of two or more pairs of numbers, each of which gives the x,y coordinates of a point on the line.
Derived from
att.coordinated
Status
Optional
Datatype
2–∞ occurrences of teidata.point separated by whitespace
Contained by
—
May contain
Empty element
Note
Although the simplest form of a path is a straight line between two points, a line with more than two points may bend at any point. The order of coordinates in points is significant, because the line follows the coordinate sequence.
To specify a closed polygon, use the <zone> element rather than the <path> element.
-<sch:rule context="tei:path[@points]">
-<sch:let name="firstPair"
- value="tokenize( normalize-space( @points ), ' ')[1]"/>
-<sch:let name="lastPair"
- value="tokenize( normalize-space( @points ), ' ')[last()]"/>
-<sch:let name="firstX"
- value="xs:float( substring-before( $firstPair, ',') )"/>
-<sch:let name="firstY"
- value="xs:float( substring-after( $firstPair, ',') )"/>
-<sch:let name="lastX"
- value="xs:float( substring-before( $lastPair, ',') )"/>
-<sch:let name="lastY"
- value="xs:float( substring-after( $lastPair, ',') )"/>
-<sch:report test="$firstX eq $lastX and $firstY eq $lastY">The first and
- last elements of this path are the same. To specify a closed polygon, use
- the zone element rather than the path element. </sch:report>
-</sch:rule>
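The same check can be sketched outside Schematron. The following Python code, a rough equivalent written under the assumption that points holds whitespace-separated x,y pairs as described above, reports whether the first and last pairs coincide. The function names are hypothetical.

```python
def parse_points(points: str) -> list:
    """Split a points value into (x, y) float pairs."""
    pairs = []
    for token in points.split():
        x, y = token.split(",")
        pairs.append((float(x), float(y)))
    return pairs


def is_closed(points: str) -> bool:
    """True when the first and last coordinate pairs are identical."""
    pairs = parse_points(points)
    return bool(pairs) and pairs[0] == pairs[-1]


print(is_closed("0,0 10,0 10,10 0,0"))  # closed outline: use <zone> instead
print(is_closed("0,0 10,0 10,10"))      # open line: acceptable for <path>
```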
A <pb> element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the <pb> element itself.
The type attribute may be used to characterize the page break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the page break is word-breaking, or to note the source from which it derives.
Example
Page numbers may vary in different editions of a text.
<p> ... <pb n="145" ed="ed2"/>
-<!-- Page 145 in edition "ed2" starts here --> ... <pb n="283" ed="ed1"/>
-<!-- Page 283 in edition "ed1" starts here--> ... </p>
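As an illustration of how the ed attribute might be used in processing, the following sketch groups page numbers by edition. It is a minimal example under the assumption that every <pb> carries both n and ed; the function name pages_by_edition is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def pages_by_edition(root: ET.Element) -> dict:
    """Map each ed value to the list of page numbers recorded for it."""
    pages = defaultdict(list)
    for elem in root.iter():
        # Compare local names so the sketch works with or without the TEI namespace.
        if elem.tag.rsplit("}", 1)[-1] == "pb":
            pages[elem.get("ed", "")].append(elem.get("n"))
    return dict(pages)


sample = ET.fromstring(
    '<p> ... <pb n="145" ed="ed2"/> ... <pb n="283" ed="ed1"/> ... </p>'
)

print(pages_by_edition(sample))
```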
Example
A page break may be associated with a facsimile image of the page it introduces by means of the facs attribute
<body>
- <pb n="1" facs="page1.png"/>
-<!-- page1.png contains an image of the page;
- the text it contains is encoded here -->
- <p>
-<!-- ... -->
- </p>
- <pb n="2" facs="page2.png"/>
-<!-- similarly, for page 2 -->
- <p>
-<!-- ... -->
- </p>
-</body>
Example encoding of the German sentence Wir fahren in den Urlaub., encoded with attributes from att.linguistic.
-element pc
-{
- tei_att.global.source.attribute.source,
- tei_att.datcat.attribute.targetDatcat,
- tei_att.linguistic.attributes,
- ( text | tei_model.gLike | c | tei_model.pPart.edit )*
-}
Appendix A.1.85 <profileDesc>
<profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. [2.4. The Profile Description; 2.1.1. The TEI Header and Its Components]
Although the content model permits it, it is rarely meaningful to supply multiple occurrences for any of the child elements of <profileDesc> unless these are documenting multiple texts.
Where a publication statement contains several members of the model.publicationStmtPart.agency or model.publicationStmtPart.detail classes rather than one or more paragraphs or anonymous blocks, care should be taken to ensure that the repeated elements are presented in a meaningful order. It is a conformance requirement that elements supplying information about publication place, address, identifier, availability, and date be given following the name of the publisher, distributor, or authority concerned, and preferably in that order.
<publicationStmt>
- <publisher>Zea Books</publisher>
- <pubPlace>Lincoln, NE</pubPlace>
- <date>2017</date>
- <availability>
- <p>This is an open access work licensed under a Creative Commons Attribution 4.0 International license.</p>
- </availability>
- <ptr target="http://digitalcommons.unl.edu/zeabook/55"/>
-</publicationStmt>
<quote> (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text. [3.3.3. Quotation; 4.3.1. Grouped Texts]
If a bibliographic citation is supplied for the source of a quotation, the two may be grouped using the <cit> element.
Example
Lexicography has shown little sign of being affected by the
- work of followers of J.R. Firth, probably best summarized in his
- slogan, <quote>You shall know a word by the company it
- keeps</quote>
-<ref>(Firth, 1957)</ref>
The attribute ref, inherited from the class att.canonical, may be used to indicate the kind of responsibility in a normalized form by referring directly to a standardized list of responsibility types, such as that maintained by a naming authority, for example the list maintained at http://www.loc.gov/marc/relators/relacode.html for bibliographic usage.
<respStmt> (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organizations which have played a role in the production or distribution of a bibliographic work. [3.12.2.2. Titles, Authors, and Editors; 2.2.1. The Title Statement; 2.2.2. The Edition Statement; 2.2.5. The Series Statement]
may be used to specify further information about the entity referenced by this name in the form of a set of whitespace-separated values, for example the occupation of a person, or the status of a place.
-<sch:rule context="@key">
-<sch:let name="index"
- value="doc('../lists.xml')"/>
-<sch:assert test=". = $index//@xml:id">ID is not
- available in Project Knowledge file</sch:assert>
-</sch:rule>
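The rule above requires every key value to equal an xml:id found in the project's list file. A minimal Python sketch of the same lookup follows. The structure of the index used here (a <listPerson> with <person> entries) is an assumption for illustration, since the actual layout of lists.xml is not shown, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def known_ids(index_root: ET.Element) -> set:
    """Collect every xml:id declared in the index document."""
    return {e.get(XML_ID) for e in index_root.iter() if e.get(XML_ID)}


def unknown_keys(doc_root: ET.Element, ids: set) -> list:
    """Return key values that have no matching xml:id in the index."""
    return [
        e.get("key")
        for e in doc_root.iter()
        if e.get("key") is not None and e.get("key") not in ids
    ]


# Assumed index layout; the identifiers are those named in the header section.
index = ET.fromstring(
    '<listPerson><person xml:id="FP"/><person xml:id="AC"/></listPerson>'
)
document = ET.fromstring(
    '<p><rs type="person" key="FP">Fernando Pessoa</rs> and '
    '<rs type="person" key="XX">an unlisted name</rs></p>'
)

print(unknown_keys(document, known_ids(index)))
```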
<q>My dear <rs type="person">Mr. Bennet</rs>, </q> said <rs type="person">his lady</rs>
- to him one day,
-<q>have you heard that <rs type="place">Netherfield Park</rs> is let at
- last?</q>
Where the place attribute is not provided on the <rt> element, the default assumption is that the ruby gloss is above where the text is horizontal, and to the right of the text where it is vertical.
Example
The word 大統領 (daitōryō, president) is glossed character by character in hiragana to provide a pronunciation guide.
<said> (speech or thought) indicates passages thought or spoken aloud, whether explicitly indicated in the source or not, whether directly or indirectly reported, whether by real people or fictional characters. [3.3.3. Quotation]
-<!-- in the header --><editorialDecl>
- <quotation marks="all"/>
-</editorialDecl>
-<!-- ... -->
-<p>
- <said>"Our minstrel here will warm the old man's heart with song, dazzle him with jewels and
- gold"</said>, a troublemaker simpered. <said>"He'll trample on the Duke's camellias, spill
- his wine, and blunt his sword, and say his name begins with X, and in the end the Duke
- will say, <said>'Take Saralinda, with my blessing, O lordly Prince of Rags and Tags, O
- rider of the sun!'</said>"</said>
-</p>
Example
<p>
- <said aloud="true" rend="pre(“) post(”)">Hmmm</said>, said a small voice in his ear.
-<said aloud="true" rend="pre(“) post(”)">Difficult. Very difficult. Plenty of courage, I see.
- Not a bad mind either. There's talent, oh my goodness, yes — and a nice thirst to prove
- yourself, now that's interesting. … So where shall I put you?</said>
-</p>
-<p>Harry gripped the edges of the stool and thought, <said aloud="false" rend="italic">Not
- Slytherin, not Slytherin</said>.</p>
-element said { tei_att.ascribed.attribute.who, tei_macro.specialPara }
Appendix A.1.100 <salute>
<salute> (salutation) contains a salutation or greeting prefixed to a foreword, dedicatory epistle, or other division of a text, or the salutation in the closing of a letter, preface, etc. [4.2.2. Openers and Closers]
Module
textstructure
Attributes
rend
(rendition) indicates how the element in question was rendered or presented in the source text.
The <seg> element may be used at the encoder's discretion to mark any segments of the text of interest for processing. One use of the element is to mark text features for which no appropriate markup is otherwise defined. Another use is to provide an identifier for some segment which is to be pointed at by some other element—i.e. to provide a target, or a part of a target, for a <ptr> or other similar element.
Example
<seg>When are you leaving?</seg>
-<seg>Tomorrow.</seg>
Example
<s>
- <seg rend="caps" type="initial-cap">So father's only</seg> glory was the ballfield.
-</s>
Example
<seg type="preamble">
- <seg>Sigmund, <seg type="patronym">the son of Volsung</seg>, was a king in Frankish country.</seg>
- <seg>Sinfiotli was the eldest of his sons ...</seg>
- <seg>Borghild, Sigmund's wife, had a brother ... </seg>
-</seg>
<set> (setting) contains a description of the setting, time, locale, appearance, etc., of the action of a play, typically found in the front matter of a printed performance text (not a stage direction). [7.1. Front and Back Matter ]
Module
drama
Attributes
rend
(rendition) indicates how the element in question was rendered or presented in the source text.
This element should not be used outside the front or back matter; for similar contextual descriptions within the body of the text, use the <stage> element.
Example
<set>
- <p>The action takes place on February 7th between the hours of noon and six in the
- afternoon, close to the Trenartha Tin Plate Works, on the borders of England and Wales,
- where a strike has been in progress throughout the winter.</p>
-</set>
Example
<set>
- <head>SCENE</head>
- <p>A Sub-Post Office on a late autumn evening</p>
-</set>
Example
<front>
-<!-- <titlePage>, <div type="Dedication">, etc. -->
- <set>
- <list type="gloss">
- <label>TIME</label>
- <item>1907</item>
- <label>PLACE</label>
- <item>East Coast village in England</item>
- </list>
- </set>
-</front>
for his nose was as sharp as
- a pen, and <sic>a Table</sic> of green fields.
Example
If all that is desired is to call attention to the apparent problem in the copy text, <sic> may be used alone:
I don't know, Juan. It's so far in the past now
- — how <sic>we can</sic> prove or disprove anyone's theories?
Example
It is also possible, using the <choice> and <corr> elements, to provide a corrected reading:
I don't know, Juan. It's so far in the past now
- — how <choice>
- <sic>we can</sic>
- <corr>can we</corr>
-</choice> prove or disprove anyone's theories?
Example
for his nose was as sharp as
- a pen, and <choice>
- <sic>a Table</sic>
- <corr>a' babbld</corr>
-</choice> of green fields.
<signed> (signature) contains the closing salutation, etc., appended to a foreword, dedicatory epistle, or other division of a text. [4.2.2. Openers and Closers]
<sourceDesc> (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence. [2.2.7. The Source Description]
<sourceDesc>
- <bibl>
- <title level="a">The Interesting story of the Children in the Wood</title>. In
- <author>Victor E Neuberg</author>, <title>The Penny Histories</title>.
- <publisher>OUP</publisher>
- <date>1968</date>. </bibl>
-</sourceDesc>
Example
<sourceDesc>
- <p>Born digital: no previous source exists.</p>
-</sourceDesc>
The who attribute on this element may be used either in addition to the <speaker> element or as an alternative.
Example
<sp>
- <speaker>The reverend Doctor Opimian</speaker>
- <p>I do not think I have named a single unpresentable fish.</p>
-</sp>
-<sp>
- <speaker>Mr Gryll</speaker>
- <p>Bream, Doctor: there is not much to be said for bream.</p>
-</sp>
-<sp>
- <speaker>The Reverend Doctor Opimian</speaker>
- <p>On the contrary, sir, I think there is much to be said for him. In the first place [...]</p>
- <p>Fish, Miss Gryll — I could discourse to you on fish by the hour: but for the present I
- will forbear [...]</p>
-</sp>
<speaker> contains a specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment. [3.13.2. Core Tags for Drama]
The who attribute may be used to indicate more precisely the person or persons participating in the action described by the stage direction.
Example
<stage type="setting">A curtain being drawn.</stage>
-<stage type="setting">Music</stage>
-<stage type="entrance">Enter Husband as being thrown off his horse and falls.</stage>
-<!-- Middleton : Yorkshire Tragedy -->
-<stage type="exit">Exit pursued by a bear.</stage>
-<stage type="business">He quickly takes the stone out.</stage>
-<stage type="delivery">To Lussurioso.</stage>
-<stage type="novelistic">Having had enough, and embarrassed for the family.</stage>
-<!-- Lorraine Hansberry: A Raisin in the Sun -->
-<stage type="modifier">Disguised as Ansaldo.</stage>
-<stage type="entrance modifier">Enter Latrocinio disguised as an empiric</stage>
-<!-- Middleton: The Widow -->
-<stage type="location">At a window.</stage>
-<stage rend="inline" type="delivery">Aside.</stage>
Example
<l>Behold. <stage n="*" place="margin">Here the vp<lb/>per part of the <hi>Scene</hi> open'd; when
- straight appear'd a Heauen, and all the <hi>Pure Artes</hi> sitting on
- two semi<lb/>circular ben<lb/>ches, one a<lb/>boue another: who sate thus till the rest of the
- <hi>Prologue</hi> was spoken, which being ended, they descended in
- order within the <hi>Scene,</hi> whiles the Musicke plaid</stage> Our
- Poet knowing our free hearts</l>
<standOff> Functions as a container element for linked data, contextual information, and stand-off annotations embedded in a TEI document. [16.10. The standOff Container]
This example shows an encoding of contextual information which is referred to from the main text.
<TEI xmlns="http://www.tei-c.org/ns/1.0">
- <teiHeader>
-<!-- ... -->
- </teiHeader>
- <standOff>
- <listPlace>
- <place xml:id="LATL">
- <placeName>Atlanta</placeName>
- <location>
- <region key="US-GA">Georgia</region>
- <country key="USA">United States of America</country>
- <geo>33.755 -84.39</geo>
- </location>
- <population when="1963"
- type="interpolatedCensus" quantity="489359"
- source="https://www.biggestuscities.com/city/atlanta-georgia"/>
- </place>
- <place xml:id="LBHM">
- <placeName>Birmingham</placeName>
- <location>
- <region key="US-AL">Alabama</region>
- <country key="USA">United States of America</country>
- <geo>33.653333 -86.808889</geo>
- </location>
- <population when="1963"
- type="interpolatedCensus" quantity="332891"
- source="https://www.biggestuscities.com/city/birmingham-alabama"/>
- </place>
- </listPlace>
- </standOff>
- <text>
- <body>
-<!-- ... -->
- <p>Moreover, I am <choice>
- <sic>congnizant</sic>
- <corr>cognizant</corr>
- </choice> of the interrelatedness of all communities and
- <lb/>states. I cannot sit idly by in <placeName ref="#LATL">Atlanta</placeName> and not be concerned about what happens
- <lb/>in <placeName ref="#LBHM">Birmingham</placeName>. <seg xml:id="FQ17">Injustice anywhere is a threat to justice everywhere.</seg> We
- <lb/>are caught in an inescapable network of mutuality, tied in a single garment
- <lb/>of destiny. Whatever affects one directly affects all indirectly. Never
- <lb/>again can we afford to live with the narrow, provincial <soCalled rendition="#Rqms">outside agitator</soCalled>
- <lb/>idea. Anyone who lives inside the United States can never be considered
- <lb/>an outsider anywhere in this country.</p>
-<!-- ... -->
- </body>
- </text>
-</TEI>
Schematron
-<sch:assert test="@type or not(ancestor::tei:standOff)">This
-<sch:name/> element must have a @type attribute, since it is
- nested inside a <sch:name/>
-</sch:assert>
<subst> (substitution) groups one or more deletions (or surplus text) with one or more additions when the combination is to be regarded as a single intervention in the text. [11.3.1.5. Substitutions]
... are all included. <del hand="#RG">It is</del>
-<subst>
- <add>T</add>
- <del>t</del>
-</subst>he expressed
-
Example
that he and his Sister Miſs D — <lb/>who always lived with him, wd. be <subst>
- <del>very</del>
- <lb/>
- <add>principally</add>
-</subst> remembered in her Will.
-
-<sch:assert test="child::tei:add and (child::tei:del or child::tei:surplus)">
-<sch:name/> must have at least one child add and at least one child del or surplus</sch:assert>
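The constraint can be sketched in Python as follows: a <subst> is accepted only if it has at least one <add> child and at least one <del> or <surplus> child. This is a minimal illustration of the assertion above, with a hypothetical function name.

```python
import xml.etree.ElementTree as ET


def subst_is_valid(subst: ET.Element) -> bool:
    """Check that <subst> has an <add> child and a <del> or <surplus> child."""
    # Compare local names so the sketch works with or without the TEI namespace.
    children = {child.tag.rsplit("}", 1)[-1] for child in subst}
    return "add" in children and bool(children & {"del", "surplus"})


good = ET.fromstring("<subst><add>T</add><del>t</del></subst>")
bad = ET.fromstring("<subst><add>T</add></subst>")

print(subst_is_valid(good), subst_is_valid(bad))
```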
<summary> contains an overview of the available information concerning some aspect of an item or object (for example, its intellectual content, history, layout, typography etc.) as a complement or alternative to the more detailed information carried by more specific elements. [10.6. Intellectual Content]
<summary>This item consists of three books with a prologue and an epilogue.
-</summary>
Example
<typeDesc>
- <summary>Uses a mixture of Roman and Black Letter types.</summary>
- <typeNote>Antiqua typeface, showing influence of Jenson's Venetian
- fonts.</typeNote>
- <typeNote>The black letter face is a variant of Schwabacher.</typeNote>
-</typeDesc>
<supplied> (supplied) signifies text supplied by the transcriber or editor for any reason; for example because the original cannot be read due to physical damage, or because of an obvious omission by the author or scribe. [11.3.3.1. Damage, Illegibility, and Supplied Text]
One of the few elements unconditionally required in any TEI document.
Example
<teiHeader>
- <fileDesc>
- <titleStmt>
- <title>Shakespeare: the first folio (1623) in electronic form</title>
- <author>Shakespeare, William (1564–1616)</author>
- <respStmt>
- <resp>Originally prepared by</resp>
- <name>Trevor Howard-Hill</name>
- </respStmt>
- <respStmt>
- <resp>Revised and edited by</resp>
- <name>Christine Avern-Carr</name>
- </respStmt>
- </titleStmt>
- <publicationStmt>
- <distributor>Oxford Text Archive</distributor>
- <address>
- <addrLine>13 Banbury Road, Oxford OX2 6NN, UK</addrLine>
- </address>
- <idno type="OTA">119</idno>
- <availability>
- <p>Freely available on a non-commercial basis.</p>
- </availability>
- <date when="1968">1968</date>
- </publicationStmt>
- <sourceDesc>
- <bibl>The first folio of Shakespeare, prepared by Charlton Hinman (The Norton Facsimile,
- 1968)</bibl>
- </sourceDesc>
- </fileDesc>
- <encodingDesc>
- <projectDesc>
- <p>Originally prepared for use in the production of a series of old-spelling
- concordances in 1968, this text was extensively checked and revised for use during the
- editing of the new Oxford Shakespeare (Wells and Taylor, 1989).</p>
- </projectDesc>
- <editorialDecl>
- <correction>
- <p>Turned letters are silently corrected.</p>
- </correction>
- <normalization>
- <p>Original spelling and typography is retained, except that long s and ligatured
- forms are not encoded.</p>
- </normalization>
- </editorialDecl>
- <refsDecl xml:id="ASLREF">
- <cRefPattern matchPattern="(\S+) ([^.]+)\.(.*)"
- replacementPattern="#xpath(//div1[@n='$1']/div2[@n='$2']//lb[@n='$3'])">
- <p>A reference is created by assembling the following, in the reverse of the order
- listed here: <list>
- <item>the <att>n</att> value of the preceding <gi>lb</gi>
- </item>
- <item>a period</item>
- <item>the <att>n</att> value of the ancestor <gi>div2</gi>
- </item>
- <item>a space</item>
- <item>the <att>n</att> value of the parent <gi>div1</gi>
- </item>
- </list>
- </p>
- </cRefPattern>
- </refsDecl>
- </encodingDesc>
- <revisionDesc>
- <list>
- <item>
- <date when="1989-04-12">12 Apr 89</date> Last checked by CAC</item>
- <item>
- <date when="1989-03-01">1 Mar 89</date> LB made new file</item>
- </list>
- </revisionDesc>
-</teiHeader>
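The <cRefPattern> in the header above pairs a regular expression with a replacement template. As a minimal sketch of how a canonical reference such as "1 2.34" might be expanded, the following Python code substitutes the captured groups for $1, $2, and $3. The function name expand_cref is hypothetical, the assumption that the expression must match the whole reference is a simplification, and the template is written with the predicate attached directly to div2, as a valid XPath step requires.

```python
import re
from typing import Optional

MATCH_PATTERN = r"(\S+) ([^.]+)\.(.*)"
REPLACEMENT_PATTERN = "#xpath(//div1[@n='$1']/div2[@n='$2']//lb[@n='$3'])"


def expand_cref(cref: str) -> Optional[str]:
    """Expand a canonical reference into a pointer, or return None if it does not match."""
    matched = re.fullmatch(MATCH_PATTERN, cref)
    if matched is None:
        return None
    # Replace each $N placeholder with the corresponding captured group.
    return re.sub(
        r"\$(\d)",
        lambda placeholder: matched.group(int(placeholder.group(1))),
        REPLACEMENT_PATTERN,
    )


print(expand_cref("1 2.34"))
```

Here "1" is read as the n value of the div1, "2" as that of the div2, and "34" as that of the lb, which is the order described in the prose inside the pattern.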
<text> (text) contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure; 15.1. Varieties of Composite Text]
This element should not be used to represent a text which is inserted at an arbitrary point within the structure of another, for example as in an embedded or quoted narrative; the <floatingText> is provided for this purpose.
Example
<text>
- <front>
- <docTitle>
- <titlePart>Autumn Haze</titlePart>
- </docTitle>
- </front>
- <body>
- <l>Is it a dragonfly or a maple leaf</l>
- <l>That settles softly down upon the water?</l>
- </body>
-</text>
Example
The body of a text may be replaced by a group of nested texts, as in the following schematic:
<text>
- <front>
-<!-- front matter for the whole group -->
- </front>
- <group>
- <text>
-<!-- first text -->
- </text>
- <text>
-<!-- second text -->
- </text>
- </group>
-</text>
This element should not be used to document the languages or writing systems used for the bibliographic or manuscript description itself: as for all other TEI elements, such information should be provided by means of the global xml:lang attribute attached to the element containing the description.
In all cases, languages should be identified by means of a standardized ‘language tag’ generated according to BCP 47. Additional documentation for the language may be provided by a <language> element in the TEI header.
Example
<textLang mainLang="en" otherLangs="la"> Predominantly in English with Latin
- glosses</textLang>
The level of a title is sometimes implied by its context: for example, a title appearing directly within an <analytic> element is ipso facto of level ‘a’, and one appearing within a <series> element of level ‘s’. For this reason, the level attribute is not required in contexts where its value can be unambiguously inferred. Where it is supplied in such contexts, its value should not contradict the value implied by its parent element.
The attributes key and ref, inherited from the class att.canonical may be used to indicate the canonical form for the title; the former, by supplying (for example) the identifier of a record in some external library system; the latter by pointing to an XML element somewhere containing the canonical form of the title.
Example
<title>Information Technology and the Research Process: Proceedings of
- a conference held at Cranfield Institute of Technology, UK,
- 18–21 July 1989</title>
Example
<title>Hardy's Tess of the D'Urbervilles: a machine readable
- edition</title>
Example
<title type="full">
- <title type="main">Synthèse</title>
- <title type="sub">an international journal for
- epistemology, methodology and history of
- science</title>
-</title>
<titlePage>
- <docTitle>
- <titlePart type="main">THOMAS OF Reading.</titlePart>
- <titlePart type="alt">OR, The sixe worthy yeomen of the West.</titlePart>
- </docTitle>
- <docEdition>Now the fourth time corrected and enlarged</docEdition>
- <byline>By T.D.</byline>
- <figure>
- <head>TP</head>
- <p>Thou shalt labor till thou returne to duste</p>
- <figDesc>Printers Ornament used by TP</figDesc>
- </figure>
- <docImprint>Printed at <name type="place">London</name> for <name>T.P.</name>
- <date>1612.</date>
- </docImprint>
-</titlePage>
<docTitle>
- <titlePart type="main">THE FORTUNES
- AND MISFORTUNES Of the FAMOUS
- Moll Flanders, &c.
- </titlePart>
- <titlePart type="desc">Who was BORN in NEWGATE,
- And during a Life of continu'd Variety for
- Threescore Years, besides her Childhood, was
- Twelve Year a <hi>Whore</hi>, five times a <hi>Wife</hi> (wherof
- once to her own Brother) Twelve Year a <hi>Thief,</hi>
- Eight Year a Transported <hi>Felon</hi> in <hi>Virginia</hi>,
- at last grew <hi>Rich</hi>, liv'd <hi>Honest</hi>, and died a
- <hi>Penitent</hi>.</titlePart>
-</docTitle>
<titleStmt>
- <title>Capgrave's Life of St. John Norbert: a machine-readable transcription</title>
- <respStmt>
- <resp>compiled by</resp>
- <name>P.J. Lucas</name>
- </respStmt>
-</titleStmt>
<typeNote> (typographic note) describes a particular font or other significant typographic feature distinguished within the description of a printed resource. [10.7.2. Writing, Decoration, and Other Notations]
The same element is used for all cases of uncertainty in the transcription of element content, whether for written or spoken material. For other aspects of certainty, uncertainty, and reliability of tagging and transcription, see chapter 21. Certainty, Precision, and Responsibility.
<unit> contains a symbol, a word or a phrase referring to a unit of measurement in any kind of formal or informal system. [3.6.3. Numbers and Measures]
Here is an example of a <unit> element holding a unitRef attribute that points to a definition of the unit in the TEI header.
<measure>
- <num>3</num>
- <unit unitRef="#ell">ells</unit>
-</measure>
-<!-- In the TEI Header: -->
-<encodingDesc>
- <unitDecl>
- <unitDef xml:id="ell">
- <label>ell</label>
- <placeName ref="#iceland"/>
- <desc>A unit of measure for cloth, roughly equivalent to 18 inches, or from an adult male’s elbow to the tip of the middle finger.</desc>
- </unitDef>
- </unitDecl>
-</encodingDesc>
<unitDecl> (unit declarations) provides information about units of measurement that are not members of the International System of Units. [2.3.9. The Unit Declaration]
<unitDecl>
- <unitDef xml:id="pechys" type="length">
- <label>πῆχυς</label>
- <placeName ref="#athens"/>
- <conversion fromUnit="#daktylos"
- toUnit="#pechys" formula="$fromUnit div 24"/>
- <desc>Equivalent to a cubit or 24 daktyloi.</desc>
- </unitDef>
- <unitDef xml:id="daktylos" type="length">
- <label>δάκτυλος</label>
- <placeName ref="#athens"/>
- <desc>A basic unit of length equivalent to one finger (or the size of a thumb) in ancient Greece.</desc>
- </unitDef>
-</unitDecl>
-<sch:rule context="tei:variantEncoding">
-<sch:report test="@location eq 'external' and @method eq 'parallel-segmentation'"> The @location value "external" is inconsistent with the
- parallel-segmentation method of apparatus markup.</sch:report>
-</sch:rule>
Legal values are:
internal
apparatus appears within the running text.
external
apparatus appears outside the base text.
Note
The value ‘external’ is inconsistent with the parallel-segmentation method of apparatus markup.
model.emphLike groups phrase-level elements which are typographically distinct and to which a specific function can be attributed. [3.3. Highlighting and Quotation]
model.frontPart.drama groups elements which appear at the level of divisions within front or back matter of performance texts only. [7.1. Front and Back Matter]
Elements in this class are typically used to hold groups of links or of abstract interpretations, or to provide indications of certainty etc. It may be convenient to localize all metadata elements, for example to contain them within the same division as the elements that they relate to, or to locate them all in a division of their own. They may however appear at any point in a TEI text.
model.hiLike groups phrase-level elements which are typographically distinct but to which no specific function can be attributed. [3.3. Highlighting and Quotation]
model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources. [1.3. The TEI Class System]
model.measureLike groups elements which denote a number, a quantity, a measurement, or similar piece of text that conveys some numerical meaning. [3.6.3. Numbers and Measures]
model.pPart.editorial groups phrase-level elements for simple editorial interventions that may be useful both in transcribing and in authoring. [3.5. Simple Editorial Changes]
model.pPart.transcriptional groups phrase-level elements used for editorial transcription of pre-existing source materials. [3.5. Simple Editorial Changes]
The ‘agency’ child elements are optional in general, but one must be present whenever a ‘detail’ child element is used: a ‘detail’ child element is not valid without a preceding ‘agency’ child element.
The principles on which segmentation is carried out, and any special codes or attribute values used, should be defined explicitly in the <segmentation> element of the <encodingDesc> within the associated TEI header.
Appendix A.2.59 model.stageLike
model.stageLike groups elements containing stage directions or similar things defined by the module for performance texts. [7.3. Other Types of Performance Text]
att.anchoring (anchoring) provides attributes for use on annotations, e.g. notes and groups of notes describing the existence and position of an anchor for annotations.
In modern texts, notes are usually anchored by means of explicit footnote or endnote symbols. An explicit indication of the phrase or line annotated may however be used instead (e.g. ‘page 218, lines 3–4’). The anchored attribute indicates whether any explicit location is given, whether by symbol or by prose cross-reference. The value true indicates that such an explicit location is indicated in the copy text; the value false indicates that the copy text does not indicate a specific place of attachment for the note. If the specific symbols used in the copy text at the location the note is anchored are to be recorded, use the n attribute.
targetEnd
(target end) points to the end of the span to which the note is attached, if the note is not embedded in the text at that point.
This attribute is retained for backwards compatibility; it may be removed at a subsequent release of the Guidelines. The recommended way of pointing to a span of elements is by means of the range function of XPointer, as further described in 16.2.4.6. range().
Example
<p>(...) tamen reuerendos dominos archiepiscopum et canonicos Leopolienses
- necnon episcopum in duplicibus Quatuortemporibus<anchor xml:id="A55234"/> totaliter expediui...</p>
-<!-- elsewhere in the document -->
-<noteGrp targetEnd="#A55234">
- <note xml:lang="en"> Quatuor Tempora, so called dry fast days.
- </note>
- <note xml:lang="pl"> Quatuor Tempora, tzw. Suche dni postne.
- </note>
-</noteGrp>
In the following example from Hamlet, speeches (<sp>) in the body of the play are linked to <castItem> elements in the <castList> using the who attribute.
For transcribed speech, this will typically identify a participant or participant group; in other contexts, it will point to any identified <person> element.
Appendix A.3.3 att.canonical
att.canonical provides attributes that can be used to associate a representation such as a name or title with canonical information about the object being named or referenced. [13.1.1. Linking Names and Their Referents]
<author>
- <name key="Hugo, Victor (1802-1885)"
- ref="http://www.idref.fr/026927608">Victor Hugo</name>
-</author>
Note
The value may be a unique identifier from a database, or any other externally-defined string identifying the referent.
No particular syntax is proposed for the values of the key attribute, since its form will depend entirely on practice within a given project. For the same reason, this attribute is not recommended in data interchange, since there is no way of ensuring that the values used by one project are distinct from those used by another. In such a situation, a preferable approach for magic tokens which follows standard practice on the Web is to use a ref attribute whose value is a tag URI as defined in RFC 4151.
ref
(reference) provides an explicit means of locating a full definition or identity for the entity being named by means of one or more URIs.
The value must point directly to one or more XML elements or other resources by means of one or more URIs, separated by whitespace. If more than one is supplied the implication is that the name identifies several distinct entities.
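A processor reading ref has to treat the value as a whitespace-separated list of URIs rather than as a single string. The following is a minimal Python sketch of that step, applied to the <author> example above. The helper names split_pointers and name_refs are hypothetical and not part of the TEI or of the project's tooling.

```python
import xml.etree.ElementTree as ET

# The <author> example from above, kept on one line for brevity.
SAMPLE = (
    '<author><name key="Hugo, Victor (1802-1885)" '
    'ref="http://www.idref.fr/026927608">Victor Hugo</name></author>'
)


def split_pointers(value):
    """Split a whitespace-separated pointer list into individual URIs."""
    return value.split()


def name_refs(xml_text):
    """Map the text of each <name> element to the list of URIs in its ref attribute."""
    root = ET.fromstring(xml_text)
    return {
        name.text: split_pointers(name.get("ref", ""))
        for name in root.iter("name")
    }


print(name_refs(SAMPLE))
```

If a ref value lists more than one URI, the returned list has several entries, which matches the implication stated above that the name then identifies several distinct entities.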
Appendix A.3.4 att.citeStructurePart
att.citeStructurePart provides attributes for selecting particular elements within a document.
(use) supplies an XPath selection pattern using the syntax defined in the XSLT 3.0 specification. The XPath pattern is relative to the context given in match, which will be either a sibling attribute in the case of <citeStructure> or an attribute of the parent <citeStructure> in the case of <citeData>.
-<sch:rule context="tei:*[@calendar]">
-<sch:assert test="string-length( normalize-space(.) ) gt 0"> @calendar indicates one or more
- systems or calendars to which the date represented by the content of this element belongs,
- but this <sch:name/> element has no textual content.</sch:assert>
-</sch:rule>
He was born on <date calendar="#gregorian">Feb. 22, 1732</date> (<date calendar="#julian"
- when="1732-02-22">Feb. 11, 1731/32,
- O.S.</date>).
-
He was born on <date calendar="#gregorian #julian"
- when="1732-02-22">Feb. 22, 1732
- (Feb. 11, 1731/32, O.S.)</date>.
-
Note
Note that the calendar attribute (unlike datingMethod defined in att.datable.custom) defines the calendar system of the date in the original material defined by the parent element, not the calendar to which the date is normalized.
period
supplies pointers to one or more definitions of named periods of time (typically <category>s or <calendar>s) within which the datable item is understood to have occurred.
This ‘superclass’ provides attributes that can be used to provide normalized values of temporal information. By default, the attributes from the att.datable.w3c class are provided. If the module for names & dates is loaded, this class also provides attributes from the att.datable.iso and att.datable.custom classes. In general, the possible values of attributes restricted to the W3C datatypes form a subset of those values available via the ISO 8601 standard. However, the greater expressiveness of the ISO datatypes may not be needed, and there exists much greater software support for the W3C datatypes.
Appendix A.3.6 att.datable.w3c
att.datable.w3c provides attributes for normalization of elements that contain datable events conforming to the W3C XML Schema Part 2: Datatypes Second Edition. [3.6.4. Dates and Times; 13.4. Dates]
Examples of W3C date, time, and date & time formats.
<p>
- <date when="1945-10-24">24 Oct 45</date>
- <date when="1996-09-24T07:25:00Z">September 24th, 1996 at 3:25 in the morning</date>
- <time when="1999-01-04T20:42:00-05:00">Jan 4 1999 at 8 pm</time>
- <time when="14:12:38">fourteen twelve and 38 seconds</time>
- <date when="1962-10">October of 1962</date>
- <date when="--06-12">June 12th</date>
- <date when="---01">the first of the month</date>
- <date when="--08">August</date>
- <date when="2006">MMVI</date>
- <date when="0056">AD 56</date>
- <date when="-0056">56 BC</date>
-</p>
This list begins in
- the year 1632, more precisely on Trinity Sunday, i.e. the Sunday after
- Pentecost, in that year the
-<date calendar="#julian"
- when="1632-06-06">27th of May (old style)</date>.
-<sch:rule context="tei:*[@when]">
-<sch:report test="@notBefore|@notAfter|@from|@to"
- role="nonfatal">The @when attribute cannot be used with any other att.datable.w3c attributes.</sch:report>
-</sch:rule>
Schematron
-<sch:rule context="tei:*[@from]">
-<sch:report test="@notBefore"
- role="nonfatal">The @from and @notBefore attributes cannot be used together.</sch:report>
-</sch:rule>
Schematron
-<sch:rule context="tei:*[@to]">
-<sch:report test="@notAfter"
- role="nonfatal">The @to and @notAfter attributes cannot be used together.</sch:report>
-</sch:rule>
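The three Schematron rules above can also be checked outside a Schematron processor. The following Python sketch reports the same attribute combinations. It is only an approximation under stated assumptions: the function name datable_conflicts is hypothetical, the messages are paraphrased, and the input is assumed to carry no namespace prefixes on attributes.

```python
import xml.etree.ElementTree as ET


def datable_conflicts(xml_text):
    """Return (tag, message) pairs for the attribute combinations the rules report."""
    problems = []
    for element in ET.fromstring(xml_text).iter():
        attrs = element.attrib
        # Rule 1: @when excludes the other att.datable.w3c attributes.
        if "when" in attrs and any(
            other in attrs for other in ("notBefore", "notAfter", "from", "to")
        ):
            problems.append((element.tag, "@when used with another att.datable.w3c attribute"))
        # Rule 2: @from and @notBefore cannot be used together.
        if "from" in attrs and "notBefore" in attrs:
            problems.append((element.tag, "@from used with @notBefore"))
        # Rule 3: @to and @notAfter cannot be used together.
        if "to" in attrs and "notAfter" in attrs:
            problems.append((element.tag, "@to used with @notAfter"))
    return problems


sample = (
    '<p><date when="1863-05-28" to="1863-06-01">x</date>'
    '<date from="1863-05-28" to="1863-06-01">y</date></p>'
)
print(datable_conflicts(sample))
```

Only the first <date> in the sample is reported, because it combines when with to; the second uses from and to together, which the rules allow.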
Example
<date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date>
Note
The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by XML Schema Part 2: Datatypes Second Edition, using the Gregorian calendar.
The most commonly-encountered format for the date portion of a temporal attribute is yyyy-mm-dd, but yyyy, --mm, ---dd, yyyy-mm, or --mm-dd may also be used. For the time part, the form hh:mm:ss is used.
Note that this format does not currently permit use of the value 0000 to represent the year 1 BCE; instead the value -0001 should be used.
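The date forms listed above can be approximated with a regular expression. The Python sketch below is a rough shape check only, under stated assumptions: it covers the date portion without times or time zones, it does not verify that months and days are in range, and the names DATE_FORMS and looks_like_w3c_date are hypothetical.

```python
import re

# Shapes named in the note: yyyy, yyyy-mm, yyyy-mm-dd, --mm, --mm-dd, ---dd.
DATE_FORMS = re.compile(
    r"""^(?:
        -?\d{4}(?:-\d{2}(?:-\d{2})?)?   # yyyy, yyyy-mm, yyyy-mm-dd (optional leading minus)
      | --\d{2}(?:-\d{2})?               # --mm, --mm-dd
      | ---\d{2}                         # ---dd
    )$""",
    re.VERBOSE,
)


def looks_like_w3c_date(value):
    """Return True if value has one of the listed date shapes and does not use year 0000."""
    if DATE_FORMS.match(value) is None:
        return False
    if value.startswith("--"):
        return True  # forms without a year component
    # The year 0000 is not permitted; 1 BCE is written -0001.
    return value.lstrip("-")[:4] != "0000"


for candidate in ("1945-10-24", "--06-12", "---01", "-0056", "0000", "24 Oct 45"):
    print(candidate, looks_like_w3c_date(candidate))
```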
Appendix A.3.7 att.datcat
att.datcat provides attributes that are used to align XML elements or attributes with the appropriate Data Categories (DCs) defined by an external taxonomy, in this way establishing the identity of information containers and values, and providing means of interpreting them. [9.5.2. Lexical View; 18.3. Other Atomic Feature Values]
provides a pointer to a definition of, and/or general information about, (a) an information container (element or attribute) or (b) a value of an information container (element content or attribute value), by referencing an external taxonomy or ontology. If valueDatcat is present in the immediate context, this attribute takes on role (a), while valueDatcat performs role (b).
provides a definition of, and/or general information about a value of an information container (element content or attribute value), by reference to an external taxonomy or ontology. Used especially where a contrast with datcat is needed.
provides a definition of, and/or general information about, information structure of an object referenced or modeled by the containing element, by reference to an external taxonomy or ontology. This attribute has the characteristics of the datcat attribute, except that it addresses not its containing element, but an object that is being referenced or modeled by its containing element.
The example below presents the TEI encoding of the name-value pair <part of speech, common noun>, where the name (key) ‘part of speech’ is abbreviated as ‘POS’, and the value, ‘common noun’, is symbolized by ‘NN’. The entire name-value pair is encoded by means of the element <f>. In TEI XML, that element acts as the container, labeled with the name attribute. Its contents may be complex or simple. In the case at hand, the content is the symbol ‘NN’.
The datcat attribute relates the feature name (i.e., the key) to the data category ‘part of speech’, while the attribute valueDatcat relates the feature value to the data category common noun. Both these data categories should be defined in an external and preferably open reference taxonomy or ontology.
‘NN’ is the symbol for common noun used e.g. in the CLAWS-7 tagset defined by the University Centre for Computer Corpus Research on Language at the University of Lancaster. The very same data category used for tagging an early version of the British National Corpus, and coming from the BNC Basic (C5) tagset, uses the symbol ‘NN0’ (rather than ‘NN’). Making these values semantically interoperable would be extremely difficult without a human expert if they were not anchored in a single point of an established reference taxonomy of morphosyntactic data categories. In the case at hand, the string ‘http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545’ is both a persistent identifier of the data category in question and a pointer to a shared definition of common noun.
While the symbols ‘NN’, ‘NN0’, and many others (often coming from languages other than English) are implicitly members of the container category ‘part of speech’, it is sometimes useful not to rely on such an implicit relationship but rather to use an explicit identifier for that data category, to distinguish it from other morphosyntactic data categories, such as gender, tense, etc. For that purpose, the above example uses the datcat attribute to reference a definition of part of speech. The reference taxonomy in this example is the CLARIN Concept Registry.
If the feature structure markup exemplified above is to be repeated many times in a single document, it is much more efficient to gather the persistent identifiers in a single place and to only reference them, implicitly or directly, from feature structure markup. The following example is much more concise than the one above and relies on the concepts of feature structure declaration and feature value library, discussed in chapter 18. Feature Structures.
The assumption here is that the relevant feature values are collected in a place that the annotation document in question has access to: preferably a single document per linguistic resource, for example an <fsdDecl> that is XIncluded as a sibling of <text> or a child of <encodingDesc>; a <taxonomy> available resource-wide (e.g., in a shared header) is also an option.
The example below presents an <fvLib> element that collects the relevant feature values (most of them omitted). At the same time, this example shows one way of encoding a tagset, i.e., an established inventory of values of (in the case at hand) morphosyntactic categories.
Note that these Guidelines do not prescribe a specific choice between datcat and valueDatcat in such cases. The former is the generic way of referencing a data category, whereas the latter is more specific, in that it references a data category that represents a value. The choice between them comes into play where a single element (or a tight element complex, such as the <f>/<symbol> complex illustrated above) makes it necessary or useful to distinguish between the container data category and its value.
Example
In the context of dictionaries designed with semantic interoperability in mind, the following example ensures that the <pos> element is interpreted as the same information container as in the case of the example of <f name="POS"> above.
For this type of interoperable markup to be efficient, the references to the particular data categories are best provided in a single place within the dictionary (or a single place within the project), rather than being repeated inside every entry. For the container elements, this can be achieved at the level of <tagUsage>, although here, the targetDatcat attribute should be used, because it is not the <tagUsage> element that is associated with the relevant data category, but rather the element <pos> (or <case>, etc.) that is described by <tagUsage>:
<tagsDecl partial="true">
-<!-- ... -->
- <namespace name="http://www.tei-c.org/ns/1.0">
- <tagUsage gi="pos"
- targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">Contains the part of speech.</tagUsage>
- <tagUsage gi="case"
- targetDatcat="http://hdl.handle.net/11459/CCR_C-1840_9f4e319c-f233-6c90-9117-7270e215f039">Contains information about the grammatical case that the described form is inflected for.</tagUsage>
-<!-- ... -->
- </namespace>
-</tagsDecl>
Another possibility is to shorten the URIs by means of the <prefixDef> mechanism, as illustrated below:
This mechanism creates implications that are not always wanted. In the case at hand, among others, it suggests that the identifiers ‘pos’ and ‘adj’ belong to a namespace associated with the CLARIN Concept Registry (CCR), whereas it is solely a shorthand mechanism whose scope is the current resource. Documenting this clearly in the header of the dictionary is therefore advised.
Yet another possibility is to associate the information about the relationship between a TEI markup element and the data category that it is intended to model already at the level of modeling the dictionary resource, that is, at the level of the ODD, in an <equiv> element that is a child of <elementSpec> or <attDef>.
Example
The targetDatcat attribute is designed to be used in, e.g., feature structure declarations, and is analogous to the targetLang attribute of the att.pointing class, in that it describes the object that is being referenced, rather than the referencing object.
Above, the <fDecl> uses targetDatcat, because if it were to use datcat, it would be asserting that it is an instance of the container data category part of speech, whereas it is not — it models a container (<f>) that encodes a part of speech. Note also that it is the <f> that is modeled above, not its values, which are used as direct references to data categories; hence the use of datcat in the <symbol> element.
Note
The TEI Abstract Model can be expressed as a hierarchy of attribute-value matrices (AVMs) of various types and of various levels of complexity, nested or grouped in various ways. At the most abstract level, an AVM consists of an information container and the value (contents) of that container.
A simple example of an XML serialization of such structures is, on the one hand, the opening and closing tags that delimit and name the container, and, on the other, the content enclosed by the two tags that constitutes the value. An analogous example is an attribute name and the value of that attribute.
In a TEI XML example of two equivalent serializations expressing the name-value pair <part-of-speech,common-noun>, namely <pos>commonNoun</pos> and pos="common-noun", one would classify the element <pos> and the attribute pos as containers (mapping onto the first member of the relevant name-value pair), while the character data content of <pos> or the value of pos would be seen as mapping onto the second member of the pair.
The att.datcat class provides means of addressing the containers and their values, while at the same time providing a way to interpret them in the context of external taxonomies or ontologies. Aligning e.g. both the <pos> element and the pos attribute with the same value of an external reference point (i.e., an entry in an agreed taxonomy) affirms the identity of the concept serialised by both the element container and the attribute container, and optionally provides a definition of that concept (in the case at hand, the concept part of speech).
The value of the att.datcat attributes should be a PID (persistent identifier) that points to a specific — and, ideally, shared — taxonomy or ontology. Among the resources that can, to a lesser or greater extent, be used as inventories of (more or less) standardized linguistic categories are the GOLD ontology, CLARIN CCR, OLiA, or TermWeb's DatCatInfo, and also the Universal Dependencies inventory, on the assumption that its URIs are going to persist. It is imaginable that a project may choose to address a local taxonomy store instead, but this risks losing the advantage of interchangeability with other projects.
Historically, datcat and valueDatcat originate from the (now obsolete) ISO 12620:2009 standard, describing the data model and procedures for a Data Category Registry (DCR). The current version of that standard, ISO 12620-1, does not standardize the serialization of pointers, merely mentioning the TEI att.datcat as an example.
Note that no constraint prevents the occurrence of a combination of att.datcat attributes: the <fDecl> element, which is a natural bearer of the targetDatcat attribute, is an instance of a specific modeling element, and, in principle, could be semantically fixed by an appropriate reference taxonomy of modeling devices.
Appendix A.3.8 att.declarable
att.declarable provides attributes for those elements in the TEI header which may be independently selected by means of the special purpose decls attribute. [15.3. Associating Contextual Information with a Text]
true
This element is selected if its parent is selected
false
This element can only be selected explicitly, unless it is the only one of its kind, in which case it is selected if its parent is selected. [Default]
Note
The rules governing the association of declarable elements with individual parts of a TEI text are fully defined in chapter 15.3. Associating Contextual Information with a Text. Only one element of a particular type may have a default attribute with a value of true.
Appendix A.3.9 att.declaring
att.declaring provides attributes for elements which may be independently associated with a particular declarable element within the header, thus overriding the inherited default for that element. [15.3. Associating Contextual Information with a Text]
(declarations) identifies one or more declarable elements within the header, which are understood to apply to the element bearing this attribute and its content.
The members of this attribute class are typically used to represent any kind of editorial intervention in a text, for example a correction or interpretation, or to date or localize manuscripts etc.
Each pointer on the source (if present) corresponding to a witness or witness group should reference a bibliographic citation such as a <witness>, <msDesc>, or <bibl> element, or another external bibliographic citation, documenting the source concerned.
<encodingDesc>
- <unitDecl>
- <unitDef xml:id="stadium" type="linear">
- <label>stadium</label>
- <placeName ref="#rome"/>
- <conversion fromUnit="#pes"
- toUnit="#stadium" formula="$fromUnit * 625"/>
- <desc>The stadium was a Roman unit of linear measurement equivalent to 625 pedes, or Roman feet.</desc>
- </unitDef>
- </unitDecl>
-</encodingDesc>
Example
<encodingDesc>
- <unitDecl>
- <unitDef xml:id="wmw" type="power">
- <label>whatmeworry</label>
- <conversion fromUnit="#hpk"
- toUnit="#wmw" formula="$fromUnit * 1"/>
- <desc>In the Potrzebie system of measures as introduced by Donald Knuth, the whatmeworry unit of power is equivalent to one hah per kovac.</desc>
- </unitDef>
- <unitDef xml:id="kwmw" type="power">
- <label>kilowhatmeworry</label>
- <conversion fromUnit="#wmw"
- toUnit="#kwmw" formula="$fromUnit div 1000"/>
- <desc>The kilowhatmeworry is equivalent to 1000 whatmeworries.</desc>
- </unitDef>
- <unitDef xml:id="ap" type="power">
- <label>aeolipower</label>
- <conversion fromUnit="#kwmw"
- toUnit="#ap" formula="$fromUnit div 100"/>
- <desc>One unit of aeolipower (A.P.) is equivalent to 100 kilowhatmeworries.</desc>
- </unitDef>
- </unitDecl>
-</encodingDesc>
<conversion fromUnit="#deciday"
- toUnit="hour"
- formula="$fromUnit cast as xs:decimal * 144 div 60"/>
Note
This attribute class provides formula for use in defining a value used in mathematical calculation. It can be used to store a mathematical operation needed to convert from one system of measurement to another. We use the teidata.xpath datatype to express this value in order to communicate mathematical operations on an XML node or nodes. The $fromUnit variable notation simplifies referencing of the fromUnit attribute on the parent <conversion> element. Note that ‘div’ is required to express the division operator in XPath.
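To illustrate how such a formula might be applied, the Python sketch below evaluates only the simplest shapes used in the examples above, namely "$fromUnit * n" and "$fromUnit div n". It is not an XPath evaluator: anything more complex, such as the "cast as" expression, is rejected. The names SIMPLE_FORMULA and apply_formula are hypothetical.

```python
import re
from fractions import Fraction

# Accepts only "$fromUnit * <number>" or "$fromUnit div <number>".
SIMPLE_FORMULA = re.compile(r"^\$fromUnit\s+(\*|div)\s+(\d+(?:\.\d+)?)$")


def apply_formula(formula, quantity):
    """Apply a simple conversion formula to a quantity given in the source unit."""
    match = SIMPLE_FORMULA.match(formula.strip())
    if match is None:
        raise ValueError(f"unsupported formula: {formula!r}")
    operator, operand = match.group(1), Fraction(match.group(2))
    amount = Fraction(quantity)
    # Exact rational arithmetic avoids rounding noise in unit conversions.
    return amount * operand if operator == "*" else amount / operand


# 48 daktyloi expressed in pecheis, using the formula from the <unitDecl> example.
print(apply_formula("$fromUnit div 24", 48))
```

Using Fraction rather than float is a design choice of this sketch, so that a conversion such as 2500 whatmeworries to kilowhatmeworries yields exactly 5/2.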
Appendix A.3.14 att.fragmentable
att.fragmentable provides attributes for representing fragmentation of a structural element, typically as a consequence of some overlapping hierarchy.
specifies whether or not its parent element is fragmented in some way, typically by some other overlapping structure: for example a speech which is divided between two or more verse stanzas, a paragraph which is split across a page division, a verse line which is divided between two speakers.
The value of this attribute is always understood to be a single token, even if it contains space or other punctuation characters, and need not be composed of numbers only. It is typically used to specify the numbering of chapters, sections, list items, etc.; it may also be used in the specification of a standard reference system for the text.
xml:lang
(language) indicates the language of the element content using a ‘tag’ generated according to BCP 47.
<p> … The consequences of
- this rapid depopulation were the loss of the last
-<foreign xml:lang="rap">ariki</foreign> or chief
- (Routledge 1920:205,210) and their connections to
- ancestral territorial organization.</p>
Note
The xml:lang value will be inherited from the immediately enclosing element, or from its parent, and so on up the document hierarchy. It is generally good practice to specify xml:lang at the highest appropriate level, noticing that a different default may be needed for the <teiHeader> from that needed for the associated resource element or elements, and that a single TEI document may contain texts in many languages.
Only attributes with free text values (rare in these guidelines) will be in the scope of xml:lang.
The value used must conform with BCP 47. If the value is a private use code (i.e., starts with x- or contains -x-), a <language> element with a matching value for its ident attribute should be supplied in the TEI header to document this value. Such documentation may also optionally be supplied for non-private-use codes, though these must remain consistent with their IETF (Internet Engineering Task Force) definitions.
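The inheritance of xml:lang down the document hierarchy can be sketched in a few lines of Python. The function name effective_languages and the sample fragment are assumptions made for illustration; the sketch relies on the standard library's ElementTree, which exposes xml:lang under the XML namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(xml_text):
    """List (tag, language) pairs in document order, applying xml:lang inheritance."""
    result = []

    def walk(element, inherited):
        # An element's own xml:lang overrides the value inherited from its ancestors.
        lang = element.get(XML_LANG, inherited)
        result.append((element.tag, lang))
        for child in element:
            walk(child, lang)

    walk(ET.fromstring(xml_text), None)
    return result


sample = (
    '<text xml:lang="en"><p>the loss of the last '
    '<foreign xml:lang="rap">ariki</foreign> or chief</p></text>'
)
print(effective_languages(sample))
```

In the sample, <p> carries no xml:lang of its own and therefore inherits "en" from <text>, while <foreign> overrides it with "rap".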
xml:base
provides a base URI reference with which applications can resolve relative URI references into absolute URI references.
When multiple values are given, they may reflect either multiple divergent interpretations of an ambiguous text, or multiple mutually consistent interpretations of the same passage in different contexts.
Appendix A.3.17 att.global.change
att.global.change provides attributes allowing its member elements to specify one or more states or revision campaigns with which they are associated.
points to one or more <change> elements documenting a state or revision campaign to which the element bearing this attribute and its children have been assigned by the encoder.
att.global.facs provides attributes used to express correspondence between an element and all or part of a facsimile image or surface. [11.1. Digital Facsimiles]
<group>
- <text xml:id="t1-g1-t1"
- xml:lang="mi">
- <body xml:id="t1-g1-t1-body1">
- <div type="chapter">
- <head>He Whakamaramatanga mo te Ture Hoko, Riihi hoki, i nga Whenua Maori, 1876.</head>
- <p>…</p>
- </div>
- </body>
- </text>
- <text xml:id="t1-g1-t2"
- xml:lang="en">
- <body xml:id="t1-g1-t2-body1"
- corresp="#t1-g1-t1-body1">
- <div type="chapter">
- <head>An Act to regulate the Sale, Letting, and Disposal of Native Lands, 1876.</head>
- <p>…</p>
- </div>
- </body>
- </text>
-</group>
In this example a <group> contains two <text>s, each containing the same document in a different language. The correspondence is indicated using corresp. The language is indicated using xml:lang, whose value is inherited; both the tag with the corresp and the tag pointed to by the corresp inherit the value from their immediate parent.
<!-- In a placeography called "places.xml" -->
<place xml:id="LOND1" corresp="people.xml#LOND2 people.xml#GENI1">
  <placeName>London</placeName>
  <desc>The city of London...</desc>
</place>
<!-- In a literary personography called "people.xml" -->
<person xml:id="LOND2" corresp="places.xml#LOND1 #GENI1">
  <persName type="lit">London</persName>
  <note>
    <p>Allegorical character representing the city of <placeName ref="places.xml#LOND1">London</placeName>.</p>
  </note>
</person>
<person xml:id="GENI1" corresp="places.xml#LOND1 #LOND2">
  <persName type="lit">London’s Genius</persName>
  <note>
    <p>Personification of London’s genius. Appears as an
      allegorical character in mayoral shows.
    </p>
  </note>
</person>
In this example, a <place> element containing information about the city of London is linked with two <person> elements in a literary personography. This correspondence represents a slightly looser relationship than the one in the preceding example; there is no sense in which an allegorical character could be substituted for the physical city, or vice versa, but there is obviously a correspondence between them.
synch
(synchronous) points to elements that are synchronous with the current element.
selects one or more alternants; if one alternant is selected, the ambiguity or uncertainty is marked as resolved. If more than one alternant is selected, the degree of ambiguity or uncertainty is marked as reduced by the number of alternants not selected.
(rendition) indicates how the element in question was rendered or presented in the source text.
Status
Optional
Datatype
1–∞ occurrences of teidata.word separated by whitespace
<head rend="align(center) case(allcaps)">
  <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle,
  <lb/>On Her <lb/>
  <hi rend="case(mixed)">New Blazing-World</hi>.
</head>
Note
These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project. Some potentially useful conventions are noted from time to time at appropriate points in the Guidelines. The values of the rend attribute are a set of sequence-indeterminate individual tokens separated by whitespace.
style
contains an expression in some formal style definition language which defines the rendering or presentation used for this element in the source text
<head style="text-align: center; font-variant: small-caps">
  <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her
  <lb/>
  <hi style="font-variant: normal">New Blazing-World</hi>.
</head>
Note
Unlike the values of the rend attribute, which use whitespace as a separator, the value of the style attribute may contain whitespace. This attribute is intended for recording inline stylistic information concerning the source, not any particular output.
The formal language in which values for this attribute are expressed may be specified using the <styleDefDecl> element in the TEI header.
If style and rendition are both present on an element, then style overrides or complements rendition. style should not be used in conjunction with rend, because the latter does not employ a formal style definition language.
rendition
points to a description of the rendering or presentation used for this element in the source text.
The rendition attribute is used in a very similar way to the class attribute defined for XHTML but with the important distinction that its function is to describe the appearance of the source text, not necessarily to determine how that text should be presented on screen or paper.
If rendition is used to refer to a style definition in a formal language like CSS, it is recommended that it not be used in conjunction with rend. Where both rendition and rend are supplied, the latter is understood to override or complement the former.
Each URI provided should indicate a <rendition> element defining the intended rendition in terms of some appropriate style language, as indicated by the scheme attribute.
To reduce the ambiguity of a resp pointing directly to a person or organization, we recommend that resp be used to point not to an agent (<person> or <org>) but to a <respStmt>, <author>, <editor> or similar element which clarifies the exact role played by the agent. Pointing to multiple <respStmt>s allows the encoder to specify clearly each of the roles played in part of a TEI file (creating, transcribing, encoding, editing, proofing etc.).
Example
Blessed are the
<choice>
  <sic>cheesemakers</sic>
  <corr resp="#editor" cert="high">peacemakers</corr>
</choice>: for they shall be called the children of God.
<sch:rule context="tei:*[@source]">
  <sch:let name="srcs"
    value="tokenize( normalize-space(@source),' ')"/>
  <sch:report test="( self::tei:classRef | self::tei:dataRef | self::tei:elementRef |
    self::tei:macroRef | self::tei:moduleRef | self::tei:schemaSpec )
    and $srcs[2]"> When used on a schema description element (like
    <sch:value-of select="name(.)"/>), the @source attribute
    should have only 1 value. (This one has <sch:value-of select="count($srcs)"/>.)
  </sch:report>
</sch:rule>
Note
The source attribute points to an external source. When used on an element describing a schema component (<classRef>, <dataRef>, <elementRef>, <macroRef>, <moduleRef>, or <schemaSpec>), it identifies the source from which declarations for the components should be obtained.
On other elements it provides a pointer to the bibliographical source from which a quotation or citation is drawn.
In either case, the location may be provided using any form of URI, for example an absolute URI, a relative URI, a private scheme URI of the form tei:x.y.z, where x.y.z indicates the version number, e.g. tei:4.3.2 for TEI P5 release 4.3.2 or (as a special case) tei:current for whatever is the latest release, or a private scheme URI that is expanded to an absolute URI as documented in a <prefixDef>.
When used on elements describing schema components, source should have only one value; when used on other elements multiple values are permitted.
Example
<p>
  <!-- ... --> As Willard McCarty (<bibl xml:id="mcc_2012">2012, p.2</bibl>) tells us, <quote source="#mcc_2012">‘Collaboration’ is a problematic and should be a contested
    term.</quote>
  <!-- ... -->
</p>
Example
<p>
  <!-- ... -->
  <quote source="#chicago_15_ed">Grammatical theories are in flux, and the more we learn, the
    less we seem to know.</quote>
  <!-- ... -->
</p>
<!-- ... -->
<bibl xml:id="chicago_15_ed">
  <title level="m">The Chicago Manual of Style</title>,
  <edition>15th edition</edition>. <pubPlace>Chicago</pubPlace>: <publisher>University of
    Chicago Press</publisher> (<date>2003</date>), <biblScope unit="page">p.147</biblScope>.
</bibl>
Example
<elementRef key="p" source="tei:2.0.1"/>
Include in the schema an element named <p> available from the TEI P5 2.0.1 release.
Example
<schemaSpec ident="myODD" source="mycompiledODD.xml">
  <!-- further declarations specifying the components required -->
</schemaSpec>
Create a schema using components taken from the file mycompiledODD.xml.
Appendix A.3.23 att.handFeatures
att.handFeatures provides attributes describing aspects of the hand in which a manuscript is written. [11.3.2.1. Document Hands]
characterizes the particular script or writing style used by this hand, for example secretary, copperplate, Chancery, Italian, etc.
Status
Optional
Datatype
1–∞ occurrences of teidata.name separated by whitespace
scriptRef
points to a full description of the script or writing style used by this hand, typically supplied by a <scriptNote> element elsewhere in the description.
This attribute class provides an attribute for describing a computer resource, typically available over the internet, using a value taken from a standard taxonomy. At present only a single taxonomy is supported, the Multipurpose Internet Mail Extensions (MIME) Media Type system. This typology of media types is defined by the Internet Engineering Task Force in RFC 2046. The list of types is maintained by the Internet Assigned Numbers Authority (IANA). The mimeType attribute must have a value taken from this list.
Appendix A.3.25 att.lexicographic.normalized
att.lexicographic.normalized provides attributes for usage within word-level elements in the analysis module and within lexicographic microstructure in the dictionaries module.
Example from a language documentation project on the Mixtepec-Mixtec language (ISO 639-3: 'mix'). In this use case a speaker has spelled something incorrectly, but the project wishes to preserve the original spelling. The orig attribute is essential here: it lets speakers see past mistakes, gives researchers insight into how untrained speakers instinctively write their language (in contrast to prescribed convention), and so on:
<w orig="ntsa sia'i">ntsasia'i</w>
Example from the EarlyPrint project. Fragment of text where obvious errors have been corrected but the original forms remain recorded:
An example from the EarlyPrint project showing the use of both norm and orig. The orig attribute preserves the original version (sometimes with spelling errors, often with printer abbreviations), the element content resolves printer abbreviations but retains the original orthography, and the norm attribute holds normalized values:
It needs to be stressed that the two attributes in this class are meant for strictly lexicographic and linguistic uses, and not for editorial interventions. For the latter, the mechanism based on <choice>, <orig>, and <reg> needs to be employed.
Appendix A.3.26 att.linguistic
att.linguistic provides a set of attributes concerning linguistic features of tokens, for usage within token-level elements, specifically <w> and <pc> in the analysis module. [17.4.2. Lightweight Linguistic Annotation]
provides a lemma (base form) for the word, typically uninflected and serving both as an identifier (e.g. in dictionary contexts, as a headword), and as a basis for potential inflections.
(part of speech) indicates the part of speech assigned to a token (i.e. information on whether it is a noun, adjective, or verb), usually according to some official reference vocabulary (e.g. for German: STTS, for English: CLAWS, for Polish: NKJP, etc.).
(morphosyntactic description) supplies morphosyntactic information for a token, usually according to some official reference vocabulary (e.g. for German: STTS-large tagset; for a feature description system designed as (pragmatically) universal, see Universal Features).
there is no whitespace on the left side of the token
right
there is no whitespace on the right side of the token
both
there is no whitespace on either side of the token
overlap
the token overlaps with another; other devices (specifying the extent and the area of overlap) are needed to more precisely locate this token in the character stream
The example below assumes that the lack of whitespace is marked redundantly, by using the appropriate values of join.
Note that a project may make a decision to only indicate lack of whitespace in one direction, or do that non-redundantly. The existing proposal is the broadest possible, on the assumption that we adopt the "streamable view", where all the information on the current element needs to be represented locally.
The English sentence ‘We're going on vacation.’ tagged with the CLAWS-5 tagset, arranged sequentially, tagged on the assumption that only the lack of the preceding whitespace is indicated.
The definition of this attribute is adapted from ISO MAF (Morpho-syntactic Annotation Framework), ISO 24611:2012.
Note
These attributes make it possible to encode simple language corpora and to add a layer of linguistic information to any tokenized resource. See section 17.4.2. Lightweight Linguistic Annotation for discussion.
If the measurement being represented is not expressed in a particular unit, but rather is a number of discrete items, the unit count should be used, or the unit attribute may be left unspecified.
(commodity) indicates the substance that is being measured
Status
Optional
Datatype
1–∞ occurrences of teidata.word separated by whitespace
Note
In general, when the commodity is made of discrete entities, the plural form should be used, even when the measurement is of only one of them.
Schematron
<sch:rule context="tei:*[@unitRef]">
  <sch:report test="@unit" role="info">The @unit attribute may be unnecessary when @unitRef is present.</sch:report>
</sch:rule>
Note
This attribute class provides a triplet of attributes that may be used either to regularize the values of the measurement being encoded, or to normalize them with respect to a standard measurement system.
<l>So weren't you gonna buy <measure quantity="0.5" unit="gal"
  commodity="ice cream">half
  a gallon</measure>, baby</l>
<l>So won't you go and buy <measure quantity="1.893" unit="L"
  commodity="ice cream">half
  a gallon</measure>, baby?</l>
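The normalized quantity in the second line can be checked with a short calculation. The following Python sketch assumes that the gallon in question is the US liquid gallon (the source does not say which gallon is meant) and rounds to three decimal places; the function name is illustrative only.

```python
# Litres per US liquid gallon (exact by definition).
# Assumption: the example means the US gallon, not the imperial one.
LITRES_PER_US_GALLON = 3.785411784


def gallons_to_litres(gallons: float, places: int = 3) -> float:
    """Convert US liquid gallons to litres, rounded to `places` decimals."""
    return round(gallons * LITRES_PER_US_GALLON, places)


# Half a gallon, as in quantity="0.5" unit="gal"
print(gallons_to_litres(0.5))  # 1.893
```

With the imperial gallon (about 4.546 L) the normalized quantity would differ, which is one reason to record the unit explicitly.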
may be used to specify further information about the entity referenced by this name in the form of a set of whitespace-separated values, for example the occupation of a person, or the status of a place.
(reference to the canonical name) provides a means of locating the canonical form (nym) of the names associated with the object named by the element bearing it.
The value must point directly to one or more XML elements by means of one or more URIs, separated by whitespace. If more than one is supplied, the implication is that the name is associated with several distinct canonical names.
Appendix A.3.30 att.notated
att.notated provides attributes to indicate any specialised notation used for element content.
att.personal (attributes for components of names usually, but not necessarily, personal names) provides common attributes for those elements which form part of a name, usually but not necessarily a personal name. [13.2.1. Personal Names]
to the right, e.g. to the right of a vertical line of text, or to the right of a figure
below
below the line
left
to the left, e.g. to the left of a vertical line of text, or to the left of a figure
end
at the end of e.g. chapter or volume.
inline
within the body of the text.
inspace
in a predefined space, for example left by an earlier scribe.
<add place="margin">[An addition written in the margin]</add>
<add place="bottom opposite">[An addition written at the
  foot of the current page and also on the facing page]</add>
<sch:rule context="tei:*[not(self::tei:schemaSpec)][@targetLang]">
  <sch:assert test="@target">@targetLang should only be used on <sch:name/> if @target is specified.</sch:assert>
</sch:rule>
In the example above, the <linkGrp> combines pointers at parallel fragments of the Universal Declaration of Human Rights: one of them is in Polish, the other in Swahili.
Note
The value must conform to BCP 47. If the value is a private use code (i.e., starts with x- or contains -x-), a <language> element with a matching value for its ident attribute should be supplied in the TEI header to document this value. Such documentation may also optionally be supplied for non-private-use codes, though these must remain consistent with their IETF (Internet Engineering Task Force) definitions.
target
specifies the destination of the reference by supplying one or more URI References
One or more syntactically valid URI references, separated by whitespace. Because whitespace is used to separate URIs, no whitespace is permitted inside a single URI. If a whitespace character is required in a URI, it should be escaped with the normal mechanism, e.g. TEI%20Consortium.
evaluate
(evaluate) specifies the intended meaning when the target of a pointer is itself a pointer.
if the element pointed to is itself a pointer, then the target of that pointer will be taken, and so on, until an element is found which is not a pointer.
one
if the element pointed to is itself a pointer, then its target (whether a pointer or not) is taken as the target of this pointer.
none
no further evaluation of targets is carried out beyond that needed to find the element specified in the pointer's target.
Note
If no value is given, the application program is responsible for deciding (possibly on the basis of user input) how far to trace a chain of pointers.
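The three values of evaluate can be illustrated with a small Python sketch. It models a document as a dictionary that maps the identifier of each pointer element to the identifier of its target; this data model, the function name, and the cycle check are assumptions made for illustration and are not part of the TEI definition.

```python
def resolve(target, pointers, evaluate="all"):
    """Follow a chain of pointers according to the evaluate attribute.

    `target` is the id named by the original pointer's target attribute.
    `pointers` maps the id of every pointer element to the id it points to;
    ids missing from the mapping are treated as non-pointer elements.
    """
    if evaluate == "none":
        # No evaluation beyond locating the element named in target.
        return target
    if evaluate == "one":
        # Take one further step if the element found is itself a pointer.
        return pointers.get(target, target)
    if evaluate == "all":
        seen = set()
        while target in pointers:
            if target in seen:
                raise ValueError(f"pointer cycle at {target!r}")
            seen.add(target)
            target = pointers[target]
        return target
    raise ValueError(f"unknown evaluate value: {evaluate!r}")


# p1 points to p2, which points to the non-pointer element div1.
chain = {"p1": "p2", "p2": "div1"}
print(resolve("p1", chain, "all"))   # div1
print(resolve("p1", chain, "one"))   # p2
print(resolve("p1", chain, "none"))  # p1
```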
Appendix A.3.34 att.ranging
att.ranging provides attributes for describing numerical ranges.
specifies the degree of statistical confidence (between zero and one) that a value falls within the range specified by min and max, or the proportion of observed values that fall within that range.
The MS. was lost in transmission by mail from <del rend="overstrike">
  <gap reason="illegible"
    extent="one or two letters" atLeast="1" atMost="2" unit="chars"/>
</del> Philadelphia to the Graphic office, New York.
Example
Americares has been supporting the health sector in Eastern
Europe since 1986, and since 1992 has provided <measure atLeast="120000000" unit="USD"
  commodity="currency">more than
  $120m</measure> in aid to Ukrainians.
Appendix A.3.35 att.resourced
att.resourced provides attributes by which a resource (such as an externally held media file) may be located.
att.sortable provides attributes for elements in lists or groups that are sortable, but whose sorting key cannot be derived mechanically from the element content. [9.1. Dictionary Body and Overall Structure]
David's other principal backer, Josiah
ha-Kohen <index indexName="NAMES">
  <term sortKey="Azarya_Josiah_Kohen">Josiah ha-Kohen b. Azarya</term>
</index> b. Azarya, son of one of the last gaons of Sura was David's own first
cousin.
Note
The sort key is used to determine the sequence and grouping of entries in an index. It provides a sequence of characters which, when sorted with the other values, will produce the desired order; specifics of sort key construction are application-dependent.
Dictionary order often differs from the collation sequence of machine-readable character sets; in English-language dictionaries, an entry for 4-H will often appear alphabetized under ‘fourh’, and McCoy may be alphabetized under ‘maccoy’, while A1, A4, and A5 may all appear in numeric order ‘alphabetized’ between ‘a-’ and ‘AA’. The sort key is required if the orthography of the dictionary entry does not suffice to determine its location.
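How a sort key overrides the written form can be sketched in Python as follows. The entries are represented as plain dictionaries rather than TEI elements, and the fallback to the lower-cased form when no key is present is an assumption of this sketch; as noted above, real sort key construction is application-dependent.

```python
# Each entry has a written form and, optionally, an explicit sort key.
entries = [
    {"form": "mace"},
    {"form": "McCoy", "sortKey": "maccoy"},
    {"form": "4-H", "sortKey": "fourh"},
    {"form": "fourfold"},
]


def sort_entries(items):
    """Sort by sortKey where present, else by the lower-cased form."""
    return sorted(items, key=lambda e: e.get("sortKey", e["form"].lower()))


print([e["form"] for e in sort_entries(entries)])
# ['fourfold', '4-H', 'McCoy', 'mace']
```

Without the sort keys, a plain character-code sort would place 4-H before all alphabetic entries and McCoy before every lower-case form.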
The @spanTo attribute must point to an element following the current element
<sch:rule context="tei:*[@spanTo]">
  <sch:assert test="id(substring(@spanTo,2)) and following::*[@xml:id=substring(current()/@spanTo,2)]">The element indicated by @spanTo (<sch:value-of select="@spanTo"/>) must follow the current element <sch:name/>
  </sch:assert>
</sch:rule>
Note
The span is defined as running in document order from the start of the content of the pointing element to the end of the content of the element pointed to by the spanTo attribute (if any). If no value is supplied for the attribute, the assumption is that the span is coextensive with the pointing element. If no content is present, the assumption is that the starting point of the span is immediately following the element itself.
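The constraint that the target of spanTo must follow the pointing element can also be checked outside Schematron. The Python sketch below uses the standard library's ElementTree; it handles only same-document pointers of the form #id, omits the TEI namespace for brevity, and uses a made-up sample fragment and function name.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def span_to_follows(root, element):
    """Return True if the element named by @spanTo follows `element`."""
    pointer = element.get("spanTo", "")
    if not pointer.startswith("#"):
        return False  # only same-document pointers are handled here
    target_id = pointer[1:]
    # The element itself and its descendants do not count as "following".
    inside = {id(node) for node in element.iter()}
    passed_element = False
    for node in root.iter():  # iter() walks the tree in document order
        if node is element:
            passed_element = True
        elif passed_element and id(node) not in inside:
            if node.get(XML_ID) == target_id:
                return True
    return False


# Hypothetical fragment: addSpan points forward, delSpan points backward.
doc = ET.fromstring(
    '<p><anchor xml:id="a0"/><addSpan spanTo="#a1"/>added words'
    '<anchor xml:id="a1"/><delSpan spanTo="#a0"/></p>'
)
print(span_to_follows(doc, doc.find("addSpan")))  # True
print(span_to_follows(doc, doc.find("delSpan")))  # False
```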
Appendix A.3.38 att.timed
att.timed provides attributes common to those elements which have a duration in time, expressed either absolutely or by reference to an alignment map. [8.3.5. Temporal Information]
If no value is supplied, the element is assumed to precede the immediately following element at the same hierarchic level.
Appendix A.3.39 att.transcriptional
att.transcriptional provides attributes specific to elements encoding authorial or scribal intervention in a text when transcribing manuscript or similar sources. [11.3.1.4. Additions and Deletions]
indicates the effect of the intervention, for example in the case of a deletion, strikeouts which include too much or too little text, or in the case of an addition, an insertion which duplicates some of the text already present.
all of the text indicated as an addition duplicates some text that is in the original, whether the duplication is word-for-word or less exact.
duplicate-partial
part of the text indicated as an addition duplicates some text that is in the original
excessStart
some text at the beginning of the deletion is marked as deleted even though it clearly should not be deleted.
excessEnd
some text at the end of the deletion is marked as deleted even though it clearly should not be deleted.
shortStart
some text at the beginning of the deletion is not marked as deleted even though it clearly should be.
shortEnd
some text at the end of the deletion is not marked as deleted even though it clearly should be.
partial
some text in the deletion is not marked as deleted even though it clearly should be.
unremarkable
the deletion is not faulty. [Default]
Note
Status information on each deletion is needed rather rarely except in critical editions from authorial manuscripts; status information on additions is even less common.
Marking a deletion or addition as faulty is inescapably an interpretive act; the usual test applied in practice is the linguistic acceptability of the text with and without the letters or words in question.
cause
documents the presumed cause for the intervention.
<div type="verse">
  <head>Night in Tarras</head>
  <lg type="stanza">
    <l>At evening tramping on the hot white road</l>
    <l>…</l>
  </lg>
  <lg type="stanza">
    <l>A wind sprang up from nowhere as the sky</l>
    <l>…</l>
  </lg>
</div>
Note
The type attribute is present on a number of elements, not all of which are members of att.typed, usually because these elements restrict the possible values for the attribute in a specific way.
subtype
(subtype) provides a sub-categorization of the element, if needed
The subtype attribute may be used to provide any sub-classification for the element additional to that provided by its type attribute.
Schematron
<sch:rule context="tei:*[@subtype]">
  <sch:assert test="@type">The <sch:name/> element should not be categorized in detail with @subtype unless also categorized in general with @type</sch:assert>
</sch:rule>
Note
When appropriate, values from an established typology should be used. Alternatively a typology may be defined in the associated TEI header. If values are to be taken from a project-specific list, this should be defined using the <valList> element in the project-specific schema description, as described in 23.3.1.3. Modification of Attribute and Attribute Value Lists.
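The Schematron rule above (subtype should not appear without type) can be approximated in a few lines of Python for quick checks outside a validating pipeline. The sample fragment and the function name are invented for illustration, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET


def subtype_without_type(root):
    """List the names of elements that carry @subtype but no @type."""
    return [
        el.tag
        for el in root.iter()
        if "subtype" in el.attrib and "type" not in el.attrib
    ]


sample = ET.fromstring(
    '<body>'
    '<div type="chapter" subtype="prologue"/>'
    '<div subtype="epilogue"/>'
    '<seg subtype="draft"/>'
    '</body>'
)
print(subtype_without_type(sample))  # ['div', 'seg']
```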
If the apparatus contains readings only for a single witness, this attribute may be consistently omitted.
This attribute may occur both within an apparatus gathering variant readings in the transcription of an individual witness and within an apparatus gathering readings from different witnesses.
Additional descriptions or alternative versions of the sigla referenced may be supplied as the content of a child <wit> element.
Appendix A.3.42 att.written
att.written provides attributes to indicate the hand in which the content of an element was written in the source being transcribed. [1.3.1. Attribute Classes]
macro.phraseSeq.limited (limited phrase sequence) defines a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents. [1.4.1. Standard Content Models]
tei_macro.phraseSeq.limited =
  ( text | tei_model.limitedPhrase | tei_model.global )*
Appendix A.4.5 macro.specialPara
macro.specialPara ('special' paragraph content) defines the content model of elements such as notes or list items, which either contain a series of component-level elements or else have the same structure as a paragraph, containing a series of phrase-level and inter-level elements. [1.3. The TEI Class System]
Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter.
Appendix A.5.2 teidata.count
teidata.count defines the range of attribute values used for a non-negative integer value used as a count.
<time dur-iso="PT0,75H">three-quarters of an hour</time>
Example
<date dur-iso="P1,5D">a day and a half</date>
Example
<date dur-iso="P14D">a fortnight</date>
Example
<time dur-iso="PT0.02S">20 ms</time>
Note
A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the last, which may have a decimal component (using either . or , as the decimal point; the latter is preferred). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator T must precede the first ‘time’ number-letter pair.
For complete details, see ISO 8601 Data elements and interchange formats — Information interchange — Representation of dates and times.
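The number-letter syntax described in the note can be sketched as a regular expression in Python. This is a simplified reading of the note, not a full ISO 8601 implementation: it allows a decimal fraction on any component rather than only the last one, it ignores week durations, and the names used are illustrative.

```python
import re

_NUM = r"\d+(?:[.,]\d+)?"  # unsigned number, '.' or ',' as decimal point
_DURATION = re.compile(
    rf"P(?:(?P<years>{_NUM})Y)?(?:(?P<months>{_NUM})M)?(?:(?P<days>{_NUM})D)?"
    rf"(?:T(?:(?P<hours>{_NUM})H)?(?:(?P<minutes>{_NUM})M)?"
    rf"(?:(?P<seconds>{_NUM})S)?)?"
)


def parse_duration(value):
    """Split a duration such as 'PT0,75H' into named numeric components."""
    match = _DURATION.fullmatch(value)
    if match is None or value.endswith("T"):
        raise ValueError(f"not a duration: {value!r}")
    parts = {
        name: float(text.replace(",", "."))
        for name, text in match.groupdict().items()
        if text is not None
    }
    if not parts:
        raise ValueError(f"duration has no components: {value!r}")
    return parts


print(parse_duration("PT0,75H"))  # {'hours': 0.75}
print(parse_duration("P1,5D"))    # {'days': 1.5}
print(parse_duration("P14D"))     # {'days': 14.0}
print(parse_duration("PT0.02S"))  # {'seconds': 0.02}
```

The T separator is what lets the parser tell M (month) from M (minute): an M before T is a month, an M after it is a minute.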
Appendix A.5.4 teidata.duration.w3c
teidata.duration.w3c defines the range of attribute values available for representation of a duration in time using W3C datatypes.
A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the S number, which may have a decimal component (using . as the decimal point). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator T must precede the first ‘time’ number-letter pair.
Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.
Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element.
Appendix A.5.6 teidata.language
teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system. [6.1. Language Identification]
The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice.
A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.
language
The IANA-registered code for the language. This is almost always the same as the ISO 639 2-letter language code if there is one. The list of available registered language subtags can be found at http://www.iana.org/assignments/language-subtag-registry. It is recommended that this code be written in lower case.
script
The ISO 15924 code for the script. These codes consist of 4 letters, and it is recommended they be written with an initial capital, the other three letters in lower case. The canonical list of codes is maintained by the Unicode Consortium, and is available at http://unicode.org/iso15924/iso15924-codes.html. The IETF recommends this code be omitted unless it is necessary to make a distinction you need.
region
Either an ISO 3166 country code or a UN M.49 region code that is registered with IANA (not all such codes are registered, e.g. UN codes for economic groupings or codes for countries for which there is already an ISO 3166 2-letter code are not registered). The former consist of 2 letters, and it is recommended they be written in upper case; the list of codes can be searched or browsed at https://www.iso.org/obp/ui/#search/code/. The latter consist of 3 digits; the list of codes can be found at http://unstats.un.org/unsd/methods/m49/m49.htm.
variant
An IANA-registered variation. These codes ‘are used to indicate additional, well-recognized variations that define a language or its dialects that are not covered by other available subtags’.
extension
An extension has the format of a single letter followed by a hyphen followed by additional subtags. These exist to allow for future extension to BCP 47, but as of this writing no such extensions are in use.
private use
An extension that uses the initial subtag of the single letter x (i.e., starts with x-) has no meaning except as negotiated among the parties involved. These should be used with great care, since they interfere with the interoperability that use of BCP 47 is intended to promote. In order for a document that makes use of these subtags to be TEI-conformant, a corresponding <language> element must be present in the TEI header.
There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications.
Second, an entire language tag can consist of only a private use subtag. These tags start with x-, and do not need to follow any further rules established by the IETF and endorsed by these Guidelines. Like all language tags that make use of private use subtags, the language in question must be documented in a corresponding <language> element in the TEI header.
Examples include
sn
Shona
zh-TW
Chinese as used in Taiwan
zh-Hant-HK
Chinese written in traditional script as used in Hong Kong
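The order of subtags described above can be made concrete with a small parser. The Python sketch below is deliberately simplified: it follows the length-based rules of BCP 47 for language, script, region, variant, extension, and private use subtags, but it does not handle extended language subtags or grandfathered tags, and it does not consult the IANA registry, so it checks only the shape of a tag and not whether the subtags are registered. All names are illustrative.

```python
import re


def _matches(subtags, i, pattern):
    return i < len(subtags) and re.fullmatch(pattern, subtags[i]) is not None


def parse_language_tag(tag):
    """Split a language tag into its components (shape check only)."""
    subtags = tag.split("-")
    parsed = {"language": None, "script": None, "region": None,
              "variants": [], "extensions": [], "private_use": []}
    i = 0
    if subtags[0].lower() != "x":  # a tag may consist of private use only
        if not _matches(subtags, 0, r"[A-Za-z]{2,8}"):
            raise ValueError(f"bad language subtag in {tag!r}")
        parsed["language"] = subtags[0].lower()
        i = 1
        if _matches(subtags, i, r"[A-Za-z]{4}"):
            parsed["script"] = subtags[i].title()
            i += 1
        if _matches(subtags, i, r"[A-Za-z]{2}|[0-9]{3}"):
            parsed["region"] = subtags[i].upper()
            i += 1
        while _matches(subtags, i, r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}"):
            parsed["variants"].append(subtags[i].lower())
            i += 1
        while _matches(subtags, i, r"[A-WY-Za-wy-z0-9]"):  # singleton, not x
            start = i
            i += 1
            while _matches(subtags, i, r"[A-Za-z0-9]{2,8}"):
                i += 1
            if i == start + 1:
                raise ValueError(f"empty extension in {tag!r}")
            parsed["extensions"].append("-".join(subtags[start:i]).lower())
    if i < len(subtags):
        if subtags[i].lower() != "x" or i + 1 == len(subtags):
            raise ValueError(f"unexpected subtag {subtags[i]!r} in {tag!r}")
        parsed["private_use"] = [s.lower() for s in subtags[i + 1:]]
    return parsed


tag = parse_language_tag("zh-Hant-HK")
print(tag["language"], tag["script"], tag["region"])  # zh Hant HK
```

The parser also normalizes case as recommended above: lower case for the language, an initial capital for the script, and upper case for the region.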
Attributes using this datatype must contain a single word which follows the rules defining a legal XML name (see https://www.w3.org/TR/REC-xml/#dt-name): for example they cannot include whitespace or begin with digits.
Appendix A.5.8 teidata.numeric
teidata.numeric defines the range of attribute values used for numeric values.
Any numeric value, represented as a decimal number, in floating point format, or as a ratio.
To represent a floating point number, expressed in scientific notation, ‘E notation’, a variant of ‘exponential notation’, may be used. In this format, the value is expressed as two numbers separated by the letter E. The first number, the significand (sometimes called the mantissa), is given in decimal format, while the second is an integer. The value is obtained by multiplying the significand by 10 raised to the power given by the integer. Thus the value represented in decimal notation as 1000.0 might be represented in scientific notation as 1E3.
A value expressed as a ratio is represented by two integer values separated by a solidus (/) character. Thus, the value represented in decimal notation as 0.5 might be represented as a ratio by the string 1/2.
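The three notations can be read with a few lines of Python. This sketch relies on the built-in float() for decimal and E notation and on fractions.Fraction for ratios; note that float() is more permissive than the datatype described here (it also accepts strings such as "inf"), so it illustrates the arithmetic rather than serving as a validator. The function name is illustrative.

```python
from fractions import Fraction


def parse_numeric(value: str) -> float:
    """Read a decimal, E-notation, or ratio value as a float."""
    if "/" in value:
        numerator, denominator = value.split("/")
        return float(Fraction(int(numerator), int(denominator)))
    return float(value)  # handles both decimal and E notation


print(parse_numeric("1000.0"))  # 1000.0
print(parse_numeric("1E3"))     # 1000.0  (1 multiplied by 10 three times)
print(parse_numeric("1/2"))     # 0.5
```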
Appendix A.5.9 teidata.outputMeasurement
teidata.outputMeasurement defines a range of values for use in specifying the size of an object that is intended for display.
<figure>
  <head>The TEI Logo</head>
  <figDesc>Stylized yellow angle brackets with the letters <mentioned>TEI</mentioned> in
    between and <mentioned>text encoding initiative</mentioned> underneath, all on a white
    background.</figDesc>
  <graphic height="600px" width="600px"
    url="http://www.tei-c.org/logos/TEI-600.jpg"/>
</figure>
Note
These values map directly onto the values used by XSL-FO and CSS. For definitions of the units see those specifications; at the time of this writing the most complete list is in the CSS3 working draft.
Appendix A.5.10 teidata.pattern
teidata.pattern defines attribute values which are expressed as a regular expression.
A regular expression, often called a pattern, is an expression that describes a set of strings. Regular expressions are usually used to give a concise description of a set without having to list all of its elements. For example, the set containing the three strings Handel, Händel, and Haendel can be described by the pattern H(ä|ae?)ndel (alternatively, the pattern H(ä|ae?)ndel is said to match each of the three strings).
This TEI datatype is mapped to the XSD token datatype, and may therefore contain any string of characters. However, it is recommended that the value used conform to the particular flavour of regular expression syntax supported by XSD Schema.
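The example pattern can be tried out as follows. The sketch uses Python's `re` module, whose syntax differs in details from the XSD flavour recommended above; for this simple pattern the two agree. XSD patterns must match the whole value, which `fullmatch` mirrors. The helper name `matches` is an assumption.

```python
import re

# The pattern from the text: "H", then either "ä" or "a" with an optional "e", then "ndel".
PATTERN = re.compile(r"H(ä|ae?)ndel")


def matches(name: str) -> bool:
    """Return True if the whole string is described by the pattern."""
    return PATTERN.fullmatch(name) is not None
```

All three spellings named in the text are accepted, while a string such as "Hendel" is not.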
Appendix A.5.11 teidata.point
teidata.point defines the data type used to express a point in cartesian space.
A point is defined by two numeric values, which should be expressed as decimal numbers. Neither number can end in a decimal point. E.g., both 0.0,84.2 and 0,84 are allowed, but 0.,84. is not.
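The rule about trailing decimal points can be expressed as a small check. The regular expression below is written for this illustration from the prose description and is not quoted from the TEI schema; the name `is_valid_point` is likewise an assumption.

```python
import re

# One coordinate: optional sign, digits, and an optional fraction that
# must itself contain digits, so a value cannot end in a bare ".".
_COORD = r"-?[0-9]+(\.[0-9]+)?"
_POINT = re.compile(_COORD + "," + _COORD)


def is_valid_point(value: str) -> bool:
    """Check that a string has the form 'x,y' with well-formed decimals."""
    return _POINT.fullmatch(value) is not None
```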
Appendix A.5.12 teidata.pointer
teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere.
The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. For example, https://secure.wikimedia.org/wikipedia/en/wiki/% is encoded as https://secure.wikimedia.org/wikipedia/en/wiki/%25 while http://موقع.وزارة-الاتصالات.مصر/ is encoded as http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/
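The first of the two examples, escaping a literal percent sign, can be reproduced with the Python standard library. This is a sketch under the assumption that only the path needs escaping; it does not perform the host-name conversion shown in the second example, and the function name `escape_for_pointer` is made up for illustration.

```python
from urllib.parse import quote


def escape_for_pointer(iri: str) -> str:
    """Percent-encode characters that may not appear literally in a URI."""
    # Keep the delimiters ":" and "/" untouched; everything else that is not
    # an unreserved character, including a literal "%", is escaped.
    return quote(iri, safe=":/")
```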
Appendix A.5.13 teidata.probCert
teidata.probCert defines a range of attribute values which can be expressed either as a numeric probability or as a coded certainty value.
teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification.
If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used.
Appendix A.5.17 teidata.text
teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace.
The possible values of this datatype are 1 or true, or 0 or false.
This datatype applies only for cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown, or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue.
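A reader of such attribute values might map the four permitted strings to a Boolean as sketched below. The function name `parse_truth_value` is an assumption; values such as "unknown" belong to the extended datatype and are deliberately rejected here.

```python
_TRUTH_VALUES = {"1": True, "true": True, "0": False, "false": False}


def parse_truth_value(value: str) -> bool:
    """Convert one of the four permitted strings to a Python bool."""
    try:
        return _TRUTH_VALUES[value]
    except KeyError:
        # Anything else (e.g. "unknown") is not a plain truth value.
        raise ValueError(f"not a truth value: {value!r}") from None
```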
Appendix A.5.19 teidata.versionNumber
teidata.versionNumber defines the range of attribute values used for version numbers.
Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.
Appendix A.5.21 teidata.xTruthValue
teidata.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown.
Any XPath expression using the syntax defined in 6.2..
When writing programs that evaluate XPath expressions, programmers should be mindful of the possibility of malicious code injection attacks. For further information about XPath injection attacks, see the article at OWASP.
\ No newline at end of file
diff --git a/app/data/schema/pessoaTEI.odd b/app/data/schema/pessoaTEI.odd
deleted file mode 100644
index 684ed4a7..00000000
--- a/app/data/schema/pessoaTEI.odd
+++ /dev/null
@@ -1,6033 +0,0 @@
-
-
-
-
-
- TEI Customization for the Digital Edition of Fernando Pessoa
- Ulrike Henny-Krahmer
- Erik Renz
-
-
-
for use by whoever wants it
-
-
- http://www.example.org/ns/nonTEI
-
-
-
created on Monday 12th October 2015 02:26:37 PM
-
-
-
-
-
-
-
- TEI Customization for the Digital Edition of Fernando
- Pessoa
- generated by Roma 4.10
-
- written by Ulrike Henny-Krahmer and Erik Renz
- 2017–2023
-
-
-
-
-
- Introduction
-
This document serves to describe how the TEI standard was customized for the
- project Digital Edition of Fernando Pessoa. Projects and
- Publications. In the project, two main types of sources are
- transcribed: (1) documents that Pessoa authored, containing editorial lists,
- notes, and plans (referenced as documents in the
- following), and (2) poems and prose texts that he published during his lifetime (publications). The first type of source is hand- or
- typewritten and may include changes such as later additions, substitutions, or
- deletions. The second type consists of material printed in journals.
-
The encoding of the metadata in the TEI header
- and of information about images in the facsimile
- section are outlined for both types of sources together,
- differentiating between them where necessary. How the transcribed text is
- encoded in the TEI body is explained for
- each source type separately. Examples are given in the running text.
-
-
-
- The TEI header
-
- General information
-
- Title, author and responsibilities
-
In the title statement of the TEI header, information about the title and
- author of a document or publication is gathered. In the case of the documents, the title corresponds to the
- identifier that the document has in the source collection that it was
- retrieved from, for example:
- BNP/E3 5-83r
-
If the title is formatted in a special way this is marked up inside of
- the title element, as the superscript letter in the
- example.
-
The publications carry the title of the work
- published:
- De "O Guardador de
- Rebanhos"
-
For the documents, the author is always Fernando
- Pessoa, as the historical person who wrote the editorial notes, lists,
- and plans:
- Fernando Pessoa
-
The author of a publication, in contrast,
- corresponds to the name that the work was published under and can also
- be one of Pessoa's heteronyms:
- Alberto Caeiro
-
In both cases, the name of the author is further marked up with an
- rs (referencing string) element declaring that the author's
- name is a reference to a person name identified elsewhere. In the
- project, an external list of person names is kept where each name has an
- identifier. The main heteronyms of
- Pessoa have the identifiers "FP" (Fernando Pessoa), "AC" (Alberto
- Caeiro), "AdC" (Álvaro de Campos), "RR" (Ricardo Reis), and "BS"
- (Bernardo Soares), which are given as values of the attribute
- key in the references.
-
Furthermore, in the title statement, different responsibilities for the
- creation of the encoded file are listed, each in an element
- respStmt, containing an element resp where the
- kind of responsibility (or responsibilities) is described and an element
- name indicating the full name of the person
- responsible:
- Edição,
- TranscriçãoPedro
- SepúlvedaTranscriçãoPablo
- Javier Pérez
- LópezModelagem de dados,
- CodificaçãoUlrike
- Henny-KrahmerCodificaçãoAlena
- GeduldigConsultoriaInstitut
- für Dokumentologie und Editorik
- (IDE)
-
-
- Publication
-
The element publicationStmt (publication statement) contains
- information about the published TEI file. In the following, an example
- is given:
-
- Universidade Nova de Lisboa, Instituto de Estudos de
- Literatura e Tradição (IELT)
- Cologne Center for eHumanities (CCeH)
- 2017
-
-
-
- BNP_E3_144D2-111r.xml
-
-
It contains information about the publishing institutions (encoded in the
- element publisher) and about the publication date of the file
- (in an element date). Furthermore, a statement on the
- availability of the file is made and licence information is given. The
- availability can be either "free", if the file is already published, or
- "restricted", if the work on the TEI file is still ongoing. All the TEI
- documents are published under a Creative Commons Attribution 4.0
- International license (CC BY 4.0). Finally, the filename is given as an
- identifier in an idno element.
-
-
- Notes statement
-
The element notesStmt (notes statement) serves as an editorial
- note, encompassing annotations that provide additional information
- beyond the details given in the sections of the source description. It consists of two parts, with at least
- the second part always being present:
-
-
- Poema publicado em A Revista
- da Solução Editora , 1929 e Cancioneiro do 1º Salão dos Independentes , 1930.
- Apresentamos aqui as imagens de ambas as publicações, cujos
- textos são, em termos formais e de conteúdo, idênticos.
-
- Poesia
-
-
-
-
The first part functions as a type of individual free-text comment and is
- provided within a note element with the attribute
- type, which always contains the value
- "summary".
-
The second part, on the other hand, serves as a section for genre
- classification. It is also provided inside a note element with
- the attribute type, but this time always with the value
- "genre". Furthermore, each note element contains
- an rs element with the attribute type, also always
- holding the value genre, and the attribute key.
- Depending on the content type, one of two options, namely the values
- "poesia" and "prosa", is specified within the
- attribute key.
-
The element notesStmt is specific to the publications and does not apply to the documents. Regarding the documents, the
- genre assignment can be found in the second part of the content
- description. For that, see the section on contents below.
-
-
- Description of the source
-
The sources of the documents and publications are documented in the source description. Because
- the documents are archival sources and the publications published bibliographic items, the
- source description for these two types of resources is made in a
- different way, as outlined in the following subsections.
-
- Sources of documents
-
The sources of the documents are encoded inside of an element
- msDesc (manuscript description), which itself is a
- child element of the element sourceDesc:
-
- [...]
-
-
The manuscript description has three parts. The first part serves to
- identify the source, the second part to describe its contents, and
- the third part to encode details on the history of the source.
-
- Identification
-
The identification is done inside of the element
- msIdentifier, which is the first child element of
- msDesc:
- Biblioteca Nacional de
- PortugalBNP/E3
- 144D2-111r
-
Inside of that element, the institution holding the source is
- indicated in an element institution. Furthermore, the
- identifier that the source has in the source institution as well
- as in this project is listed in an element idno.
-
-
- Contents
-
The contents of the source are described in an element
- msContents, which follows after the element
- msIdentifier, as in the following example:
-
-
Lista manuscrita no caderno 144D2 (cf. http://purl.pt/13880), publicada em Sensacionismo e outros Ismos
- (2009, 343).
Fernando
- PessoaLista
- editorialPlano
- editorial
-
The description of the contents has three parts. First, a general
- note on the source document is given in an element
- summary. The summary also indicates if and where a
- document has been published before,
- outside of this edition project. The second part of the content
- description is given in an element msItemStruct
- (structured manuscript item). It contains information about the
- author of the document (which is always
- Fernando Pessoa). Furthermore, it contains a note on the genre
- or genres of the document. This note is
- encoded with the element note, which has the attribute
- type and the value genre. Each genre
- inside of the note is marked with an element rs of the
- type
- genre. The attribute key indicates the
- identifier of the genre. Inside of the rs element, the
- name of the genre is given in text form and in Portuguese
- language. The following three genres of documents occur: "Lista editorial"
- (lista_editorial), "Plano editorial"
- (plano_editorial), and "Nota editorial"
- (nota_editorial). Finally, also the language or
- languages of the text on the document are indicated, in the
- element textLang. The main language of a document is
- given in the attribute mainLang on that element. The
- value of that attribute is a shortcut for a language, in this
- case "pt" for Portuguese.
-
-
-
- History
-
The third part, which contains information about the history of
- the source, is described in an element history. Inside
- history, there is the element origDate
- (origin date), which is enclosed by the elements origin
- and p:
-
-
-
-
- ?
-
-
-
-
-
The element origDate contains various forms to identify
- the origin date of a source document. In general, a distinction
- is made between certain and uncertain data. The indication of
- uncertainty is done through the use of the attribute
- cert, which, when used, always has the value
- medium:
-
-
- c. 1913
-
-
-
As already indicated in the two examples above, the
- representation of the temporal data itself may vary. The
- following variations are possible:
- (1) Only providing a year.
- (2) Providing a year along with a month.
- (3) Providing a year along with month and day.
- (4) Not providing a date, indicated by a question
- mark.
- To indicate the first three variations of possible origin
- dates, there are also different options expressed through
- different attributes. The attribute when, for
- example, specifies a particular date:
-
-
- c.
- 14-2-1933
-
-
-
The attributes from and to indicate a
- specific time period:
-
-
- 1916-1919
-
-
-
The attribute notBefore indicates that the occurrence
- happened only after a certain point in time:
-
-
- post. 1933
-
-
-
The attribute notAfter indicates that the occurrence
- happened only before a certain point in time:
-
-
- ant. Dezembro de
- 1922
-
-
-
Only when the date is missing, indicated by a question
- mark, is none of the mentioned attributes used.
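The dating variants described in this section could be interpreted by a program roughly as follows. This is a hedged sketch: the function name `describe_orig_date` is hypothetical, and the ISO-style attribute values used when calling it (for example `when="1933-02-14"`) are assumptions, since the attribute values of the project's own examples are not reproduced here.

```python
import xml.etree.ElementTree as ET


def describe_orig_date(xml_fragment: str) -> str:
    """Summarize which dating attributes an origDate element carries."""
    attrs = ET.fromstring(xml_fragment).attrib
    if "when" in attrs:
        summary = f"on {attrs['when']}"
    elif "from" in attrs and "to" in attrs:
        summary = f"from {attrs['from']} to {attrs['to']}"
    elif "notBefore" in attrs:
        summary = f"not before {attrs['notBefore']}"
    elif "notAfter" in attrs:
        summary = f"not after {attrs['notAfter']}"
    else:
        # No dating attribute at all: the "?" case.
        summary = "undated"
    if attrs.get("cert") == "medium":
        summary += " (uncertain)"
    return summary
```

Called with `<origDate notBefore="1933">post. 1933</origDate>`, the function returns "not before 1933"; an element without any of the attributes is reported as "undated".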
-
-
-
- Sources of publications
-
The sources of the publications are encoded only within an element
- sourceDesc:
-
- [...]
-
-
The description has two parts. The first part serves to identify the
- respective work(s) within the index of works, while the second part
- is used to describe the bibliographic information of the
- work(s).
-
- Work index
-
Within the element sourceDesc, there is a list
- element whose attribute type indicates,
- by the value work-index, that it functions as a list
- for indexing works.
As seen in the example above, the element list can
- contain multiple item elements, but there will always
- be at least one item element representing a single
- entry in the list. Each item element contains an
- rs element with the attribute type (in
- which the value work can always be found) and the
- attribute key. The attribute key holds a
- unique key-value that consistently follows a specific pattern,
- as demonstrated in the examples above and below: It begins with
- the letter "W" followed by a consecutive number.
The unique key-value is determined not only by the title of a
- work but also by the authorship. In the project, the allocation
- of work identifiers takes place in an external central work
- register.
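The key pattern described above (the letter "W" followed by a number) can be checked mechanically. The sketch below is an illustration only; the name `is_work_key` and the sample key "W12" are hypothetical and not taken from the project's work register.

```python
import re

# "W" followed by one or more digits, and nothing else.
_WORK_KEY = re.compile(r"W[0-9]+")


def is_work_key(key: str) -> bool:
    """Return True if the key has the form of a work identifier."""
    return _WORK_KEY.fullmatch(key) is not None
```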
-
-
- Bibliographic description
-
The bibliographic information of the publications is described
- in an element biblStruct, in which only bibliographic
- sub-elements in a specific order appear, according to the
- general TEI guidelines:
-
-
- Reincidindo...
-
- Fernando Pessoa
-
-
-
-
-
- A Águia
-
-
- maio de 1912
-
- 5
- 137-144
-
-
-
-
The description of the bibliographic information has two parts.
- The first part is given in the element analytic
- (analytic level):
This part contains the bibliographic title of the item, such as
- an article or poem, that is published within a monograph or
- journal rather than as an independent publication. It also
- includes the attribute key to indicate the author,
- which usually refers to Pessoa himself (with the identifier
- FP) or, in some cases, to one of his heteronyms
- (AC, AdC, RR, or
- BS). Furthermore, in the analytic part, the
- element textLang provides a value to indicate the
- primary language of the bibliographic work in the attribute
- mainLang, as well as one or more values to
- identify any other languages used in the published work in the
- attribute otherLangs.
-
The element monogr (monographic level) provides the
- second part of the bibliographic description. It contains the
- bibliographic information about the item (e.g., a monograph or a
- journal) that was published as an independent object (i.e., a
- stand-alone part) and which includes the work described in the
- analytic element:
-
-
-
- A Águia
-
-
- maio de 1912
-
- 5
- 137-144
-
-
-
The element monogr always includes the element
- biblScope, which defines the scope of the
- bibliographic work mentioned in the first part. The scope may
- encompass various details, such as page numbers or a named
- subdivision within a larger work.
-
In some cases, a work has been published more than once, which
- necessitates the use of multiple biblStruct
- elements:
-
-
-
- Mar Português
-
- Fernando Pessoa
-
-
-
-
-
- Contemporânea
-
-
- outubro de 1922
-
- 4
- 9-14
-
-
-
-
- Mar Português
-
- Fernando Pessoa
-
-
-
-
-
- Leitura para todos —
- Revista mensal ilustrada
-
-
- junho de 1926
- Rio de Janeiro
-
- 83
- 22-26
-
-
-
-
In order to address these multiple bibliographic
- descriptions formally, they require identifiers in the XML. Therefore,
- each carries an attribute xml:id that has a unique
- value. These identifiers consistently adhere to a specific
- pattern: they begin with the abbreviated title of the journal or
- monograph, followed by an underscore, and then the year of
- publication. Accordingly, for the above example of the work
- entitled "Mar Português", published once in 1922 in
- the journal "Contemporânea" and again four years
- later in the journal "Leitura para todos - Revista mensal
- ilustrada", the following two IDs result:
- "Contemporânea_1922" and
- "Leitura_1926".
-
The purpose of assigning the identifiers is to enable a reference
- in the text to the specific place of publication as well as to
- the respective facsimiles from the various issues. In this way,
- variations within a text can be clearly assigned to a specific
- source and formally linked to it.
-
-
-
-
- Encoding description
-
The relationship between the transcribed text and its source, the
- facsimiles from which it is derived, is documented using the element
- encodingDesc (encoding description), which only appears in
- the publications and not in the documents themselves.
-
Within the element encodingDesc, there's always the element
- variantEncoding (variant encoding), which contains the
- attributes method and location:
-
-
-
-
-
-
According to the general TEI guidelines, the attribute method
- indicates which method is used to encode the variants' apparatus. It
- always contains the value "parallel-segmentation", expressing
- that alternate readings of a passage are presented side by side in the
- text, without the need for a base text. In contrast, the attribute
- location indicates whether the apparatus appears within
- the text or outside it. It consistently holds the value
- "internal" signifying that the apparatus appears within
- the text.
-
-
-
-
- Facsimiles
-
The facsimile element is used to hold information about the image files
- representing a facsimile of the text transcribed in the TEI body. For the Pessoa
- edition, these image files are stored on the image server of the Cologne Center
- for eHumanities (CCeH). The link to this image server is specified in the
- xml:base attribute within the facsimile element.
-
Within all TEI files, the facsimile element contains one or more
- graphic elements, each indicating the path to individual image
- files in a url. These paths are relative to the base URI for the
- images. For example:
-
-
-
-
-
-
-
As mentioned in the sections of the
- bibliographic description, there are instances where a work has been
- published more than once, which necessitates the use of multiple
- facsimile elements:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
As shown in the example above, there's an additional attribute corresp
- within the facsimile element, alongside the existing attribute
- xml:base. The value of the corresp attribute points to
- the identifier of a bibliographic source defined in the source description in the TEI header, so that a specific source is
- linked to a specific set of facsimiles. This is relevant in cases in which there
- are several different sources of a published work and several corresponding sets
- of facsimiles. To refer to these identifiers, a preceding hash sign (#) is
- used.
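How the corresp pointers and the relative image paths fit together can be sketched in Python. Everything specific in the sample is an assumption made for illustration: the base URL `https://images.example.org/pessoa/`, the image file names, and the function name `images_by_source`; only the two identifiers are taken from the text above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

TEI = "{http://www.tei-c.org/ns/1.0}"
XML = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical, heavily reduced TEI file with two sources and two facsimile sets.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc>
    <biblStruct xml:id="Contemporânea_1922"/>
    <biblStruct xml:id="Leitura_1926"/>
  </sourceDesc></fileDesc></teiHeader>
  <facsimile xml:base="https://images.example.org/pessoa/" corresp="#Contemporânea_1922">
    <graphic url="contemporanea-1.jpg"/>
    <graphic url="contemporanea-2.jpg"/>
  </facsimile>
  <facsimile xml:base="https://images.example.org/pessoa/" corresp="#Leitura_1926">
    <graphic url="leitura-1.jpg"/>
  </facsimile>
</TEI>"""


def images_by_source(tei_xml: str) -> dict:
    """Map each bibliographic source id to the absolute URLs of its images."""
    root = ET.fromstring(tei_xml)
    known_ids = {b.get(XML + "id") for b in root.iter(TEI + "biblStruct")}
    result = {}
    for facsimile in root.iter(TEI + "facsimile"):
        # Drop the leading "#" to obtain the referenced xml:id.
        source_id = facsimile.get("corresp", "").lstrip("#")
        if source_id not in known_ids:
            raise ValueError(f"corresp points to unknown source: {source_id!r}")
        base = facsimile.get(XML + "base", "")
        # Each url is relative to the base URI of its facsimile element.
        result[source_id] = [
            urljoin(base, graphic.get("url"))
            for graphic in facsimile.iter(TEI + "graphic")
        ]
    return result
```

Raising an error for a dangling pointer is a design choice of this sketch; a real pipeline might prefer to log the problem and continue.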
-
-
- Transcriptions
-
- Encoding of documents
-
- General structure
-
The transcription of documents is encoded inside
- of the TEI text and body elements. Each text body must
- at least contain one division, encoded with a div element:
-
-
-
Na Casa de saude de
- Cascaes [...]
-
-
-
The example is taken from the document
- BNP
- 5-83r which begins with the heading "Na Casa de saude...".
- This heading introduces the first division of the document and is
- therefore included inside a div. The facsimile of the whole
- document is given in the following:
-
-
Here it becomes visible that the document has three parts, each starting
- with its own heading. Each of these parts is encoded in its own
- division:
-
-
-
Na Casa de saude de
- Cascaes [...]
-
-
- Vida e obras do engenheiro
- Alvaro de
- Campos.
-
-
-
Livro do
- Desassocego. [...]
-
-
-
Divisions can also be nested if there is a subpart inside of a general
- part.
Inside of the main divisions of a document, further structures are
- encoded, such as headings, paragraphs, lists, or tables.
-
- Headings
-
Headings are included in a head element and should occur
- directly after the opening of a division or other main structure:
-
Na Casa de saude de
- Cascaes [...]
-
-
- Paragraphs and other blocks
-
Paragraphs are encoded with a p element. A simple example is
- shown in the following:
-
Omitto neste os
- commentarios de Augusto da
- Costa, poisque são d’elle e não
- meus.
-
Text that is not a full paragraph can be encoded with an ab
- element ("anonymous block"):
-
Na Casa de saude de
- Cascaesinclue:
- – [...]
-
Also enumerations of titles can have the form of paragraphs (instead
- of lists), if the items are written down one after the other without
- structuring them as a list. An example is shown in the following
- facsimile of the list BNP 125A-52r:
-
-
The second part of the document, entitled "Manual do Sebastianista",
- contains three titles that are mentioned directly after each other
- and just separated by a dash. This is encoded as follows,
- wrapping a p element around the titles:
- Manual do
- Sebastianista
Historia do sebastianismo —
- Prophecias sebastianistas
- e sua interpretação — A re-nascença do
- sebastianismo.
-
-
- Lists
-
A list is a set of ordered items that may be numbered or not. Lists
- are encoded with the element list, which contains one or
- several child elements item, as in the following
- example:
-
- Introducção, entrevista com Antonio Mora.
- Alberto
- CaeiroRicardo Reis.
- “Prolegomenos” de Antonio Mora.
- “Fragmentos”.
-
If the items are numbered or otherwise marked (for example by initial
- dashes), these marks are encoded with the element label at
- the beginning of each list item. The text of the list item follows
- directly after the label.
-
- Lists inside of lists
-
Sometimes, an item of a list contains itself another list. An
- example is shown in the following facsimile of the list BNP
- 128-11r:
-
-
The last part contains a list with two items: "Parlour Games" and
- "Technical Dictionaries". The "Technical Dictionaries" item
- contains itself a sublist of two items: "A. Commercial" and
- "B.". This is encoded by using a list inside of a list:
After the text of the item "Technical Dictionaries", but still
- inside of that item, another list element opens. It has
- the attribute rend with the value inline
- to mark that this sublist should not begin on a new line but be
- placed after the text of the item containing it. The items of
- the sublist are encoded inside of this second list
- element in the usual way.
-
Another example of a sublist can be seen on the following page of
- the list BNP 133M-96 a 98:
-
-
In the third item "Gamage...", there is a sublist which does not
- start on the same line as the item text, but on the next line.
- Still, the list is indented, so that it becomes clear that it is
- a list inside the bigger list. This is encoded as:
- Gamage, or another, or
- elseways: Table-football.(Table-cricket).Strategy.Opposition.(Aspects)Lomelino’s
- game.
-
-
The only difference to the previous example is that the attribute
- rend has the value indent instead of
- inline.
-
-
-
- Tables
-
On some of the documents, the structure of the text is more
- similar to a table than to a list or could be interpreted as both a
- list or a table. The following facsimile of the list BNP 144Q-34r
- shows such a case:
-
-
The first part of the document contains a list to which a column of
- numbers is attached to the right. To be able to align this last
- column with the list entries, the whole list is encoded as a
- table:
-
approx.approximadamenteno.numberof
- pages.Introduction
- (brief).4.The Anarchist Banker.54.Complete Poems of Alberto Caeiro. (or
- only The “Keeper of
- Flocks”)
- [...]
-
Tables are encoded with the element table. They contain
- first rows, encoded with the element row, and then for each
- row the column values, encoded with the element cell. Here,
- the table has a row for each list entry and three columns. The first
- column holds the labels of the entries (1., 2., 3., ...). The second
- column contains the text of the entries ("Introduction (brief)",
- "The Anarchist Banker", etc.), and the third column contains the
- number of pages. For the items for which no page numbers are given,
- the third cell is left empty, but it still needs to be
- there, so that the structure of the table is correct. In columns
- with numbers, the text is usually aligned to the right. This is
- indicated with the attribute rend on the respective
- cell and the value right. The first row of
- the table is special, because it only contains a heading for the
- third column: "approx. no. of pages". Here, only one cell
- element is given for the row which has the attribute cols
- with the value 3. This means that in that row, one column
- spans over the width of all the three table columns. In addition,
- the text of this cell is aligned to the right, using the attribute
- rend with the value right on
- cell, so that the text "approx. no. of pages" appears to
- the right. Also the lower part of the document contains a list that
- has a column with page numbers attached to it and is therefore
- interpreted as having a tabular structure.
-
-
- Notes
-
Notes can occur everywhere in a document. Usually, a note is
- contained inside of a list or at the margin of it. Notes may be
- interpreted as part of the original version of a list or as having
- been added later. In the latter case, they are encoded genetically
- (for an example of the genetic encoding of a note see Notes added on the margin). An
- example of simple notes added to the margin is shown in the
- following facsimile:
-
-
Here Pessoa placed question marks to the left of some of the list
- items. This is encoded as follows:
Here, the second item has a note on the left margin. When the note is
- on the left margin, the element note is added at the
- beginning of the list item, inside of the element item. The
- note gets the attribute place, in this case
- with the value margin-left. For place, also
- the values margin-right (then the note element
- would be added at the end of the list item), top,
- below, and center are possible. The text
- of the note is simply added inside of the note element.
-
-
- Line breaks
-
Line breaks of text are encoded using the empty element lb
- as in the following example of a list item which continues on a
- second line:
- Introducção, entrevista
- com Antonio Mora.
-
-
The element lb is only used if the line break is not due to
- the structure, meaning that new divisions, paragraphs or list items
- are not especially marked with lb, only line breaks in
- running text are marked with it.
-
-
- Punctuation characters
-
Sometimes it is necessary to encode punctuation characters, for
- example hyphens used to divide words at the end of lines, so that
- these can be displayed or not when the document is rendered,
- depending on whether line breaks are included in the visualization
- of the document or not. Punctuation characters are encoded with the
- element pc, as in the following example:
-
-
Here there is a hyphen between "Disserta" and "ções" which
- is marked up with the element pc. The line break following
- the division is encoded after the punctuation characters, using the
- empty element lb.
-
-
-
- (Typo)graphical renditions
-
An important part of the encoding of the documents
- in the project is how certain aspects of them were rendered in the
- sources. In general, indications about what something looked like (how it
- was (typo)graphically emphasized or organized) are made in the attribute
- rend, which can be used on many different elements.
-
- Alignment of text
-
By default, the text of different elements is shown on the left side
- of the page. If the text should instead be centered or appear on the
- right side, this can be indicated with the attribute rend
- and the values center or right. An example of
- a heading which is centered is given in the following:
- Q.
-
-
- Highlighted characters, words, or passages
-
- Underlinings
-
Often, Pessoa highlighted text by underlining it. This is encoded
- using the element hi in combination with
- rend. In the following example, the whole heading
- is underlined. It does therefore contain a child element
- hi with the attribute rend, having the
- value underline:
- Na Casa de saude de
- Cascaes [...]
-
Also individual characters or words can be underlined. Then the
- hi element is just wrapped around these parts.
-
-
-
- Superscripts
-
If text (or individual letters) is added as a superscript,
- meaning that they are attached as small letters to the top of a
- preceding word, this is encoded as follows:
- Carta do MzMarquez
- de Pombal.
-
In the example, the list item contains the name "Marquez de
- Pombal", which is abbreviated using just the letter "M" with a
- small superscript "z" for "Marquez". The superscript is encoded
- with the element hi and the attribute rend
- with the value superscript.
-
-
- Frames (square boxes)
-
Sometimes, Pessoa highlights parts of a list by drawing a frame
- around it. An example can be seen on the list 133M-30r:
-
-
Here, a box is drawn around the text "(Advertise for Cipher
- Agency - America)". This is encoded as follows (this is at the
- same time an example of a modification, see Other modifications):
- (Advertise for Cipher Agency
- — America).
-
The element mod surrounds the text to be framed. The
- attribute rend indicates how it should be modified,
- here by adding a frame (framed). The attribute
- n with the value 2 indicates that the
- modification was only done later and should be part of the
- second edited version of the document.
-
If the frame was not added later, but would have been part of the
- original list, instead of mod the element hi
- could be used to say that the passage is highlighted by framing
- it:
- (Advertise for Cipher Agency —
- America).
-
-
- Circled text
-
Sometimes Pessoa circles text to highlight it, meaning that he
- surrounds text with a circle. This is encoded as follows:
- √
-
In the example, a note on the right margin is circled, which is
- just indicated by adding the attribute rend to the
- note and giving it the value circled.
-
Also, the element hi can be used, as in the following
- example:
-
or Being
- an apology forof all culture not genuine.
-
-
Here, there is a part inside of a paragraph that is circled.
- Because this part has no element by itself to attach the
- attribute rend to, the element hi is used to
- mark that the text is highlighted. Again the value of
- rend is circled.
-
-
-
- Indentations
-
It may be the case that the text does not start directly at the
- beginning of a line but is indented. In the following example, there
- is a list which does not start on its own line, but after the text
- "inclue: –". The list therefore carries the attribute
- rend with the value inline.
- inclue:
- – Introducção,
- entrevista com Antonio Mora. [...]
-
Also, in this example, the text of the list items is indented from
- the second line on (meaning that the first line of each item is not
- indented, but every line following it is). This is indicated
- by using the attribute rend with the value indent-2. To get an
- impression of what this looks like, see the facsimile of this document:
-
-
-
- Division lines
-
Often, Pessoa draws lines on his documents to
- mark divisions between different parts of his notes. Such division
- lines are encoded using the element metamark, as in the
- following example, where a line is drawn between a list and the next
- heading:
-
[...] [...]
- “Fragmentos”.
-
-
Vida e obras do engenheiro
- Alvaro de
- Campos. [...]
-
"Metamark" means that this mark serves as a guide to the structure
- of the document, to how it should be read
- (for example, in which order). The attribute rend is used
- here to indicate the style of the mark, in this case a
- line. Also the function of the mark is encoded in the
- attribute function. In this case the function is to
- indicate that a new, different section of the document begins, so the value of the attribute is
- distinct.
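The division line can be sketched as an empty metamark placed between the two parts it separates. The attribute values are those named above; the use of head inside a following list for the heading is an assumption:

```xml
<list>
  <item>[...] “Fragmentos”.</item>
</list>
<!-- The drawn line that separates the list from the next section -->
<metamark rend="line" function="distinct"/>
<list>
  <!-- Assumed: the heading is encoded as the head of the next list -->
  <head>Vida e obras do engenheiro Alvaro de Campos.</head>
  <!-- [...] -->
</list>
```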
-
-
- Division space
-
Instead of division lines, sometimes there is just additional space
- on the documents to mark the difference between one list and the
- next, or between different items of a list. In the following
- facsimile of the list BNP 12-1 10r, there is a list about Antonio
- Móra consisting of three parts. The first part has three list
- items, then two other items follow separated by space from the first
- part of the list:
-
-
To encode such spaces, the element metamark is used with the
- attribute rend having the value space. In the
- above example, the function of the space is to signal that the items
- are distinguished from each other, so the metamark gets the
- additional attribute function with the value
- distinct:
- Dissertação sobre a arte
- moderna. Prolegómenos a uma
- reformação do paganismo.
-
Such metamarks may be added between lists, or between list items. In
- the above example, the mark is added inside of the list between the
- individual items.
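A sketch of the division space inside a list, following the description above. Only the two items quoted in the example are shown; the other items and any line breaks inside them are omitted:

```xml
<list>
  <item>Dissertação sobre a arte moderna.</item>
  <!-- Extra vertical space separating the two groups of items -->
  <metamark rend="space" function="distinct"/>
  <item>Prolegómenos a uma reformação do paganismo.</item>
</list>
```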
-
-
- Lines as placeholders
-
In the documents, Pessoa sometimes uses lines as placeholders for
- text that he may have wished to add later. In the following
- facsimile of the document BNP 87 68r, the third list item of the
- second list on the page begins with a line, followed by the text
- "(some new collaborator)".
-
-
Here, the line clearly stands for some name to be added later. This
- is encoded as follows:
-
- (some new collaborator)
-
The line is marked up with the element metamark and the
- attribute rend with the value line. It has
- another attribute function with the value
- placeholder. In this example, the editor decided to
- explain that some text was omitted here, so the placeholder line is
- interpreted as an abbreviation standing for some other text. It is
- therefore enclosed in a construction of choice with the
- child elements abbr (containing the line mark) and
- expan (containing the supposed expansion). But, as the
- text that the placeholder stands for is not known, expan
- contains an element supplied with reason
- omitted-in-original.
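Put together, the construction described above might look like the following sketch. Element and attribute names are taken from the prose; the empty supplied element and the whitespace are assumptions:

```xml
<item><choice>
    <!-- Documentary view: the placeholder line as drawn -->
    <abbr><metamark rend="line" function="placeholder"/></abbr>
    <!-- Interpretive view: something was left out, content unknown -->
    <expan><supplied reason="omitted-in-original"/></expan>
  </choice> (some new collaborator)</item>
```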
-
Lines can also serve as placeholders for text that was already
- mentioned before. In the following example, a line is used to signal
- that text from the preceding list item is repeated:
- Traducção de Alberto Caeiro
-
-
Here, the line has the function of "ditto". It is encoded with the
- element metamark, carrying the attribute rend
- with the value line and the attribute function
- with the value ditto. In this context, the line serves as
- an abbreviation, which is expanded to the text that it represents.
- For more details about this example, see the section on abbreviations below.
-
-
- Space as placeholder
-
Like lines, space can also serve as a placeholder, either for some
- text that Pessoa wished to add later or in the function of "ditto",
- repeating some text that was given earlier.
-
An example of the first case is visible in the facsimile of the list
- BNP 87 40r:
-
-
Here, the first three list items were entirely left blank and the
- fourth was left blank in the beginning. Such items are encoded as
- follows:
- — Angelo de Lima,
- (in fine)
-
-
The space is marked up with the element metamark and the
- attribute rend with the value space, as well
- as the attribute function with the value
- placeholder. It is interpreted as an abbreviation for
- something else, so it is surrounded by an element abbr.
- This is expanded inside of an element expan, which
- contains an element supplied. Its attribute
- reason with the value omitted-in-original says
- that the editor thinks that some text is missing here. The
- responsibility of this interpretation is given in the attribute
- resp which takes the initials of the editor as value.
- Finally, both abbr and expan are contained inside
- of an element choice, indicating that these two encodings
- are alternative views on the document, a more documentary one
- marking the space and a more interpretive one saying that something
- was omitted.
-
An example of the second case, space serving as "ditto", can be seen
- in the facsimile of the list BNP 48-56r:
-
-
Here the names initiating list items are only given the first time,
- e. g. "Robert Browning : Eveln Hope." From the second time on, there
- is just a space, which is thought to be filled with the same name.
- This is encoded as follows:
- Robert
- Browning
-
The element choice is used to mark that either the blank
- space can be shown or the name that it stands for. The blank space
- is interpreted as an abbreviation and marked up with the element
- abbr, inside of which metamark is used to mark
- the space itself. The metamark element here has the
- attribute rend with the value space and the
- attribute function with the value ditto. The
- expansion is then used to fill in the text that the space stands
- for, in this case the name "Robert Browning". This is encoded in the
- element expan.
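A sketch of a list item that begins with such a "ditto" space. The rest of the item is elided, and any further markup of the name (for example a reference to the person list) is left out here:

```xml
<item><choice>
    <!-- The blank space as it appears on the document -->
    <abbr><metamark rend="space" function="ditto"/></abbr>
    <!-- The name that the space stands for -->
    <expan>Robert Browning</expan>
  </choice> <!-- [...] --></item>
```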
-
-
- Curly brackets
-
Curly brackets are often part of notes added to the margin by Pessoa.
- They are encoded using the element metamark with
- rend having the value curly-bracket.
-
-
For an explanation of a complete example of a margin note, see the
- section on Notes added on the
- margin.
-
-
- Crosses
-
Pessoa uses crosses to mark that he is uncertain or has doubts about
- a passage of text on a document. An example is shown in the
- following facsimile from the list BNP 48, 18 and 19:
-
-
After the name "Alfredo Guisado" (the fifth list item from the bottom)
- there is text in parentheses which is marked with a cross to the
- right: "(baloiço que me baloiça ?)+" This cross is
- interpreted as marking that Pessoa is not sure about the text
- preceding it in parentheses. This is marked up as follows:
- Alfredo
- Guisado:(baloiço que me
- baloiça ?)
-
To mark the uncertain passage, the element seg is used with
- the attribute type having the value certainty.
- The degree of certainty is indicated in the attribute
- cert and can be high, medium,
- or low. That Pessoa was the one having doubts is
- indicated with the attribute resp with the value
- FP. Here the cross is not included in the
- transcription anymore, because the uncertainty is indicated with the
- TEI element and the attribute rend is used to mark how it
- was rendered originally (here with the value cross right
- to say that the passage was marked with a cross on the right side;
- another possible value would be cross left).
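The encoding of the uncertain passage can be sketched as follows. The prose does not say which degree of certainty was chosen for this passage, so cert="low" is purely an assumption; the other attribute values are those named above:

```xml
<!-- Sketch only: cert="low" is an assumed value (high, medium or low are possible) -->
<item>Alfredo Guisado: <seg type="certainty" cert="low" resp="FP"
    rend="cross right">(baloiço que me baloiça ?)</seg></item>
```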
-
-
- Arrows
-
On some lists, Pessoa uses arrows to connect different passages of
- text, or to show that some text is moved somewhere else. An example
- can be seen in the following facsimile of the list BNP 136-57v:
-
-
The last part of this document contains an arrow pointing from the
- heading "The New Decadence" to "or Being an apology of all culture
- not genuine". This is encoded as follows:
-
- The New
- Decadence
-
An Introduction to the Study of
- Indifference.
-
-
or
- Being an apology forof all culture
- not genuine.
-
-
The arrow itself is encoded as an element metamark with the
- attribute rend having the value arrow-down and
- the attribute function having the value
- assignment, because the arrow serves to assign the
- text to something else. Other renditions of arrows are possible:
- arrow-up, arrow-left,
- arrow-right, arrow-left-down,
- arrow-left-up, arrow-right-down,
- arrow-right-up, arrow-left-curved-down,
- arrow-left-curved-up,
- arrow-right-curved-down,
- arrow-right-curved-up, depending on the
- direction in which the arrow points (up, down, left, right, or a combination
- of these) and whether it is straight or curved. The attribute
- target points to an anchor somewhere else which marks
- the point the arrow points to and the value of this attribute is the
- identifier of that anchor, preceded by "#", in this case
- #A3. In this example, the anchor is defined at the
- beginning of another paragraph and is added before the text of that
- paragraph begins. It is encoded with the element anchor and
- has the attribute xml:id to define the identifier
- A3.
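A simplified sketch of the arrow and its anchor. The attribute values follow the description; the position of the metamark inside the heading and the omission of the substitution in the last paragraph are assumptions made to keep the fragment short:

```xml
<head>The New Decadence
  <!-- Arrow pointing down to the paragraph that carries the anchor -->
  <metamark rend="arrow-down" function="assignment" target="#A3"/></head>
<p>An Introduction to the Study of Indifference.</p>
<!-- The anchor precedes the text of the paragraph the arrow points to -->
<p><anchor xml:id="A3"/>or Being an apology of all culture not genuine.</p>
```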
-
There can also be text on arrows, as in the following example:
-
-
In the lower part of the list, there is a deleted item "(Antonio
- Mora)" that has two arrows at the end, one pointing to another item
- below it and the other pointing up. The arrow that points up has the
- text "F Pessoa" on it. This is encoded using an element
- label inside the metamark for the arrow, as in
- the following example:
- (Antonio
- Mora).
-
Each of the arrows is encoded with an element metamark. One
- arrow has the attribute rend with the value
- arrow-right-curved-down, because it is an arrow that
- is curved and points downwards on the right side, and the other one
- has the value arrow-right-curved-up, as it is curved and
- points up on the right side. Both metamark elements have
- the attribute function with the value
- assignment because the arrows assign a list item to
- other places in the document. In the attributes target,
- the identifiers of the elements marking the goal of the arrows ("A1"
- and "A2") are given, preceded by the "#". These anchor
- elements are defined elsewhere, as in the previous example. Inside
- of the first metamark element, the text on the arrow is
- encoded in the element label. Here, the text is the name "F
- Pessoa", so the label contains a reference to a name and an
- abbreviation which is expanded.
-
-
- Vertical text
-
On some handwritten documents, the text of notes on the side or
- entire lists is turned around and appears in vertical form. An
- example is shown in the following facsimile of the list BNP
- 133F-36v:
-
-
At the top left of the document, there is a list rotated to the left.
- In the lower part of the document, a side note is attached to the
- second and third item of a list, which is also written as vertical
- text, rotated to the left. In TEI, this is encoded as follows:
- Ultimatum.
- Desnivelamento [...]
-
The list at the top of the document gets an attribute rend
- with the value rotate-left. The same attribute and
- attribute value are used for the margin note. There, the element
- label, which holds the text that the curly bracket
- points to, has the rend attribute with
- rotate-left.
The genetic encoding involves changes that Pessoa himself made to the
- documents, for example text that he added, changed or deleted later.
- Such changes are interpreted as belonging to a second temporal level.
- Just two levels are differentiated, a first version (level 1) and a
- final version (level 2).
-
- Additions
-
In general, additions are encoded using the element add. The
- following facsimile shows an example of an addition:
-
-
In the second list item, a note is added below the word "Arist". This
- is encoded in the following way:
- — Voto
- – Democracia – AristAristocracia
- (critica) como se passa de uma idéa de aristocracia a
- outra
-
-
-
The element add is used to mark up the text that is added,
- in this case "como se passa de uma idéa de aristocracia a
- outra". This element carries an attribute place
- indicating where the addition is positioned in relationship to the
- existing text, in this case below it. The attribute place
- may have the values above, below,
- after, or margin-left. The second
- attribute used on add is n. It serves to mark
- the level of the genetic encoding, here the second level
- (2) because the text was added to the list later.
-
There is a third important element of the encoding of this addition.
- In the example, the element seg (segment) with the
- attribute type and the value anchor is used to
- create an anchor point for the addition. This means that the
- addition relates to the point in the text where "Arist (critica)"
- occurs. It is important that the add element occurs inside
- of the anchor seg. Such an anchor segment is only needed
- when the addition is not placed in relationship to the whole item
- that it occurs in (and that already has an own XML structure), but
- only refers to a part of it. In this case, the whole item is
- "— Voto – Democracia – Arist (critica)", but
- the addition is only made to the latter part, and "Arist (critica)"
- did not have any own mark-up before, so the segment is added
- here.
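The relationship between the anchor segment and the addition can be sketched like this. The expansion of the abbreviation "Arist" is left out for brevity, so this is a simplified fragment rather than the full encoding:

```xml
<item>— Voto – Democracia –
  <!-- The anchor segment marks the part of the item the addition belongs to -->
  <seg type="anchor">Arist (critica)
    <add place="below" n="2">como se passa de uma idéa de aristocracia a outra</add>
  </seg></item>
```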
-
- Notes added on the margin
-
A special case of addition are notes that Pessoa added on the
- margin of a document. Often, curly brackets are used to group
- items in a list and add a note to them. An example of this can
- be seen in the following facsimile taken from the document BNP
- 12-1 10r:
-
-
Here the first three items of the list have a note added on the
- right margin ("Tres Dissertações"). This note is
- encoded as follows:
- Dissertação sobre as
- revoluções. Dissertação a
- favôr da Allemanha e do seu procedimento na
- guerra presente. Dissertação sobre a
- arte moderna.
-
For the note itself, the element note is used. It has
- several attributes. First, the note is of the type
- addition. Second, it has the attribute
- place, indicating where the note was added, in
- this case on the right margin, so it has the value
- margin-right. Other possible values for
- place of note are
- margin-left, below, top, and
- center. Third, the note carries the attribute
- n with the value 2, indicating that
- the note is part of the last version of the document because it
- is interpreted as having been added later by Pessoa. Fourth, the
- note has an attribute target. This serves to explain
- to which items of the list the note is added. In this example,
- the note is added to the three first items in the list. To be
- able to address these items formally, they need an identifier in
- the XML. Therefore, the three list items each carry an attribute
- xml:id with unique values, here I1,
- I2 and I3. Then the target
- of the note can use these identifiers and point to
- them. The value of the note's target is
- range(I1,I3), which means that the note points
- from the first item to the third item. The element note
- is here placed inside of the second list item, after the text of
- the list item. The best way to place the note is in the middle
- of the range of items it points to. Because it points to items
- 1-3 here, item 2 is a good place to position the note
- element. The note itself is further encoded inside of the
- note element. Here, an element metamark is
- added to represent the curly bracket. It has two attributes:
- rend with the value curly-bracket and
- function with the value grouping,
- because the bracket serves to group the three list items.
- Finally, the curly bracket has a "label", which is the text of
- the note. This is encoded inside of an element label.
- The text is added here, and the line breaks occurring in the
- text are also marked with empty lb elements. Also,
- there is a hyphen dividing the word "Dissertações",
- which is encoded with the element pc (for punctuation
- character).
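The pieces described above can be assembled into the following sketch. Attribute names and values come from the prose. Three things are assumptions: that label is a sibling of metamark rather than its child, the exact point at which "Dissertações" is hyphenated, and the placement of line breaks inside the items:

```xml
<list>
  <item xml:id="I1">Dissertação sobre as revoluções.</item>
  <item xml:id="I2">Dissertação a favôr da Allemanha e do seu procedimento
    na guerra presente.
    <!-- The note sits in the middle item of the range it points to -->
    <note type="addition" place="margin-right" n="2" target="range(I1,I3)">
      <metamark rend="curly-bracket" function="grouping"/>
      <!-- Assumed split point of the hyphenated word -->
      <label>Tres Disser<pc>-</pc><lb/>tações</label>
    </note></item>
  <item xml:id="I3">Dissertação sobre a arte moderna.</item>
</list>
```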
-
Another example of a note added to the margin is visible in the
- following list BNP 133M-30r:
-
-
At the top of the document, two words are written on the right
- side of the typed list. Both are struck through. This margin
- note is encoded as follows:
- Commercial
- Code.
-
-
The note is interpreted as belonging to the first list item, so
- an element note is added at the end of this first list
- item. The note element carries the attribute
- place to say that the note is added on the right
- margin of the list (margin-right) and the attribute
- n with the value 2 to say that the
- note was added to the list later and is interpreted as belonging
- to the second edited version of this document, but not to the
- first one. The text of the note itself is transcribed and
- encoded inside of the note element, just that in this
- case, the words could not be read by the editor. They are
- therefore marked up as two gaps, each with
- reason
- illegible, unit
- word, and extent
- 1. The words are separated by a line break
- (lb) and are both deleted, which is marked up with
- the element del and the attribute rend with
- the value overstrike.
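A sketch of this margin note with two illegible, struck-through words. All attribute values are those given above; whether each gap has its own del element, as shown here, or both share one is an assumption:

```xml
<item>Commercial Code.
  <note place="margin-right" n="2">
    <del rend="overstrike"><gap reason="illegible" unit="word" extent="1"/></del>
    <lb/>
    <del rend="overstrike"><gap reason="illegible" unit="word" extent="1"/></del>
  </note></item>
```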
-
-
- Additions of longer passages of text
-
In some cases it is not just one or several characters or words
- that are added to a list, but more text, for example several new
- list items or a whole list. In those cases, the element
- add is impractical because it cannot contain
- structures such as several list items or a whole list, so
- another solution is needed for the mark-up of such additions. An
- example can be seen in the following list BNP 133M-30r:
-
-
On this document, a handwritten list ("1. System of Shorthand. 2.
- Look for door...") is added to a typed list which was present
- first. This is encoded as follows:
- System of
- Shorthand. Look for door
- — in instead of out.
-
Before the list that is to be added, an element addSpan
- is used. This is an empty element (it has no separate opening and closing
- tags, but just one tag which closes directly with />). It serves
- only to mark the beginning of the text span to be added. The
- attribute n with the value 2 says that the
- text that follows is to be added to the second edited version of
- this document, but is not present in the first version. The
- other attribute spanTo serves to indicate where the
- added text ends. The value of this attribute is a pointer to the
- identifier of another element. The "#" means that this attribute
- points to something else and the "A1" is the identifier pointed
- to. This identifier is defined on the element anchor,
- which is used to mark the end of the stretch of text to be
- added. In this case, the anchor element occurs after
- the list to be added. It carries the attribute xml:id
- with the value A1.
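The span mechanism described above can be sketched as follows. The numbering of the list items is not encoded in this simplified fragment:

```xml
<!-- Start of the added span; spanTo points to the anchor that ends it -->
<addSpan n="2" spanTo="#A1"/>
<list>
  <item>System of Shorthand.</item>
  <item>Look for door — in instead of out.</item>
</list>
<!-- End of the added span -->
<anchor xml:id="A1"/>
```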
-
-
-
- Substitutions
-
There are two kinds of substitutions. In the first case, something is
- deleted and replaced with something else. In the second case, an
- alternative is added without deleting the first option.
-
An example of the first case (something is deleted and replaced) can
- be seen in the following facsimile of the list BNP 143 6r:
-
-
In the fourth item of the list, the word "large" is overtyped and
- replaced with the word "big", which is put above the old word. This
- list item is encoded in the following way:
- Biomancy (fairly largebig
- article)
-
The word that is deleted ("large") is marked up with the element
- del. It has the attribute rend with the
- value overtyped, because the word is deleted by typing
- some "xxx" over it. The attribute n with the value
- 1 says that the word "large" belongs to the first
- version of this document (before it was deleted). The word that is
- added instead ("big") is encoded with the element add.
- Where the new word is added is indicated in the attribute
- place, which has the value above here.
- Also, the addition carries the attribute n with the value
- 2, saying that this addition belongs to the second,
- final version of the document. Both the deletion and the addition
- are surrounded by an element subst, indicating that this is
- a substitution.
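As a sketch, the list item with this substitution could be written like this, using exactly the elements and attribute values named above:

```xml
<item>Biomancy (fairly <subst>
    <del rend="overtyped" n="1">large</del>
    <add place="above" n="2">big</add>
  </subst> article)</item>
```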
-
Another example of substitutions can be seen in the facsimile of the
- list BNP 120-23r:
-
-
In the fourth list item, there are two kinds of substitutions. The
- first one is that the letter "A" is overwritten with the letter "O",
- and the second one that the word "Agua" is struck through and the
- word "Segredo" added above it to replace it. This list item is
- encoded in the following way:
- AOAguaSegredo de Tse-i-la.
-
Both substitutions are marked up with the element subst
- containing an element del for the deleted part and an
- element add for the added part. In both cases, the deleted
- words are marked as belonging to the first edited version of the
- document (n with 1) while the added words are
- part of the second version (n with 2). The
- first deletion is rendered as rend
- overwritten and the second as overstrike. In
- the first substitution, the new letter is added directly on top of
- the deleted one, so the element add needs no additional
- attribute place saying where the addition was made. In
- the second substitution, the addition was made above the previous
- word, so add has an attribute place with the
- value above.
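A sketch of the two substitutions in this list item, based on the description above. The spacing between the two subst elements is an assumption:

```xml
<item>
  <!-- "A" overwritten with "O": no place attribute needed -->
  <subst><del rend="overwritten" n="1">A</del><add n="2">O</add></subst>
  <!-- "Agua" struck through, "Segredo" added above it -->
  <subst><del rend="overstrike" n="1">Agua</del><add place="above" n="2">Segredo</add></subst>
  de Tse-i-la.</item>
```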
-
An example of the second case (something is replaced without deleting
- the first option) can be seen in the following facsimile:
-
-
Here, in the list item with the number 6, there is the title "Le
- Gardien des Troupeaux", to which an alternative is added resulting
- in "Le Gardien de Troupeaux". The encoding of this alternative is
- shown in the following:
- Traducção de Alberto
- Caeiro
- – Le Gardien dese
- Troupeaux
-
Because the change is actually only applied to the word "des", the
- element choice is only used on this word, more specifically
- on its last two characters "es", which are changed just to "e". The
- element choice contains two segments, seg 1 and
- seg 2, one for each version of the word. The first
- version is marked with n = 1 and the last
- version with n = 2. Furthermore, the last version is
- encoded as an addition using add inside of the second
- segment. Also, the place of the addition is indicated in
- place, which has the value above.
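The alternative reading can be sketched as a choice between two segments that cover only the last characters of the word. The beginning of the list item is omitted from this fragment:

```xml
<!-- Only "es" versus "e" is inside choice; the initial "d" is shared -->
– Le Gardien d<choice>
  <seg n="1">es</seg>
  <seg n="2"><add place="above">e</add></seg>
</choice> Troupeaux
```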
-
-
- Transpositions
-
A transposition means that a passage of text should be moved to
- another position, but the result of this process is not visible in
- the document. Instead, some metamark (e.g. an arrow, a line, or
- numbers) indicates which elements should be transposed (see the TEI guidelines for more information). In this edition,
- the metamark indicating the transposition is included in the
- diplomatic transcription. In the first edited version of the text,
- the passages are shown as they were originally and in the second
- edited version, the result of the transposition is given. An example
- of a transposition can be found in the document BNP/E3 93-56r, as
- shown in the following facsimile:
-
-
On the lower part of the page, there is a list with five items. The
- second list item has three lines, of which the second one has a
- transposition. A line indicates that the two words "poemas bons"
- should be transposed to "bons poemas". In TEI, this is encoded as
- follows:
- os
- poemas
- bons de Edgar
- Poe.
-
The element metamark with the attribute function
- and its value transposition is used to represent the sign
- that indicates the transposition, in this case a line (so that the
- attribute rend has the value arrow). The
- attribute place indicates where the metamark is placed in
- relationship to the elements that should be transposed. Here the
- value above is given, although the line actually starts
- above the first word and ends below the second, so this is a
- simplification. As a rule of thumb, the place where the metamark
- starts should be indicated in the place attribute. The
- attribute target of the metamark element serves
- to point to the elements which should be transposed. The pointers
- are the values of the identifiers of those elements, preceded by the
- sign '#' and separated by a space. In this case, the two elements
- with the identifiers S1 and S2 should be transposed. The order of
- the two identifiers is the one that the elements have in their
- original position (so "poemas" = S1 before "bons" = S2). Finally,
- the attribute n with the value 2 means that
- the transposition should only be realized in the second version of
- the text. The two elements that should be transposed directly follow
- the metamark element. Here, these are two seg
- elements, one for each word, and they have the identifiers
- S1 and S2 as values of the attribute
- xml:id.
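A sketch of the line that contains the transposition, using the attribute values stated above. Only this one line of the list item is shown, and line-break markup is omitted:

```xml
<!-- The metamark precedes the two segments it refers to -->
os <metamark rend="arrow" function="transposition" place="above"
    target="#S1 #S2" n="2"/><seg xml:id="S1">poemas</seg>
<seg xml:id="S2">bons</seg> de Edgar Poe.
```

In the first edited version the segments would be shown in document order (S1, S2); in the second they would be swapped.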
-
Another example of a transposition can be found in the document CP
- 786. In that case not two words are transposed but two rows of a
- table, as can be seen in the following facsimile:
-
-
Here the second and third rows with the text "Spell" and "Carta ao
- Author de Sachá" should be transposed. In TEI, this is encoded
- as follows:
- Spell—89(p. 150)Carta ao Author de
- Sachá—98(p. 93)
-
As in the example with two words, for the two table
- rows the element metamark with an attribute
- function and its value transposition holds
- the sign that indicates that the two table rows should be
- transposed. Here too, it is a curved line starting at the beginning
- of the word "Spell" and ending at "Author" on the next line. To
- simplify this, the attribute place of metamark
- has the value left, which means that the sign is placed
- on the left side of the table rows. What should be transposed is
- indicated in the attribute target, by giving the
- identifiers of the corresponding elements, in this case the
- identifiers of the two table rows (R1 and R2). The two row
- elements that should be transposed directly follow the
- metamark element.
-
-
- Deletions
-
Deletions can be sections of text that are visibly struck through or
- typed over by Pessoa. The following facsimile contains an example of
- a deletion:
-
-
On the right page, below the heading "Um grande poeta materialista
- (Alberto Caeiro)", there is a phrase "A enthusiastica all" which is
- struck through. This is encoded as follows:
- Um grande poeta materialista
- (Alberto
- Caeiro)
A enthusiasthica all
-
To mark deletions, the element del is used in combination
- with the attribute rend, which here has the value
- overstrike. Other possible values are
- overtyped (when the document is not handwritten but
- typed) and overwritten (when the text is overwritten with
- new text instead of using a line to strike it through). In the
- current example, the deletion is interpreted as already belonging to
- the first version of the document. It therefore does not have an
- attribute n with a value 2, which would mark
- that the deletion was made only for the final version of the
- document.
-
- Deletion of longer passages of text
-
In some cases not just a few characters or words are deleted, but
- longer passages of text, for example several list items or a
- whole list. This is the case in the list BNP 144A-37v, as can be
- seen in the following image:
-
-
Here, the whole list is deleted. This cannot be encoded with the
- element del because that element is not allowed to hold
- entire lists. The solution is shown in the following code
- example:
- Constantine Dix. Poe's
- Poems.
-
The element delSpan is used to mark the beginning of the
- passage that should be deleted. It is an empty element which
- closes directly. How the deletion should be rendered is
- indicated in the attribute rend, which has the value
- overstrike here. The element delSpan has
- another attribute spanTo which points to another
- element with the identifier "A2". That it is a pointer is marked
- with the sign "#", so the value of the attribute is
- #A2. This other element serves to mark the end
- of the deleted passage. It is encoded with the element
- anchor and has the attribute xml:id with
- the value A2. The anchor element is also
- empty.
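The deletion of a whole list can be sketched in the same way as the addition of a longer passage. The fragment below uses only the elements and values named above:

```xml
<!-- Start of the deleted span; spanTo points to the closing anchor -->
<delSpan rend="overstrike" spanTo="#A2"/>
<list>
  <item>Constantine Dix.</item>
  <item>Poe's Poems.</item>
</list>
<!-- End of the deleted span -->
<anchor xml:id="A2"/>
```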
-
-
-
- Other modifications
-
In some cases, it is not text that is added, but graphical elements.
- For example, words can be modified by underlining them or drawing a
- circle around them. An example is shown in the following facsimile
- of the list BNP 133M-30r:
-
-
On this list, the text "(Advertise for Cipher Agency - America)." is
- highlighted by a frame which Pessoa added later.
-
The TEI element add is not suitable to encode such
- modifications. Instead, the more general element mod is
- used, as in the following code snippet:
- (Advertise for Cipher Agency —
- America).
-
The element mod surrounds the text to be framed. The
- attribute rend indicates how it should be modified, here
- by adding a frame (framed). The attribute n
- with the value 2 indicates that the modification was only
- done later and should be part of the second edited version of the
- document.
-
-
-
- Editorial interventions
-
For some aspects of the text on the documents, the
- editor may decide to give more information on the transcribed text, for
- example to indicate how abbreviations would be expanded, that there is a
- gap in the text and how it could be filled. It should also be marked if
- the editor decides to only transcribe some part of the document, but not
- the whole one.
-
- Expansion of abbreviations
-
The following example shows how abbreviations are encoded in the documents and how they can be expanded:
- — 2
- idéas para o L. do
- Des.Livro do
- Desasocego
-
-
Here, there is a list item containing the title "Livro do Desasocego"
- in abbreviated form: "L. do Des.". To mark the difference between
- the abbreviated and expanded form, the element choice is
- used. Inside of it, first, the abbreviated text is given in an
- element abbr. Inside of it, the text is transcribed as it
- appears on the document. The abbreviation
- signs, in this case dots, are marked up further with the element
- am ("abbreviation mark"). The expansion of the
- abbreviation is given in the element expan. Inside of this,
- the parts where abbreviation marks are replaced by text are given in
- elements ex. Otherwise the text of the abbreviation is
- repeated in the expansion, because the choice element says
- that just one of the versions will be displayed at a time.
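A sketch of this abbreviation and its expansion, following the description above. The division of the expanded words into repeated text and ex elements is inferred from the prose:

```xml
<item>— 2 idéas para o <choice>
    <!-- As written: the dots are abbreviation marks -->
    <abbr>L<am>.</am> do Des<am>.</am></abbr>
    <!-- Expanded: ex holds the text that replaces each mark -->
    <expan>L<ex>ivro</ex> do Des<ex>asocego</ex></expan>
  </choice></item>
```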
-
- Ditto
-
A special case of abbreviation expansion are subsequent items in
- a list that contain repetitions for which Pessoa used
- typographical marks as placeholders.
-
In the following facsimile, it can be seen that the sixth list
- item starts with a line:
-
-
This line indicates that the beginning of this list item
- corresponds with the beginning of the previous item number 5, i.
- e. "Trad. de...". The line is therefore interpreted as an
- abbreviation, which can be expanded to the text of the preceding
- list item. This is encoded as follows:
- Trad.
- Traducção
- de A.
- C.Alberto
- Caeiro – The Keeper of
- SheepTraducção de Alberto
- Caeiro
- – Le Gardien dese
- Troupeaux
-
The line itself is encoded with the element metamark,
- using the attribute rend with the value
- line and the attribute function with
- the value ditto. The line is then marked as an
- abbreviation using abbr. It is expanded by using the
- element choice wrapped around the abbreviation and
- adding the element expan to include the text and
- mark-up that the line stands for: "Traducção de
- Alberto Caeiro".
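A simplified sketch of the sixth list item with its "ditto" line. Further markup inside the expansion and in the title that follows is left out:

```xml
<item><choice>
    <!-- The line standing in for the repeated text -->
    <abbr><metamark rend="line" function="ditto"/></abbr>
    <expan>Traducção de Alberto Caeiro</expan>
  </choice> – Le Gardien des Troupeaux</item>
```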
-
Another example for "ditto" using quotation marks instead of
- lines is visible in the following page of the list BNP
- 133M-96-a-98:
-
-
In item 14, there are two subitems: 'Small book on Sh. - Bacon'
- and 'Larger " " " "', where the quotation marks are placeholders
- for the text of the previous item. This is encoded as
- follows:
- Small book on ShShakespeare
- – Bacon.Larger
- book on Shakespeare –
- Bacon
-
The second item of the sublist contains an element
- choice with the child elements abbr and
- expan. The abbreviation holds the quotation marks
- with the function "ditto". Each quotation mark is encoded as
- a metamark with the attributes rend with
- the value quotes and function with the
- value ditto. The expansion in the element
- expan then contains the repeated text that the
- quotation marks stand for.
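The second subitem could be sketched as shown here. The sketch assumes four empty metamark elements, one per quotation mark in 'Larger " " " "', and omits the abbreviation mark-up for "Sh." that the first subitem would carry.

```xml
<item>Larger
  <choice>
    <abbr>
      <!-- one metamark per quotation mark on the page (assumed empty) -->
      <metamark rend="quotes" function="ditto"/>
      <metamark rend="quotes" function="ditto"/>
      <metamark rend="quotes" function="ditto"/>
      <metamark rend="quotes" function="ditto"/>
    </abbr>
    <!-- the repeated text that the quotation marks stand for -->
    <expan>book on Shakespeare – Bacon</expan>
  </choice>
</item>
```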
-
-
-
- Selections
-
In some cases not the whole content of a document is relevant for the
- edition, but only a certain part of it. Then the element
- gap can be used to mark such selections:
- [...] Tentar artigo no Mercure de France, sobre Alberto Caeiro.
-
-
In the example, the list is transcribed up to the seventh item. On
- the document, there is more text below the list, but it was decided
- not to transcribe it. The element gap indicates that
- something was left out here. In the attribute reason, it
- is mentioned that the gap is due to selection (and not,
- for example, because the document is damaged or the text illegible).
- Also the extent of what was not transcribed should be indicated.
- This can be done using the attribute unit in combination
- with the attribute extent. The first one says what is
- counted and the latter how much of it was selected. In the example,
- the remaining lines were counted and three lines were not
- transcribed. Possible values for unit are
- character, word, and line.
- Possible values for extent are numbers.
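As a sketch of the attribute combination just described, a selection of three untranscribed lines could be recorded like this. The wrapping div and the placement of gap after the list are assumptions made for illustration.

```xml
<div>
  <list>
    <!-- earlier list items are not shown in this sketch -->
    <item>Tentar artigo no Mercure de France, sobre Alberto Caeiro.</item>
  </list>
  <!-- three lines below the list were deliberately left untranscribed -->
  <gap reason="selection" unit="line" extent="3"/>
</div>
```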
-
-
- Conjectural readings
-
Sometimes the editor is not sure how a passage should be read, but
- still wants to make a suggestion. Such conjectural readings are
- marked with the element unclear, as in the following
- example:
- - Bacon
- Books
-
Here the word "Books" could not be read with certainty. The attribute
- reason serves to explain why something was unclear,
- in this case because the word was illegible.
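In mark-up, the conjectural reading described above might be expressed roughly as follows. The surrounding item element is an assumption.

```xml
<!-- "Books" is the editor's suggestion for a word that was hard to read -->
<item>- Bacon <unclear reason="illegible">Books</unclear></item>
```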
-
-
- Gaps
-
One kind of gap occurs when some text is present in a document but
- could not be read by the editor. See, for example, the following
- facsimile of the document 144D2 9r:
-
-
At the end of the fourth list item, there is some text in
- parentheses, beginning with "França, Barrès.", but the
- third word, which was struck through, could not be read. It is
- therefore marked as a gap and encoded as follows:
- O scepticismo energico. (França, Barrès. )
-
The word that could not be read is not transcribed. Instead an
- element gap is added at the position of the illegible word.
- The gap element gets an attribute reason stating
- why there is a gap, in this case because the word is
- illegible. What and how much is illegible is
- indicated in the other two attributes: unit stating that
- it is a word and extent stating that just
- 1 word could not be read. In this specific example,
- the illegible word is also struck through. This is marked up by
- adding a del element around the gap element, with
- an attribute rend with the value
- overstrike.
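Put together, the attributes named above suggest an encoding along these lines. This is a sketch reconstructed from the description, with the item wrapper assumed.

```xml
<item>O scepticismo energico. (França, Barrès.
  <!-- a struck-through word that could not be read -->
  <del rend="overstrike">
    <gap reason="illegible" unit="word" extent="1"/>
  </del>)</item>
```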
-
Another kind of gap occurs where some text is expected at a
- certain point in a document but is missing, either
- because the document was not finished or because the text was left
- out on purpose. An example of such a case is shown in the
- following:
- O Mundo - ver se se obtem Santos-Vieira. (pelo lado anti-clerical )
-
-
-
Here there is a list item containing the name of a periodical and a
- comment "ver se se obtem Santos-Vieira". An addition is made below
- this list item: "(pelo lado anti-clerical )". Because there is space
- between the last word of the addition and the closing parenthesis,
- the editor assumes that there should be more text. To mark this, the
- element supplied is used. It carries two attributes. The
- first one, resp, serves to indicate who made this
- intervention. As a value, it takes the initials of the responsible
- editor, in this case "PS" for "Pedro Sepúlveda". The second
- attribute is reason, explaining why something is
- supplied. Here it has the value omitted-in-original. In
- this example, the element supplied is empty because the
- editor does not know what the missing text is. In other cases, it is
- possible that the element supplied contains the text that
- is supposed to be there.
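The empty supplied element described above could be sketched like this. The add element with place="below" is an assumption about how the addition itself is marked. Only the attributes on supplied are taken from the description.

```xml
<item>O Mundo - ver se se obtem Santos-Vieira.
  <!-- addition written below the item (mark-up of the addition is assumed) -->
  <add place="below">(pelo lado anti-clerical
    <!-- the editor expects more text here but does not know what it is -->
    <supplied resp="PS" reason="omitted-in-original"/>)</add>
</item>
```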
-
In another case, there is a list starting with a heading "Italian:",
- but no list items are added on the document. This is encoded by
- using the element supplied inside of an otherwise empty
- list item, as in the following encoding example:
- Italian:
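A possible shape for this case is sketched below. It assumes that the heading is encoded with head inside list and reuses the attribute values from the previous example. The responsible editor for this particular document is not stated here, so the resp value is a placeholder.

```xml
<list>
  <head>Italian:</head>
  <!-- no items were written on the document; the empty item records that -->
  <item>
    <supplied resp="PS" reason="omitted-in-original"/>
  </item>
</list>
```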
-
-
-
-
- Encoding of publications
-
- General structure
-
The transcription of publications is encoded within the TEI elements
- text and body, each appearing twice. The first
- text element carries the attributes corresp
- and type. The second text element carries only the attribute
- type:
-
-
- [...]
-
-
- [...]
-
-
-
The attribute corresp contains a unique key defined in
- an external central work register (see the section on Work index for more details). The
- attribute type of the first text element always has the value
- "orig". This indicates that the first text
- element contains a transcription whose spelling and formatting
- follow the source of the publication and have not been normalized or
- corrected. The second text element also carries the
- attribute type, but with the value "reg". This
- signifies that it represents the current spelling of the published
- text.
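The overall structure can be sketched as follows. The value of corresp is a placeholder, since real keys come from the work register, and the enclosing TEI element with its header is shown only to make the fragment complete.

```xml
<TEI>
  <teiHeader><!-- metadata omitted in this sketch --></teiHeader>
  <!-- original spelling and formatting; WORK_KEY is a hypothetical placeholder -->
  <text corresp="WORK_KEY" type="orig">
    <body><!-- [...] --></body>
  </text>
  <!-- regularized version in current spelling -->
  <text type="reg">
    <body><!-- [...] --></body>
  </text>
</TEI>
```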
-
-
- Structures inside of divisions:
- headings, paragraphs, ...
-
Similar to document encoding, other structures within the main
- sections of a publication are encoded as well, such as headings, paragraphs, or
- lines of verse.
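As a rough orientation, such structures could be expressed with the standard TEI elements head, p, lg, and l, as in the sketch below. The nesting of the divisions and the absence of attributes are assumptions, and the project's actual encoding may use further elements or attributes.

```xml
<div>
  <head>MAR PORTUGUEZ</head>
  <div>
    <!-- number and title of the poem as two headings (assumed) -->
    <head>I</head>
    <head>O INFANTE</head>
    <!-- one stanza as a line group, one l element per verse line -->
    <lg>
      <l>Quem te sagrou creou-te portuguez.</l>
      <l>Do mar e nós em ti nos deu signal.</l>
      <l>Cumpriu-se o Mar, e o Imperio se desfez.</l>
      <l>Senhor, falta cumprir-se Portugal!</l>
    </lg>
  </div>
</div>
```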
-
- Headings
-
-
-
- MAR PORTUGUEZ
-
- I
- O INFANTE
- [...]
- [...]
- [...]
-
-
-
-
-
- Paragraphs and other blocks
-
-
-
-
-
A quadra é o vaso de flores que o povo põe á janela da sua
- Alma.
-
Da orbita triste do vaso obscuro a graça
- exilada das flôres atreve o seu olhar de alegria.
-
Quem faz quadras portuguezas comunga a
- alma do Povo, humildemente de nós todos e errante dentro de
- si propria.
-
Os autores d'este livro realizaram as
- suas quadras com destreza luzitana e fidelidade ao
- instinctivo e desatado da alma popular.
-
Elogial-os mais seria elogial-os
- menos.
-
17-IV-1914
[...]
-
-
-
-
-
- E a orla branca foi de ilha em continente,
- Clareou, correndo, até ao fim do mundo,
- E viu-se a terra inteira, de repente,
- Surgir, redonda, do azul profundo.
-
-
- Quem te sagrou creou-te portuguez.
- Do mar e nós em ti nos deu signal.
- Cumpriu-se o Mar, e o Imperio se desfez.
- Senhor, falta cumprir-se Portugal!
-
-
-
-
-
-
- Fernando Pessoa.
-
-
-
-
-
-
- Encoding of references (names, titles, periodicals, works, ...)
-
In the edition, references to several kinds of entities are encoded: to
- names, titles, periodicals, works, and collections. For all of these
- references, the element rs is used. In the attribute
- type, the kind of reference is given. This attribute can have
- the values name, title, periodical,
- work, or collection. The following example shows a
- heading of a document that is at the same time a
- reference to a title, which itself is the title of a collection of works:
- Na Casa de saude de Cascaes [...]
-
Therefore, the text "Na Casa de saude de Cascaes" is wrapped with two
- rs elements, one to say that it is a reference to a title and
- the other to state that the title is a reference to a collection of works.
- In both cases, also the key attribute is used. It serves to
- identify the entity which is referenced. Each type of entity has its own type
- of key. Titles, for example, have keys beginning with "T", followed by a
- number. Collections have a key beginning with "C". Names have keys beginning
- with "P" (for person name), periodicals with "J" (for journal) and works
- with "W". Possible values for the keys are to be found in external lists of
- the entities. References to the main heteronyms are a special case. Although
- these are references to names, the keys do not begin with "P" in these cases
- (as for all other person names), but specific keys for the heteronyms are
- used ("FP", "AC", "AdC", "RR", "BS").
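The double reference described above might be sketched as follows. The key values T000 and C000 are hypothetical placeholders that only follow the stated prefixes. Placing the collection reference outside the title reference is an assumption made by analogy with the work references discussed in this section.

```xml
<head>
  <!-- outer reference: the title names a collection of works -->
  <rs type="collection" key="C000">
    <!-- inner reference: the text is a title -->
    <rs type="title" key="T000">Na Casa de saude de Cascaes</rs>
  </rs>
  <!-- [...] -->
</head>
```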
-
In the next example, a reference to a work title is given:
- Alberto Caeiro
-
Here, the title is a name of a heteronym ("Alberto Caeiro"), used as a
- placeholder for the work of this heteronym. First the reference to the name
- is marked using the element rs with the attribute type
- and the value name. The key for Alberto Caeiro is
- AC.
-
References to the main heteronyms
- (Fernando Pessoa, Alberto Caeiro, Álvaro de Campos, Ricardo Reis, and
- Bernardo Soares) are further marked up by indicating the role that they have
- in the reference. In the example, Alberto Caeiro is mentioned as an author,
- so the attribute role is used on rs with the value
- author. The available values for role are:
- author, editor, translator, and
- topic.
-
Next, the reference to the title is marked with rs and
- type with the value title and a key
- giving the title identifier T48. Then another element rs
- is used around the first one to indicate that this is a work reference (with
- type
- work) and to which work (with key
- W32).
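Taken together, the three layers described in the preceding paragraphs can be sketched like this, using the keys and the role value named in the text:

```xml
<!-- outermost: reference to a work -->
<rs type="work" key="W32">
  <!-- middle: the text functions as a title -->
  <rs type="title" key="T48">
    <!-- innermost: the name of the heteronym, mentioned as author -->
    <rs type="name" key="AC" role="author">Alberto Caeiro</rs>
  </rs>
</rs>
```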
-
- Roles of name references
-
References to the main heteronyms (Fernando Pessoa, Alberto Caeiro,
- Álvaro de Campos, Ricardo Reis, and Bernardo Soares) are further
- marked up by indicating the role that they have in the reference. For
- this purpose, the attribute role is used on rs. The
- available values for role are: author,
- editor, translator, and topic. Most
- often, names are mentioned as authors, but there are also cases where a
- name is mentioned as part of a topic, as in the following example:
- Vida e obras do engenheiro Alvaro de Campos.
-
Here, Álvaro de Campos occurs inside of a title reference and is the
- topic of the work.
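A sketch of this case is given below. The title key T000 is a hypothetical placeholder, while the heteronym key and the role value follow the description.

```xml
<rs type="title" key="T000">Vida e obras do engenheiro
  <!-- the heteronym is what the work is about, not its author -->
  <rs type="name" key="AdC" role="topic">Alvaro de Campos</rs>.</rs>
```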
-
In the next example, a heteronym is mentioned in the role of editor:
- Livro do Desassocego. escripto por Vicente Guedes, publicado por Fernando Pessoa.
-
Here Fernando Pessoa is mentioned as the editor of "Livro do Desassocego"
- and the reference to his name is therefore marked with role
- editor. As the author, Vicente Guedes is mentioned, but as
- this is not one of the main heteronyms, no role is indicated in the name
- reference in that case.
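The editor role could be sketched as follows. The keys T000 and P000 are hypothetical placeholders, and the surrounding item element is assumed. Only the key FP and the role value are taken from the text.

```xml
<item>
  <rs type="title" key="T000">Livro do Desassocego.</rs>
  escripto por
  <!-- not a main heteronym, so no role attribute -->
  <rs type="name" key="P000">Vicente Guedes</rs>,
  publicado por
  <rs type="name" key="FP" role="editor">Fernando Pessoa</rs>.
</item>
```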
-
-
- Styles of name references
-
Another aspect of the encoding of references is the "style" of the
- reference. "Style" means that a certain way of spelling a reference is
- used. In the following example, the name "Antonio Mora" is given without
- any accent. This is marked as "style b", using the attribute
- style on the element rs, giving it the value
- b. In the case of Antonio Mora, the style a is "Antonio
- Móra" with an accent.
- “Prolegomenos” de Antonio Mora.
-
Which names have different styles available is defined in the
- external list of names. If a name has different styles, the attribute
- style should always be used in the encoding to indicate
- which of the styles is used in the document that
- is transcribed.
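For the unaccented spelling discussed above, the mark-up might look like this. The person key P000 is a hypothetical placeholder, and the item wrapper is assumed.

```xml
<item>“Prolegomenos” de
  <!-- style b: the name written without the accent -->
  <rs type="name" key="P000" style="b">Antonio Mora</rs>.</item>
```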
-
-
-
- Encoding of links
-
Besides references to different kinds of defined entities (such as persons,
- journals, etc.) also general links can be added to the transcriptions of
- documents and publications. Such links can serve to interconnect different
- parts of the edition, without the necessity to explicitly define the kind of
- relationship between the source and the target(s) of the link. They can also
- be used to point to external resources. That way, the links are a means of
- interpretation and comment on the transcriptions made by the editors.
-
Examples of the encoding of links are given below, taken from the editorial
- list BNP/E3 144X-48v:
- Artigo em A Galera, Coimbra (Antonio Nobre).
-
In the seventh item of the list, an article to be published in the journal "A
- Galera" is mentioned. The mention of this article is linked to the text
- "Para a memoria de Antonio Nobre", which Pessoa published in "A Galera" in
- 1915, and which is also part of the digital edition. That way, the link
- implies that the published text is a realization of the planned article that
- was mentioned in the editorial list. A link is encoded using the element
- ref, which surrounds the text that carries the link and the
- attribute target on the ref element, which contains the
- target of the link in the form of a URI. In the above example the link has
- only one target, but it is also possible that several targets are defined at
- the same time, as the following example shows:
- Artigos em O Jornal. Abril 1915.
-
Here, the sixth item of the editorial list mentions articles ("artigos") to
- be published in "O Jornal". The link surrounding this mention has several
- targets that correspond to various articles published in the journal in
- question, which are included in the digital edition. Several targets are
- given as several URIs in the target attribute, separated by a
- space.
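Both kinds of link, with one target and with several, can be sketched as follows. The URIs are invented placeholders on example.org, and exactly which words the ref element surrounds is an assumption.

```xml
<list>
  <!-- several targets: URIs separated by a space -->
  <item><ref target="https://example.org/pub/jornal-1 https://example.org/pub/jornal-2">Artigos em O Jornal</ref>. Abril 1915.</item>
  <!-- a single target -->
  <item><ref target="https://example.org/pub/galera-1">Artigo em A Galera</ref>, Coimbra (Antonio Nobre).</item>
</list>
```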
-
-
-
-
-
- TEI Specifications
-
This TEI Customization
uses the modules core, tei, header, textstructure,
- msdescription, transcr, analysis, linking, figures and certainty.
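In an ODD file, a module selection of this kind is typically declared with moduleRef elements inside a schemaSpec. The sketch below illustrates this under the assumption of a hypothetical schema identifier "pessoa_custom" and does not reproduce the project's actual specification.

```xml
<schemaSpec ident="pessoa_custom" start="TEI">
  <!-- one moduleRef per module listed above -->
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="msdescription"/>
  <moduleRef key="transcr"/>
  <moduleRef key="analysis"/>
  <moduleRef key="linking"/>
  <moduleRef key="figures"/>
  <moduleRef key="certainty"/>
</schemaSpec>
```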