Machine-Actionable Ancient Text Corpus (MAAT)

This is code to create the Machine-Actionable Ancient Text Corpus (MAAT).

Installation

This code uses Poetry to manage its Python dependencies. To install, clone this repository, then run:

$ poetry install
$ poetry shell

Use

First, obtain the TEI XML files that you want to convert to MAAT.

The first corpora added to the MAAT corpus are the following:

  1. The Duke Databank of Documentary Papyri data can be cloned from here.
  2. The Digital Corpus of Literary Papyri data can be cloned from here.
  3. Epigraphic Database Heidelberg data can be downloaded from here.

The script/convert script converts the TEI XML files to MAAT. It takes one or more directories as input and writes the MAAT JSON-LD to standard output.

This is how the first files were converted:

$ script/convert /Volumes/general/corpora/papyri/idp.data/DDB_EpiDoc_XML /Volumes/general/corpora/papyri/idp.data/DCLP /Volumes/general/corpora/inscriptions | tee /tmp/results.json | jq .id

Here, tee saves the results to /tmp/results.json while jq prints the id field of each document to standard output.
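
The same id listing can be reproduced in Python once the results file exists. The following is a minimal sketch, not part of the MAAT code: it assumes, based on the jq .id usage above, that the output is a stream of JSON objects written one after another, each with a top-level id field. The function names iter_json_documents and document_ids are hypothetical.

```python
import json
from typing import Iterator


def iter_json_documents(text: str) -> Iterator[dict]:
    """Yield each top-level JSON value from a string of concatenated documents.

    Assumption: the converter output is a sequence of JSON objects separated
    only by whitespace, which is what piping it to `jq .id` suggests.
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        doc, pos = decoder.raw_decode(text, pos)
        yield doc


def document_ids(path: str) -> list:
    """Return the id field of every document in the file (None if missing)."""
    with open(path, encoding="utf-8") as handle:
        return [doc.get("id") for doc in iter_json_documents(handle.read())]


if __name__ == "__main__":
    # Path taken from the example command above.
    for doc_id in document_ids("/tmp/results.json"):
        print(doc_id)
```

This sketch reads the whole file into memory, which may not suit a very large corpus; in that case jq or a streaming parser is likely the better choice.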

A log file named convert_errors.json is also created in the current directory. It records the errors that occurred during conversion and is also in JSON-LD format.

Citation

If you use this code, please cite the following:

@software{maat,
  author = {Fitzgerald, Will AND Barney, Justin},
  title = {Machine-Actionable Ancient Text Corpus (MAAT)},
  url = {https://github.com/WMU-Herculaneum-Project/maat},
  version = {1.0.0},
  date = {2024-07-15},
}