This is code to create the Machine-Actionable Ancient Text Corpus (MAAT).
This code uses the poetry system to manage Python dependencies. To install them, clone this repository, and then run:
$ poetry install
$ poetry shell
First, you need to download the TEI XML files that you want to convert to MAAT.
The first corpora added to the MAAT corpus are the following:
- The Duke Databank of Documentary Papyri data can be cloned from here.
- The Digital Corpus of Literary Papyri data can be cloned from here.
- Epigraphic Database Heidelberg data can be downloaded from here.
The script/convert script converts the TEI XML files to MAAT. It takes one or more directories as input and writes the MAAT JSON-LD to standard output.
This is how the first files were converted:
$ script/convert /Volumes/general/corpora/papyri/idp.data/DDB_EpiDoc_XML /Volumes/general/corpora/papyri/idp.data/DCLP /Volumes/general/corpora/inscriptions | tee /tmp/results.json | jq .id
This places the results in /tmp/results.json, and prints the id field of each document to standard output.
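The same id listing can be reproduced in Python without jq. The following is a minimal sketch, not part of this repository: it assumes the output file is a stream of concatenated JSON objects (which is what piping it through jq .id suggests) and that each object has a top-level id field. The function names iter_json_documents and document_ids are hypothetical.

```python
import json
from typing import Any, Iterator, List


def iter_json_documents(text: str) -> Iterator[Any]:
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value


def document_ids(text: str) -> List[Any]:
    """Collect the "id" field of every JSON object in the stream.

    Assumption: each converted document is a JSON object with a top-level
    "id" key; objects without one contribute None.
    """
    return [doc.get("id") for doc in iter_json_documents(text) if isinstance(doc, dict)]


if __name__ == "__main__":
    import sys

    # Usage: python list_ids.py /tmp/results.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        for doc_id in document_ids(handle.read()):
            print(doc_id)
```

This reads the whole file into memory, which is simple but may be impractical for a very large corpus; a chunked reader would be needed in that case.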
A log file named convert_errors.json is also created in the current directory. It contains the errors that occurred during the conversion process, and is also in JSON-LD format.
If you use this code, please cite the following:
@software{maat,
author = {Fitzgerald, Will AND Barney, Justin},
title = {Machine-Actionable Ancient Text Corpus (MAAT)},
url = {https://github.com/WMU-Herculaneum-Project/maat},
version = {1.0.0},
date = {2024-07-15},
}