
Analyzing linguistic data from the Hebrew Bible in conjunction with Gesenius


CambridgeSemiticsLab/Gesenius_data


Gesenius Data

Analyzing linguistic data from the Hebrew Bible in conjunction with Gesenius. The purpose of the repository is to produce datasets and analyses for the Oxford Grammar of Biblical Hebrew project led by Professor Geoffrey Khan.

Directory Structure and Data

The analysis and data-production pipeline lives in workflow, and its output is written to results.

This project uses Snakemake for the pipeline.

License

All data that is crucial for the analysis is stored openly (MIT license) in results/csv. All of the Hebrew Bible data is derived from the ETCBC's BHSA, which is under a CC-BY-NC 4.0 license. The English alignment data and the translations themselves are not open-source and thus cannot be released. However, we do release the data derived secondarily from those sources. For English alignments, we provide a link between a BHSA node from Text-Fabric and a given tense tag, which has been assigned using a mixture of spaCy and manually written rules. Thus we cannot provide the English string, e.g. ברא = "he created" (ESV), but we can provide the tense tag, which is the only thing that is crucial for the analysis anyway, e.g. ברא = "simple past" (ESV). The LXX data comes from the CATSS project and is under its own license. We provide only a small subset here. The full dataset can be found at http://ccat.sas.upenn.edu/gopher/text/religion/biblical/parallel/.
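To illustrate the kind of link described above (a BHSA node paired with a tense tag rather than an English string), here is a minimal Python sketch. The column names `bhsa_node`, `translation`, and `tense_tag`, the node IDs, and the sample rows are all hypothetical placeholders; the actual files in results/csv may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content. The column names and node IDs are placeholders
# chosen for illustration, NOT the actual schema of the files in results/csv.
SAMPLE_CSV = """bhsa_node,translation,tense_tag
1001,ESV,simple past
1002,ESV,simple past
1003,ESV,present perfect
"""


def load_tense_tags(handle):
    """Return a dict mapping (node, translation) to a tense tag."""
    reader = csv.DictReader(handle)
    return {
        (int(row["bhsa_node"]), row["translation"]): row["tense_tag"]
        for row in reader
    }


# Read the in-memory sample; a real file would be opened with open(path, newline="").
tags = load_tense_tags(io.StringIO(SAMPLE_CSV))

# Count how often each tense tag occurs across the sample rows.
tag_counts = Counter(tags.values())

print(tags[(1001, "ESV")])
print(tag_counts.most_common())
```

Keying on the node number rather than on any English text mirrors the licensing constraint: the tag can be joined back to the BHSA through Text-Fabric without redistributing the translation itself.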

As with any academic work, please cite when using this repository.
