intro: >-
This is the text-fabric representation of the Hebrew Bible Database,
containing the text of the Hebrew Bible augmented with linguistic annotations.
properties:
access:
- link: https://creativecommons.org/licenses/by-nc/4.0/
title: CC-BY-NC
community:
- title: >-
The etcbc-vu Slack community has high question-answering and
problem-solving potential. If you need an invite, ask someone who is
already a member; if you do not know anyone, ask one of the contact
persons.
development:
- link: https://dans.knaw.nl/en/
title: DANS
- link: https://di.huc.knaw.nl
title: KNAW Humanities Cluster - Digital Infrastructure
- link: http://etcbc.nl/
title: ETCBC
- title: >-
Eep Talstra, Constantijn Sikkel, Willem van Peursen, Dirk Roorda, Cody Kingham, Martijn Naaijer
generalContact:
- link: http://etcbc.nl/contact/
title: ETCBC Contact
informationTypes:
- '1'
intro: Biblia Hebraica Stuttgartensia Amstelodamensis
languages:
- Hebrew
- Aramaic
- English
learn:
- label: >-
There is an extensive set of tutorials for working with the BHSA by
means of Text-Fabric.
link: https://github.com/ETCBC/bhsa/tree/master/tutorial
title: Repository
- link: >-
https://nbviewer.jupyter.org/github/ETCBC/bhsa/blob/master/tutorial/start.ipynb
title: Entry point
link: https://github.com/ETCBC/bhsa/
mediaTypes:
- 'text'
problemContact:
- link: https://pure.knaw.nl/portal/nl/persons/dirk-roorda
title: Dr. Dirk Roorda
programmingLanguages:
- link: https://www.python.org
title: Python 3.6
researchActivities:
- '1'
- '1.1'
- 1.1.4
- 1.1.7
- 1.7.1
- 2.1.4
- 2.4.1
- '5.1'
- '6'
researchContact:
- link: https://research.vu.nl/en/persons/eep-talstra
title: Prof. dr. Eep Talstra
- link:
title: Prof. dr. Willem van Peursen
researchDomains:
- '11.15'
- '11.17'
- '19.3'
resourceHost:
- link: https://ETCBC.github.io/bhsa/
title: ETCBC Github
resourceOwner:
- link: http://etcbc.nl/
title: ETCBC
resourceTypes:
- Data
sourceCodeLocation:
- link: https://github.com/ETCBC/bhsa/
standards:
- link: https://pypi.org/project/text-fabric/
title: 'Text-Fabric'
status:
- Active
versions:
- link: https://github.com/ETCBC/bhsa/releases/tag/v1.7.3
title: 1.7.3
relatedProjects:
- 'LinkSyr: Linking Syriac Data'
relatedResources:
- This resource is not (yet) available
slug: bhsa
tabs:
learn:
body: >-
## Learn
Different ways to explore this dataset are supported.
* Use the SHEBANQ website if you
do not want to use the resource programmatically:
you can execute linguistic queries, save them, and publish them.
![](https://cdn.sanity.io/images/0v602vuh/production/be69557154a0a694960f71b4045fd6673b2a694e-3120x3364.png?auto=format&fit=crop&dpr=1&fit=fill&q=80&w=1400)
* Use the Text-Fabric browser. You need Python,
but you do not have to program in it.
You can execute queries in your browser, served by a local webserver.
![](https://cdn.sanity.io/images/0v602vuh/production/d959213a1276b09c9eddfdb03302f353c8f7a8e2-3154x2698.png?auto=format&fit=crop&dpr=1&fit=fill&q=80&w=700)
* Use Text-Fabric as a library. You need to program in Python.
You can build data workflows, and you can write exploratory Jupyter notebooks,
by which you have ultimate control over the data,
and powerful methods to render parts of the corpus in rich displays.
![](https://cdn.sanity.io/images/0v602vuh/production/fbd4a1c6fe6396280a742e9146d2b21c6160eee9-2264x3398.png?auto=format&fit=crop&dpr=1&fit=fill&q=80&w=700)
* Text-Fabric is on the [Python Package Index](https://pypi.org/project/text-fabric/)
and can be installed by means of pip.
Once Text-Fabric is installed, it will fetch a working copy of the data to your computer
when it needs it.
You can also obtain the data directly from [GitHub](https://github.com/ETCBC/bhsa/).
* There is an extensive set of tutorials for working with the BHSA by means of Text-Fabric:
[in the repo](https://github.com/ETCBC/bhsa/tree/master/tutorial) or via
[nbviewer](https://nbviewer.jupyter.org/github/ETCBC/bhsa/blob/master/tutorial/start.ipynb).
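
A minimal sketch of the library route described above. It assumes a recent Text-Fabric release (installed with `pip install text-fabric`) in which the BHSA app is addressed as `ETCBC/bhsa`; older releases may expect a different name. The helper `load_bhsa` is a hypothetical name introduced for this illustration and is not part of Text-Fabric.

```python
def load_bhsa(version=None):
    """Load the BHSA through Text-Fabric (hypothetical helper, not part of TF).

    On first use, Text-Fabric fetches a working copy of the data to
    your computer, so this call needs network access the first time.
    """
    # Imported lazily so the sketch can be read and imported even on a
    # machine where Text-Fabric is not installed yet.
    from tf.app import use

    if version is None:
        # Let Text-Fabric pick its default version of the corpus.
        return use("ETCBC/bhsa")
    # Assumption: the requested version exists in the ETCBC/bhsa repo.
    return use("ETCBC/bhsa", version=version)
```

In a Jupyter notebook, `A = load_bhsa()` would give you the app object; `A.search("clause")` then runs a search template that returns all clause nodes. The tutorials linked above cover search templates and rich displays in detail.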
mentions:
body: >-
## Publications
* [Coding the Hebrew Bible](https://doi.org/10.1163/24523666-01000011)
* [The Hebrew Bible as Data: Laboratory – Sharing – Experiences](https://doi.org/10.5334/bbi.18).
CLARIN in the Low Countries, Ch. 18.
overview:
body: >-
## Overview
* This
[text-fabric](https://annotation.github.io/text-fabric/tf)
representation of the Hebrew Bible Database contains the text of the Hebrew Bible
augmented with linguistic annotations compiled by the
[Eep Talstra Centre for Bible and Computer](http://etcbc.nl/),
VU University Amsterdam.
* The text is based on the
[Biblia Hebraica Stuttgartensia](https://www.academic-bible.com/en/online-bibles/biblia-hebraica-stuttgartensia-bhs/read-the-bible-text/)
edited by Karl Elliger and Wilhelm Rudolph,
Fifth Revised Edition, edited by Adrian Schenker,
© 1977 and 1997 Deutsche Bibelgesellschaft, Stuttgart.
* The [text-fabric](https://annotation.github.io/text-fabric/tf)
version has been prepared by Dirk Roorda,
[Data Archiving and Networked Services](https://dans.knaw.nl/nl),
now
[KNAW Humanities Cluster](https://di.huc.knaw.nl),
with support from Martijn Naaijer, Cody Kingham, and Constantijn Sikkel.
* The data is also available in other formats.
The SHEBANQ subdirectory contains the data in MQL format and in MySQL format;
these go directly into the
[SHEBANQ website](https://shebanq.ancient-data.org/).
* The
[bigTables](https://github.com/ETCBC/bhsa/blob/master/programs/bigTables.ipynb)
notebook shows ways to export the complete data as one big table and store it
in R format or in Pandas format.
The notebooks
[bigTablesP](https://github.com/ETCBC/bhsa/blob/master/programs/bigTablesP.ipynb)
and
[bigTablesR](https://github.com/ETCBC/bhsa/blob/master/programs/bigTablesR.ipynb)
show you a few things that you can do in R and Pandas.
bodyMore: >-
This dataset contains a precise transcription of the Codex Leningradensis.
It follows the Biblia Hebraica Stuttgartensia. The text is augmented with
linguistic annotations, from lemmatization and morphology, to syntax and
discourse structures.
All this data is represented in such a way that you can compute with it.
Text and annotations are transparently encoded in plain text files. The
Python library Text-Fabric offers a browsing/searching/computing interface
to this data. The website https://shebanq.ancient-data.org is based on the
very same data. Text-Fabric also supports publishing your own
results so that others can use them alongside the main dataset.
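
To illustrate what "plain text files" means in practice, here is a simplified sketch of reading one node feature. It assumes the `.tf` layout is a block of `@` header lines, one blank line, and then one value per line, where a line either gives `node<TAB>value` or only a value for the node following the previous one. Real files also allow node ranges, edge features, and escaped characters, which this sketch ignores; the Text-Fabric file format documentation is the authority. The function name and the sample data are made up for this example.

```python
def parse_tf_node_feature(text):
    """Parse a simplified node-feature file into (metadata, {node: value})."""
    # Header and data are separated by exactly one blank line.
    header, _, body = text.partition("\n\n")
    metadata = {}
    for line in header.splitlines():
        # Header lines look like "@node" or "@key=value".
        key, _, value = line.lstrip("@").partition("=")
        metadata[key] = value
    values = {}
    node = 0  # an implicit node is the previous node plus one
    for line in body.splitlines():
        if "\t" in line:
            # Explicit form: a single node number, a tab, then the value.
            spec, value = line.split("\t", 1)
            node = int(spec)
        else:
            # Implicit form: only the value is given.
            value = line
            node += 1
        values[node] = value
    return metadata, values


# Toy data, not taken from the BHSA.
SAMPLE = (
    "@node\n"
    "@valueType=str\n"
    "@description=toy part-of-speech feature\n"
    "\n"
    "prep\n"
    "subs\n"
    "5\tverb\n"
    "subs\n"
)

metadata, values = parse_tf_node_feature(SAMPLE)
print(metadata["valueType"], values)
```

Because each feature lives in its own file of this kind, individual annotations can be inspected, diffed, and versioned with ordinary text tools.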
The data is licensed under the
[CC-BY-NC license](https://creativecommons.org/licenses/by-nc/4.0/).
This means that
you can do everything you want with it, provided you give attribution and
you do not use it commercially. For commercial use you have to contact the
German Bible Society. As long as you stay within these restrictions, you
may select, copy and modify this data in any quantity you like, and also
re-publish it under any license, provided the new license does not
permit commercial re-use.
### Provenance
The source data resides on a server of the ETCBC, managed by Constantijn
Sikkel. He makes that data available as an MQL database dump, together
with supplementary data files. From there it is transported to this GitHub
repo by means of a
[pipeline](https://github.com/ETCBC/pipeline).
This dataset contains several versions of the BHSA, from 2011 to the present.
When you navigate to a version, you'll see more information about that version
and its provenance. For all versions the
[pipeline](https://github.com/ETCBC/pipeline) has been followed.
title: BHSA