Home
Getting from Wiktionary to dictionaries that end users can actually use involves many processing steps. This page summarizes the steps driven by the code in this repository. The short names used for the processing steps in the source code are given in parentheses.
dbnary converts the Wiktionary markup into machine-readable RDF triples. To query that data, it must first be loaded into an RDF database server, in our case the open-source edition of OpenLink Virtuoso.
While RDF is very flexible, querying it is less efficient than querying an SQL database, and the tooling is less mature. This step runs SPARQL queries on the RDF data and extracts all relevant data into tables in SQLite databases. No later step touches the RDF data.
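As a rough illustration of this extraction, the sketch below copies a SPARQL result set in the standard SPARQL 1.1 JSON results format into a SQLite table, one column per query variable. The function name `load_bindings`, the table name `entry`, the variable names, and the sample data are all hypothetical and are not taken from this repository. Fetching the results from the Virtuoso endpoint is left out.

```python
import sqlite3


def load_bindings(conn, table, result):
    """Copy a SPARQL JSON result set into a SQLite table, one column per variable."""
    columns = result["head"]["vars"]
    col_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')

    placeholders = ", ".join("?" for _ in columns)
    # Unbound variables are simply absent from a binding; store them as NULL.
    rows = [
        tuple(b[c]["value"] if c in b else None for c in columns)
        for b in result["results"]["bindings"]
    ]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Hypothetical sample result; a real one would come from the SPARQL endpoint.
sample = {
    "head": {"vars": ["written_rep", "part_of_speech"]},
    "results": {
        "bindings": [
            {
                "written_rep": {"type": "literal", "value": "Haus"},
                "part_of_speech": {"type": "uri", "value": "http://example.org/noun"},
            },
            {"written_rep": {"type": "literal", "value": "laufen"}},
        ]
    },
}

conn = sqlite3.connect(":memory:")
inserted = load_bindings(conn, "entry", sample)
```

Once the data is in SQLite, later steps can work with plain SQL and never need to contact the RDF server again.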
This step cleans up the raw data and normalizes differences between the languages.
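The actual cleanup rules are not described here, so the following is only a minimal sketch of what such a step could look like. It assumes three example rules: Unicode NFC normalization, whitespace trimming, and removal of empty or duplicate rows. The table names `raw_entry` and `entry` and the helper `clean_text` are hypothetical.

```python
import sqlite3
import unicodedata


def clean_text(value):
    """Normalize to NFC, collapse whitespace, and map empty strings to NULL."""
    if value is None:
        return None
    value = unicodedata.normalize("NFC", value)
    value = " ".join(value.split())
    return value or None


conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL so the cleanup can run inside a query.
conn.create_function("clean_text", 1, clean_text)

# Hypothetical raw data: stray whitespace, a decomposed accent, an empty value.
conn.execute("CREATE TABLE raw_entry (written_rep TEXT)")
conn.executemany(
    "INSERT INTO raw_entry VALUES (?)",
    [("  Haus ",), ("cafe\u0301",), ("   ",), ("Haus",)],
)

conn.execute(
    """
    CREATE TABLE entry AS
    SELECT DISTINCT clean_text(written_rep) AS written_rep
    FROM raw_entry
    WHERE clean_text(written_rep) IS NOT NULL
    """
)
cleaned = [r[0] for r in conn.execute("SELECT written_rep FROM entry ORDER BY written_rep")]
```

Registering the helper as an SQL function keeps the cleanup inside the database, so the same rule applies uniformly to every language's tables.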