Utilities to flavorize dictionary entries and calculate a smarter distance between languages.
To install in your project, use:
npm install @interslavic/database-engine --save
At the moment, the replacement rules are stored as CSV files in the `__fixture__/rules` directory.
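This README does not document the CSV layout beyond the `flavorizationLevel` column. The rough sketch below shows how such a rule file could be loaded. It assumes hypothetical `pattern` and `replacement` columns, a header row, and no quoted fields. `parseRulesCsv`, `ReplacementRule` and the sample rows are illustrative only and are not part of the package's API.

```typescript
// Hypothetical rule shape: the README names only the flavorizationLevel column;
// "pattern" and "replacement" are assumed column names for illustration.
interface ReplacementRule {
  pattern: string;
  replacement: string;
  flavorizationLevel: string; // letters such as "E", "S", "M", "R"
}

// Parses a minimal CSV with a header row. Assumes no quoted fields or embedded
// commas, so a real CSV parser would be safer for actual rule files.
function parseRulesCsv(text: string): ReplacementRule[] {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];

  const header = lines[0].split(",").map((name) => name.trim());
  const columnIndex = (name: string): number => {
    const index = header.indexOf(name);
    if (index === -1) throw new Error(`Missing column: ${name}`);
    return index;
  };
  const patternCol = columnIndex("pattern");
  const replacementCol = columnIndex("replacement");
  const levelCol = columnIndex("flavorizationLevel");

  // Each remaining line becomes one rule object.
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    return {
      pattern: cells[patternCol] ?? "",
      replacement: cells[replacementCol] ?? "",
      flavorizationLevel: (cells[levelCol] ?? "").trim(),
    };
  });
}

// Made-up sample data, not taken from the repository's rule files.
const sample = "pattern,replacement,flavorizationLevel\nę,a,E\nę,e,SM\nč,c,M";
const rules = parseRulesCsv(sample);
console.log(rules.length); // 3
```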
To calculate written intelligibility between Interslavic words and words from another Slavic language, this repository relies on the following algorithm:
- Every Interslavic entry is processed in three separate ways:
  - Etymological. The entry goes through the replacement rules that have the letter `E` in their `flavorizationLevel` column.
  - Standard. Only the rules with the letter `S` in the `flavorizationLevel` column are applied.
  - Mistaken. Same as above, but for the letter `M`.
- The translation variants undergo all transformations that have the letter `R` (i.e., Reverse) in their `flavorizationLevel` column.
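A minimal sketch of the per-level step described in the list above, under assumptions this README does not confirm: each rule is a plain substring replacement, and the rules selected for a level are applied one after another in list order. `flavorize`, `ReplacementRule`, `FlavorizationLevel` and the sample rules are hypothetical names chosen for illustration.

```typescript
type FlavorizationLevel = "E" | "S" | "M" | "R";

// Hypothetical rule shape; only the flavorizationLevel column is documented.
interface ReplacementRule {
  pattern: string;
  replacement: string;
  flavorizationLevel: string; // may hold several letters, e.g. "SM"
}

// Applies, in list order, every rule whose flavorizationLevel contains the given
// letter. Plain substring replacement is an assumption; real rules may differ.
function flavorize(word: string, rules: ReplacementRule[], level: FlavorizationLevel): string {
  return rules
    .filter((rule) => rule.pattern !== "" && rule.flavorizationLevel.includes(level))
    .reduce((current, rule) => current.split(rule.pattern).join(rule.replacement), word);
}

// Made-up rules that happen to reproduce the first step of each reading of "jęčmenny".
const rules: ReplacementRule[] = [
  { pattern: "ę", replacement: "a", flavorizationLevel: "E" },
  { pattern: "ę", replacement: "e", flavorizationLevel: "SM" },
  { pattern: "č", replacement: "c", flavorizationLevel: "M" },
];

console.log(flavorize("jęčmenny", rules, "E")); // jačmenny
console.log(flavorize("jęčmenny", rules, "S")); // ječmenny
console.log(flavorize("jęčmenny", rules, "M")); // jecmenny
```

Matching with `includes` lets a single row serve several readings (the `SM` row above), which is one plausible interpretation of a `flavorizationLevel` cell that holds more than one letter.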
In the end, there will be multiple variants to compare, e.g.:
Interslavic word: jęčmenny
Russian translation: ячменный (jačmennyj)
Mistaken reading (distance = 6):
jęčmenny -> jecmenny -> джецменну (džecmennu)
Standard reading (distance = 2):
jęčmenny -> ječmenny -> йечменны -> ечменны
Etymological reading (distance = 0):
jęčmenny (adj.) -> jačmenny -> jačmennyj -> йачменный -> ячменный
The algorithm uses the Levenshtein edit distance to measure how close the Interslavic word and its translation are. If two letters differ only in their diacritics (i.e., their base letters are the same), substituting one for the other costs 0.5 instead of 1.
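A minimal TypeScript sketch of such a weighted distance. The README does not say how the package decides that two letters share a base letter; this version assumes Unicode canonical decomposition (NFD) and ignores combining marks in the U+0300 to U+036F block. The function names are illustrative, not the package's API.

```typescript
// Strips combining diacritics after canonical decomposition: "č" -> "c", "ę" -> "e".
function baseLetter(ch: string): string {
  return ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Identical letters cost 0, letters sharing a base letter cost 0.5, others cost 1.
function substitutionCost(a: string, b: string): number {
  if (a === b) return 0;
  return baseLetter(a) === baseLetter(b) ? 0.5 : 1;
}

// Levenshtein distance with unit insertions/deletions and diacritic-aware substitutions.
function weightedLevenshtein(source: string, target: string): number {
  // NFC keeps each precomposed letter as a single code point.
  const s = Array.from(source.normalize("NFC"));
  const t = Array.from(target.normalize("NFC"));
  let prev: number[] = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      curr[j] = Math.min(
        prev[j] + 1, // delete s[i - 1]
        curr[j - 1] + 1, // insert t[j - 1]
        prev[j - 1] + substitutionCost(s[i - 1], t[j - 1]), // substitute or match
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Final forms of the three readings of "jęčmenny", compared with Russian "ячменный".
console.log(weightedLevenshtein("джецменну", "ячменный")); // 6
console.log(weightedLevenshtein("ечменны", "ячменный")); // 2
console.log(weightedLevenshtein("ячменный", "ячменный")); // 0
console.log(weightedLevenshtein("ječmenny", "jecmenny")); // 0.5
```

Under these assumptions the sketch reproduces the distances from the example (6, 2 and 0). Cyrillic й decomposes to и plus a combining breve, so this approach also counts й and и as 0.5 apart, which may or may not match the package's actual behaviour.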