Dictionary generation #1372
Replies: 5 comments
-
I've been trying to do the exact same thing and I've also noticed a few things:
It might help to apply orthographic rules backwards or use some sort of lemmatization for stuff like "continuous".
P.S. Are you on the Plover Discord? We have tons of discussions about multilingual theories and dictionary generation there and we'd love to have you around.
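To make the "apply orthographic rules backwards" idea concrete for derived words such as "continuous", here is a minimal sketch. It is not how plover-generate works. The names `BASE_OUTLINES`, `SUFFIX_STROKES`, `candidate_bases` and `derived_outlines` are hypothetical, and the only data taken from this thread is the pair T-PB = continue and T-PB/OUS = continuous. The spelling reversals are a small illustrative subset.

```python
# Hypothetical lookup tables. Only "continue" -> T-PB and the OUS suffix
# stroke come from the thread; a real generator would load full dictionaries.
BASE_OUTLINES = {"continue": "T-PB"}
SUFFIX_STROKES = {"ous": "OUS"}


def candidate_bases(word, suffix):
    """Undo a few common spelling changes made when `suffix` was attached."""
    if not word.endswith(suffix) or len(word) <= len(suffix):
        return []
    stem = word[: -len(suffix)]
    # Plain attachment, and a silent "e" that was dropped (continue -> continu).
    candidates = [stem, stem + "e"]
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        # A final letter that was doubled (run -> runn).
        candidates.append(stem[:-1])
    if stem.endswith("i"):
        # A final "y" that was changed to "i" (happy -> happi).
        candidates.append(stem[:-1] + "y")
    return candidates


def derived_outlines(word):
    """Return base-outline/suffix-stroke for every known base of `word`."""
    results = []
    for suffix, suffix_stroke in SUFFIX_STROKES.items():
        for base in candidate_bases(word, suffix):
            if base in BASE_OUTLINES:
                results.append(f"{BASE_OUTLINES[base]}/{suffix_stroke}")
    return results


print(derived_outlines("continuous"))  # ['T-PB/OUS']
```

A lemmatization dictionary could replace `candidate_bases` entirely. The reversed rules are only a fallback for words such a dictionary does not cover.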
-
There's also a list of briefs and misstrokes, which makes it easier to focus on the actually problematic cases.
This way, it's possible to expand the steno dictionary by adding pronunciations to the pronunciation dictionary, which will help people using all theories. (For instance, the entry for "raw" in the dictionary is "ɹɑ" instead of "ɹɔ" for some reason. Adding the latter will make the program generate the correct entry.)
The syllable split may not be a large problem, as all possible syllable splits are generated. It just makes the dictionary heavier.
Currently it does not handle orthographic consonant doubling (dinner ->
Regarding orthographic briefs: it may be (do ->
I have some rules in place for prefix/suffix, but it generates (stemming and ) Lemmatization dictionaries/algorithms will be useful.
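To show why generating all possible splits makes the dictionary heavier, here is a rough sketch that enumerates every way to cut a sequence into contiguous chunks. It is an illustration only, not the code from plover-generate. The function name `all_splits` is hypothetical, and it splits a spelling string where the real script may work on phonemes.

```python
from itertools import combinations


def all_splits(units):
    """Yield every way to cut `units` into contiguous non-empty chunks."""
    n = len(units)
    for k in range(n):  # k is the number of cut points, from 0 to n - 1
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [units[a:b] for a, b in zip(bounds, bounds[1:])]


splits = list(all_splits("inform"))
print(len(splits))                  # 32
print(["i", "nform"] in splits)     # True
```

A sequence of n units has 2^(n-1) splits, so the number of candidates doubles with each extra unit unless unnatural splits are filtered out before outlines are generated.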
-
The script is mostly done. (include a feature to use
I'll try turning main.json off and using only this dictionary for a while to see if there are any problems...
Problems so far (besides the problems mentioned above): (see https://github.com/user202729/plover-generate/issues)
Remark: use WAS and BUT instead of WA and BU. F-R for [for] is fine for F-RT.
-
It's mostly usable now, although the brief reordering algorithm gets too aggressive at times (
However, given the large number of Plover briefs and brief rules (yes, I'm aware of the project that labels the whole dictionary), it might be better to just make it fully compatible with the Plover dictionary. If someone doesn't know how to write a word, they wouldn't invent a brief for it using the brief rules. The idea was to remind/suggest/tell people about the autogenerated briefs so they can start using them, but the overhead of adding an entry to the dictionary is small. I still need a quick way to reorder the entries when a brief is added.
-
The intention is to...
Does automatic brief generation count? (Some are quite predictable if forced into one stroke.)
-
I'm trying to generate a dictionary according to Plover theory, and list all the rules so anyone who wants to modify the rules can do that easily.
... at the moment I have a collection of scripts to do that, but they have some problems...
It cannot determine which ways to split the vowels are natural.
For instance, it may split "inform" as [i/nform] and generate EU/TPH/TPORPL, or "remove" as [rmove] and generate R/PHAOUF.
There are too many orthographic briefs. (Or there appear to be. Could be part of the theory.)
It still cannot handle derived words (like "(T-PB = continue) ⇒ (T-PB/OUS = continuous)", something like that)
and...
I have to write the rules manually.
(I found some information online about possible ways to avoid having to do this.)
The code is not exactly easy to modify.
(the code is at https://github.com/user202729/plover-generate . Currently there's no documentation, and you have to figure out how to use it.)