[Bug]: Issues with generating new dictionaries using cspell-tools #6379
Comments
Thank you for trying. Some dictionaries are very complicated and include nested compound rules. Can you share some more information:
|
@Jason3S I've used the one that's packaged by Fedora, which is this one: https://github.com/spellcheck-ko/hunspell-dict-ko There is also: https://github.com/wooorm/dictionaries/tree/main/dictionaries/ko I've set up the cspell config like so:
Will try maxDepth in a sec. I've tried installing cspell-tools globally and using that directly. I also tried hunspell-reader, with the same result. |
Tried just now with this config:
Still got an OOM kill. :( |
That means applying the rules is causing something to break. It is not ideal because it is a limited dictionary, but it is possible to get a basic word list without applying rules by using hunspell-reader. Like this:

    hunspell-reader words --no-transform ko_KR.aff -o ko-words.txt

    ---
    targets:
      - name: ko
        sources:
          - ko-words.txt
        format: trie3
        generateNonStrict: true
    maxDepth: 0

Do you have a link to ko_KR.aff/dic you are using? |
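For anyone wondering what a word list "without applying rules" roughly contains, here is a minimal Python sketch. It is not how hunspell-reader is implemented. It only assumes the usual Hunspell .dic layout: an entry count on the first line, then one `stem/FLAGS` entry per line, optionally followed by tab-separated morphological fields. The function name `dic_stems` is made up for illustration.

```python
from itertools import chain


def dic_stems(lines):
    """Yield bare stems from Hunspell .dic lines, ignoring affix flags.

    Assumption: entries look like "stem/FLAGS" with optional tab-separated
    morphological fields. Escaped slashes ("\\/") are not handled here.
    """
    it = iter(lines)
    first = next(it, "")
    # The first line of a .dic file is normally an (approximate) entry count.
    # If it is not a number, treat it as a regular entry instead.
    if not first.strip().isdigit():
        it = chain([first], it)
    for raw in it:
        line = raw.strip()
        if not line:
            continue
        entry = line.split("\t")[0]      # drop morphological fields
        stem = entry.split("/", 1)[0]    # drop affix flags
        if stem:
            yield stem


if __name__ == "__main__":
    sample = ["3", "가다/AB", "학교", "사람/C\tpo:noun"]
    for stem in dic_stems(sample):
        print(stem)
```

Because no affix rules are applied, memory use stays proportional to one line at a time, which is why this kind of list can be produced even when full expansion fails.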
I just noticed that you included it in a previous comment. |
Yep, the basic dict generated properly. I'm guessing the issue with the compounding rules is that words in Korean can get weirdly complex. As in, the root word can be pluralized (in different ways), conjugated on top of that, and potentially have additional suffixes, which can turn a single four-radical root word into tens of permutations. @Jason3S I'll try to link that to cspell, test it on some of the content I have, and get back to you ASAP. Thank you for your help too mate 👍 |
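To put a number on the "tens of permutations" point: if each layer of suffixes is optional and the layers stack, the number of surface forms is the product of the layer sizes. The sketch below uses placeholder suffixes (`-p1`, `-c1`, and so on are hypothetical, not real Korean morphology or Hunspell rule syntax) just to show the arithmetic.

```python
from itertools import product


def expand(root, *suffix_layers):
    """Return every form made by picking one suffix per layer ("" = none)."""
    return [root + "".join(combo) for combo in product(*suffix_layers)]


# Hypothetical layers: 2 plural markers, 3 conjugations, 2 extra suffixes,
# each layer also allowing "no suffix".
plural = ["", "-p1", "-p2"]
conjugation = ["", "-c1", "-c2", "-c3"]
extra = ["", "-x1", "-x2"]

forms = expand("root", plural, conjugation, extra)
print(len(forms))  # 3 * 4 * 3 = 36 forms from a single root
```

Multiply that by the number of entries in a 44 MB .dic file and it is plausible that full expansion, especially with nested rules, outgrows any reasonable heap.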
Kind of Issue
Runtime - command-line tools, Building / Compiling
Tool or Library
cspell-tools
Version
8.14.4 and 8.15.2 for cspell-tools-cli
Supporting Library
No response
OS
Other
OS Version
Doesn't really matter
Description
Thanks for the great software.
I've been trying to help out by converting the Hunspell Korean dictionary into a cspell-compatible source. But no matter what I try when running the conversion, I get core dumps.
That's for sure caused by the size of the dict (11 MB for the .aff, 44 MB for the .dic). I've tried bumping max-old-space-size up to 60 GB (I have 64 GB available right now), and it still dies. Any idea how I could split this job into chunks, so it runs longer but doesn't die?
Reporting this as a bug, because it seems to me that it tries to load everything into memory at once and process it there, which causes it to run out (and it would probably keep running out up to some ludicrous size).
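One way the "split this job into chunks" idea could look, as a rough Python sketch rather than anything cspell-tools provides: cut the .dic file into smaller .dic files, each with its own entry-count header, and process them one at a time against the same .aff. The function name `split_dic`, the `.partNNN.dic` naming, and the UTF-8 encoding are all assumptions. How the tooling pairs a .dic with its .aff should be checked before relying on this, and chunking does not help if a single entry's rule expansion is what blows up.

```python
from itertools import chain, count, islice
from pathlib import Path


def split_dic(dic_path, out_dir, chunk_size=100_000):
    """Split a Hunspell .dic file into chunks of at most chunk_size entries.

    Each chunk gets its own count header so it is a valid .dic on its own.
    Returns the list of chunk paths that were written.
    """
    dic_path, out_dir = Path(dic_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with dic_path.open(encoding="utf-8") as src:
        first = src.readline()
        # Skip the original count header; keep the line if it is an entry.
        header = [] if first.strip().isdigit() else [first]
        entries = (
            line.rstrip("\n") for line in chain(header, src) if line.strip()
        )
        for index in count():
            chunk = list(islice(entries, chunk_size))
            if not chunk:
                break
            out = out_dir / f"{dic_path.stem}.part{index:03d}.dic"
            out.write_text(
                f"{len(chunk)}\n" + "\n".join(chunk) + "\n", encoding="utf-8"
            )
            written.append(out)
    return written
```

Usage would be something like `split_dic("ko_KR.dic", "chunks", chunk_size=50_000)`, followed by running the conversion on each chunk and concatenating the resulting word lists.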
Steps to Reproduce
No response
Expected Behavior
No response
Additional Information
No response
cspell.json
No response
cspell.config.yaml
No response
Example Repository
No response
Code of Conduct