add slovak (sk) language #41

Open
neurlang opened this issue Nov 11, 2023 · 0 comments
Comments

@neurlang

I would like to suggest adding the dataset.txt of 24865 Slovak words, all hand-reviewed. What license would be preferable for the gruut project? I am the author and can release it under any license you prefer.

https://github.com/neurlang/toipa/tree/master/sk2ipa

Fixes that would be needed:

  1. remove the ' character
  2. replace θ with c
  3. add spaces between phonemes
  4. remove words that map to the A / F placeholder

The cleaned entries would then be loaded into the word_phonemes table of lexicon.db.
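
A minimal sketch of the four cleanup fixes listed above, assuming each entry is already available as a (word, IPA string) pair. The function names, the treatment of length marks, combining diacritics and tie bars as part of the preceding phoneme, and the reading of "A / F placeholder" as any pronunciation containing a literal `A` or `F` are all assumptions made for illustration. The sketch does not parse dataset.txt or write to lexicon.db, since neither format is described here.

```python
import unicodedata

# Assumption: these marks extend the previous phoneme instead of starting a new one.
LENGTH_MARKS = {"ː", "ˑ"}
# Assumption: a tie bar joins the previous and the next symbol into one phoneme.
TIE_BARS = {"\u0361", "\u035c"}
# Assumption: placeholder pronunciations contain a literal "A" or "F".
PLACEHOLDERS = {"A", "F"}


def split_phonemes(ipa):
    """Split an unspaced IPA string into a list of phonemes."""
    phonemes = []
    for ch in ipa:
        is_modifier = ch in LENGTH_MARKS or unicodedata.combining(ch) != 0
        after_tie = bool(phonemes) and phonemes[-1][-1] in TIE_BARS
        if phonemes and (is_modifier or after_tie):
            phonemes[-1] += ch  # attach to the previous phoneme
        else:
            phonemes.append(ch)
    return phonemes


def clean_entry(word, ipa):
    """Apply fixes 1-4; return (word, spaced phonemes) or None if dropped."""
    if any(ch in PLACEHOLDERS for ch in ipa):
        return None  # fix 4: drop placeholder entries
    ipa = ipa.replace("'", "")   # fix 1: remove the ' character
    ipa = ipa.replace("θ", "c")  # fix 2: replace θ with c
    return word, " ".join(split_phonemes(ipa))  # fix 3: space-separate phonemes
```

For example, `clean_entry("mama", "ma'ma")` yields `("mama", "m a m a")`, while an entry whose pronunciation is `A` is dropped. The splitting rule is deliberately simple and would need checking against the actual phoneme inventory gruut expects for Slovak.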

What is the g2p_alignments table for?

I can also generate a larger dictionary (up to 300k words) using the neural network, but it could contain mistakes.
