Hello, thank you for the library.

I've written a free program for learning languages called Lute (https://github.com/LuteOrg/lute-v3), and it would be nice to add Thai support. This library looks great, but I'm not sure what the "best" parameters are when using it. Since I don't speak Thai, I can't judge whether the sentence splitting is accurate for Thai learners.

I did some testing at https://github.com/jzohrab/lute_thai_testing -- can you suggest the most accurate settings for splitting Thai texts into sentences for learners?

Cheers and regards!
Word tokenizer: Deepcut is a state-of-the-art deep-learning word tokenizer for Thai, but it is slow and uses a lot of compute. Alternatively, you can use newmm, which is dictionary-based, maximum matching, constrained by Thai Character Cluster (TCC) boundaries, with improved TCC rules. If you want to improve newmm, you can use Deepcut to update the dictionary from your data and add the new words to newmm's dictionary. See more: https://pythainlp.org/tutorials/notebooks/pythainlp_get_started.html#Word
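For reference, here is a minimal sketch of how those options can be tried from Python. It assumes the library is PyThaiNLP (per the linked tutorial); the engine names, the `sent_tokenize`/`word_tokenize` functions, and the separate `deepcut` install step are taken from my reading of the PyThaiNLP docs, so treat this as an illustration rather than the maintainer's exact recommendation.

```python
# Sketch: comparing sentence splitting and the suggested word tokenizers with PyThaiNLP.
# Assumptions: PyThaiNLP >= 2.x is installed; the "deepcut" engine additionally
# requires `pip install deepcut`.
from pythainlp.tokenize import sent_tokenize, word_tokenize

text = "สวัสดีครับ วันนี้อากาศดีมาก"  # example Thai text

# Sentence splitting (what Lute needs); uses PyThaiNLP's default sentence engine.
sentences = sent_tokenize(text)
print(sentences)

# Word tokenization with newmm: fast, dictionary-based, maximum matching.
words_newmm = word_tokenize(text, engine="newmm")
print(words_newmm)

# Word tokenization with deepcut: generally more accurate, but slower and heavier.
words_deepcut = word_tokenize(text, engine="deepcut")
print(words_deepcut)
```

For Lute's use case, `sent_tokenize` would presumably handle the sentence-level splitting, while the word tokenizers above would drive word-level segmentation; updating newmm's dictionary with words found by Deepcut, as suggested, is covered in the linked "Word" section of the PyThaiNLP tutorial.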