Adding support for Tibetan in spacy #13212
wienergm
started this conversation in
Language Support
Replies: 0 comments
I am interested in helping to add Tibetan support to spaCy and was wondering whether this would be of interest to the spaCy community.
One challenge with Tibetan is that the natural "tokens" are syllables rather than complete words: the script marks syllable boundaries (with the tsheg, ་) but not word boundaries. A second pass is therefore required to group the individual syllables into words, which typically consist of one to three consecutive syllables.
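To make the two passes concrete, here is a minimal sketch in plain Python, not spaCy code. It assumes syllables are separated by the tsheg (U+0F0B) and that words can be found by greedy longest match against a lexicon. The function names, the tiny `LEXICON`, and the sample text are all hypothetical and only for illustration; a real segmenter would need a proper dictionary or a statistical model.

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)
SHAD = "\u0f0d"   # Tibetan clause/sentence mark (shad)


def split_syllables(text):
    """First pass: split text into syllables on the tsheg."""
    # Treat the shad as whitespace so it does not stick to a syllable.
    cleaned = text.replace(SHAD, " ")
    syllables = []
    for chunk in cleaned.split():
        syllables.extend(s for s in chunk.split(TSHEG) if s)
    return syllables


def group_syllables(syllables, lexicon, max_len=3):
    """Second pass: greedy longest match of 1..max_len syllables.

    `lexicon` is a set of syllable tuples (an assumption of this sketch).
    A single syllable that is not in the lexicon is kept as its own word.
    """
    words = []
    i = 0
    while i < len(syllables):
        longest = min(max_len, len(syllables) - i)
        for n in range(longest, 0, -1):
            candidate = tuple(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(TSHEG.join(candidate))
                i += n
                break
    return words


# Hypothetical toy lexicon holding two two-syllable words.
LEXICON = {("བཀྲ", "ཤིས"), ("བདེ", "ལེགས")}
TEXT = "བཀྲ་ཤིས་བདེ་ལེགས།"

syllables = split_syllables(TEXT)
words = group_syllables(syllables, LEXICON)
print(syllables)
print(words)
```

With this toy input the first pass yields four syllables and the second pass groups them into two words. Whether logic like `group_syllables` belongs inside the tokenizer itself or in a later step is exactly what I am unsure about.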
Should the grouping of Tibetan syllables into words happen before token generation or after it? How would this extra step fit into the spaCy framework?
Thank you very much!
I'm new to spaCy.