Chinese word segmentation model for spaCy #12984
Replies: 1 comment
Hi @PythonCancer,
Yes, features.msgpack stores the features used by the segmentation model.
spaCy itself doesn't provide specialized components for word segmentation (as it does for tokenization, lemmatization, dependency parsing, etc.). If you want to train your own word segmentation model and it outperforms the ones integrated in spaCy w.r.t. accuracy or speed, we're happy to consider integrating it.
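For experimenting locally, one way to try a custom segmenter inside a spaCy pipeline is to wrap it as a custom tokenizer, since a tokenizer is just a callable that turns text into a `Doc`. Here is a minimal sketch, assuming a hypothetical `my_segment()` function standing in for your own word segmentation model:

```python
import spacy
from spacy.tokens import Doc


def my_segment(text):
    # Hypothetical stand-in for your own word segmentation model;
    # here it just splits the text into single characters for illustration.
    return [char for char in text if not char.isspace()]


class CustomSegmenter:
    """A spaCy tokenizer is any callable that maps a text to a Doc."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        words = my_segment(text)
        return Doc(self.vocab, words=words)


nlp = spacy.blank("zh")                    # blank Chinese pipeline
nlp.tokenizer = CustomSegmenter(nlp.vocab) # replace the default tokenizer

doc = nlp("我喜欢自然语言处理")
print([token.text for token in doc])
```

A tokenizer swapped in this way replaces the whole segmentation step, so any downstream components would see your word boundaries directly; treat it as a way to test your model's output in a pipeline rather than as the packaging format used by the shipped zh pipelines.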
The Chinese word segmentation model zh_core_web_sm-3.5.0 in spaCy has two files. One is weights.npz, which contains the dimensions and the model's weight values; that part I understand. The other file is features.msgpack: what is this file for? Does it store the features? I want to train my own word segmentation model and embed it into spaCy, so could you explain what this file contains?
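In case it helps while digging into this, both files can be opened directly to see what they hold. A minimal sketch, assuming features.msgpack is an ordinary msgpack file (readable with srsly) and using a made-up model_dir path that you would replace with the real location of the two files:

```python
import numpy
import srsly  # spaCy's serialization helper; reads standard msgpack files

# Hypothetical location of the two files; adjust to wherever the installed
# zh_core_web_sm-3.5.0 package keeps them on your system.
model_dir = "path/to/zh_core_web_sm-3.5.0/pkuseg_model"

# weights.npz holds the numeric parameters as named arrays.
weights = numpy.load(f"{model_dir}/weights.npz")
for name in weights.files:
    print(name, weights[name].shape, weights[name].dtype)

# features.msgpack: assuming it is a standard msgpack file, this loads the
# feature data (likely a mapping from feature strings to indices).
features = srsly.read_msgpack(f"{model_dir}/features.msgpack")
print(type(features))
```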