
Problem tokenizing English letters and digits #1070

Open
miaomiaojie1 opened this issue Aug 18, 2024 · 0 comments

Comments

@miaomiaojie1

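For example, for ccc100-n2-h3, ik_max_word produces: ccc100-n2-h3 ccc 100 n 2 h 3. After I added n2 and h3 to the main dictionary, it produces: ccc100-n2-h3 ccc 100 n2 n 2 h3 h 3. What I want is: ccc100 n2 h3. Why are n2 and h3 still split apart even after they have been added to the main dictionary?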
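To pin down the desired output only (this says nothing about how IK works internally), here is a minimal sketch in plain Java that splits on hyphens and keeps each segment whole. The class and method names (HyphenSplit, splitOnHyphens) are hypothetical; wiring such logic into IK or Elasticsearch would need a real tokenizer implementation and is not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class HyphenSplit {
    // Hypothetical helper: split a model code on hyphens and keep each
    // segment intact, instead of breaking letter/digit runs further apart.
    static List<String> splitOnHyphens(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.split("-")) {
            if (!part.isEmpty()) { // skip empty pieces from "--" or edge hyphens
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnHyphens("ccc100-n2-h3")); // [ccc100, n2, h3]
    }
}
```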
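Another example: for logger V300r200c20spc300, ik_max_word produces: logger v300r200c20spc300 v 300 r 200 c 20 spc 300. What I want is: logger V300 r200 c20 spc300. Can this be achieved with a custom tokenization strategy, and would it introduce ambiguity?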
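One possible custom rule, sketched under the assumption that every segment is a run of ASCII letters followed by an optional run of digits (a bare digit run is also accepted). This is standalone Java with hypothetical names (LetterDigitGroups, tokenize), not IK's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LetterDigitGroups {
    // Assumed rule: one token = letters followed by optional digits, or a bare
    // digit run. Greedy matching keeps "spc300" together as a single token.
    private static final Pattern UNIT = Pattern.compile("[A-Za-z]+[0-9]*|[0-9]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = UNIT.matcher(input);
        while (m.find()) { // whitespace and punctuation are skipped, never emitted
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [logger, V300, r200, c20, spc300]
        System.out.println(tokenize("logger V300r200c20spc300"));
    }
}
```

Because the pattern is matched greedily from left to right, each input has exactly one segmentation under this rule, so the rule itself never produces competing splits. It can still disagree with the intended meaning when a code does not follow the letters-then-digits shape (for example, 300v becomes 300 and v). It also preserves the original case (V300), whereas the ik_max_word output above is lowercased.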
Another example: for aicc 12.300.4, ik_max_word produces: aicc 12.300.4. What I want is: aicc 12 12.300 12.300.4. Does IK support this?
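A minimal sketch of the prefix expansion being asked for, assuming dotted numbers appear as whitespace-separated words of the form digits(.digits)+. It is plain Java with hypothetical names (DottedVersionPrefixes, tokenize) and makes no claim about what IK itself supports.

```java
import java.util.ArrayList;
import java.util.List;

public class DottedVersionPrefixes {
    // Hypothetical helper: for a dotted number such as "12.300.4", emit every
    // dot-delimited prefix ("12", "12.300", "12.300.4"); other words pass through.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.matches("\\d+(\\.\\d+)+")) {
                String[] parts = word.split("\\.");
                StringBuilder prefix = new StringBuilder();
                for (int i = 0; i < parts.length; i++) {
                    if (i > 0) {
                        prefix.append('.');
                    }
                    prefix.append(parts[i]);
                    tokens.add(prefix.toString()); // one token per growing prefix
                }
            } else if (!word.isEmpty()) { // empty only when the input is blank
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("aicc 12.300.4")); // [aicc, 12, 12.300, 12.300.4]
    }
}
```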
