Index mode (type=index_ansj) does not behave as expected #235

Open
tangwang opened this issue Apr 11, 2024 · 3 comments

@tangwang

tangwang commented Apr 11, 2024

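Index mode should normally emit multiple segmentations of the text, but in practice it returns the same tokens as query mode. This differs from the README, which says that in index mode “中国” is segmented into “中国”, “中”, and “国”:
Below are three examples (text=中国是社会主义国家, text=中国, text=版本号), all with type=index_ansj. None of the results match the documented index-mode behavior; they all look like query-mode output, and they are in fact identical to the query-mode results.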

GET /_cat/ansj?text=中国是社会主义国家&type=index_ansj 200 OK

{
  "result": [
    {
      "name": "中国",
      "nature": "ns",
      "offe": 0,
      "realName": "中国",
      "synonyms": null
    },
    {
      "name": "是",
      "nature": "v",
      "offe": 2,
      "realName": "是",
      "synonyms": null
    },
    {
      "name": "社会主义",
      "nature": "n",
      "offe": 3,
      "realName": "社会主义",
      "synonyms": null
    },
    {
      "name": "国家",
      "nature": "n",
      "offe": 7,
      "realName": "国家",
      "synonyms": null
    }
  ]
}

GET /_cat/ansj?text=中国&type=index_ansj 200 OK

{
  "result": [
    {
      "name": "中国",
      "nature": "ns",
      "offe": 0,
      "realName": "中国",
      "synonyms": null
    }
  ]
}

GET /_cat/ansj?text=版本号&type=index_ansj 200 OK

{
  "result": [
    {
      "name": "版本号",
      "nature": "n",
      "offe": 0,
      "realName": "版本号",
      "synonyms": null
    }
  ]
}

The version is 8.7.0, installed with:
bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-analysis-ansj/releases/download/v8.7.0/elasticsearch-analysis-ansj-8.7.0.0-release.zip
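
For anyone trying to reproduce this, the mismatch can be checked mechanically. The following is only a minimal sketch: it parses the response body reported above for text=中国 and compares the token names with the set the README is said to promise for index mode. The helper name `token_names` and the inlined response string are illustrative assumptions, not part of the plugin.

```python
import json

# Response body copied from the report above (text=中国, type=index_ansj).
response_text = """
{"result": [{"name": "中国", "nature": "ns", "offe": 0,
             "realName": "中国", "synonyms": null}]}
"""


def token_names(body: str) -> set[str]:
    """Return the set of token names found in a /_cat/ansj response body.

    Hypothetical helper written for this sketch only.
    """
    return {term["name"] for term in json.loads(body)["result"]}


# Tokens that, according to the report, the README promises for index mode.
expected = {"中国", "中", "国"}
actual = token_names(response_text)

# Tokens the README promises but the plugin did not return.
missing = expected - actual
print("missing tokens:", sorted(missing))
```

A non-empty `missing` set means the response lacks the finer-grained tokens that index mode is documented to emit.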

@tangwang
Author

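The analyzer configuration also differs from the README. The README gives this way to check it:
Run the command GET /_cat/ansj/config in Kibana to get the configuration file contents, as follows: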
{
  "ambiguity": [
    "ambiguity"
  ],
  "stop": [
    "stop"
  ],
  "synonyms": [
    "synonyms"
  ],
  "crf": [
    "crf"
  ],
  "isQuantifierRecognition": "true",
  "isRealName": "false",
  "isNumRecognition": "true",
  "isNameRecognition": "true",
  "dic": [
    "dic"
  ]
}

# What is actually displayed:
{
  "ambiguity": [],
  "stop": [],
  "synonyms": [],
  "crf": [
    "crf"
  ],
  "isQuantifierRecognition": "true",
  "isRealName": "false",
  "isNumRecognition": "true",
  "isNameRecognition": "true",
  "dic": [
    "dic"
  ]
}
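
To make the difference between the two configurations explicit, here is a small sketch that diffs them key by key. Both JSON documents are copied from this comment; the function name `differing_keys` is a hypothetical helper introduced only for illustration.

```python
import json

# Configuration shown in the README (as quoted above).
readme_config = json.loads("""
{"ambiguity": ["ambiguity"], "stop": ["stop"], "synonyms": ["synonyms"],
 "crf": ["crf"], "isQuantifierRecognition": "true", "isRealName": "false",
 "isNumRecognition": "true", "isNameRecognition": "true", "dic": ["dic"]}
""")

# Configuration actually returned by GET /_cat/ansj/config (as quoted above).
actual_config = json.loads("""
{"ambiguity": [], "stop": [], "synonyms": [],
 "crf": ["crf"], "isQuantifierRecognition": "true", "isRealName": "false",
 "isNumRecognition": "true", "isNameRecognition": "true", "dic": ["dic"]}
""")


def differing_keys(expected: dict, actual: dict) -> dict:
    """Map each key whose value differs to an (expected, actual) pair."""
    keys = sorted(expected.keys() | actual.keys())
    return {
        key: (expected.get(key), actual.get(key))
        for key in keys
        if expected.get(key) != actual.get(key)
    }


diff = differing_keys(readme_config, actual_config)
for key, (want, got) in diff.items():
    print(f"{key}: README has {want}, server returned {got}")
```

With the two documents above, only the `ambiguity`, `stop`, and `synonyms` entries differ: each is empty on the running node.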

@liuxiaochen0625

Is there a conclusion on this issue yet?

@shi-yuan
Member

You need to configure the dictionary default.dic.
