Index mode (type=index_ansj) does not behave as expected #235

tangwang · 2024-04-11T06:39:50Z

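Index mode should return multiple segmentations of the text, but in practice it returns the same tokens as query mode, which contradicts the README (the README says that in index mode “中国” is segmented into “中国”, “中”, and “国”):
The three examples below use text=中国是社会主义国家, text=中国, and text=版本号 with type=index_ansj. None of the results match the expected index-mode behavior; they all look like query-mode output, and they are in fact identical to the query-mode results.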

GET /_cat/ansj?text=中国是社会主义国家&type=index_ansj 200 OK

{
"result": [
{
"name": "中国",
"nature": "ns",
"offe": 0,
"realName": "中国",
"synonyms": null
},
{
"name": "是",
"nature": "v",
"offe": 2,
"realName": "是",
"synonyms": null
},
{
"name": "社会主义",
"nature": "n",
"offe": 3,
"realName": "社会主义",
"synonyms": null
},
{
"name": "国家",
"nature": "n",
"offe": 7,
"realName": "国家",
"synonyms": null
}
]
}

GET /_cat/ansj?text=中国&type=index_ansj 200 OK

{
"result": [
{
"name": "中国",
"nature": "ns",
"offe": 0,
"realName": "中国",
"synonyms": null
}
]
}

GET /_cat/ansj?text=版本号&type=index_ansj 200 OK

{
"result": [
{
"name": "版本号",
"nature": "n",
"offe": 0,
"realName": "版本号",
"synonyms": null
}
]
}

The version is 8.7.0:
bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-analysis-ansj/releases/download/v8.7.0/elasticsearch-analysis-ansj-8.7.0.0-release.zip
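
The comparison described above can be scripted so the index-mode and query-mode results are checked token by token. This is only a minimal sketch: the helper names (`ansj_url`, `tokens`, `same_segmentation`, `fetch`) are hypothetical, the base URL is whatever your node listens on, and `query_ansj` is assumed to be the type name for query mode. The sample response is copied from the text=中国 output in this report.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def ansj_url(base, text, mode):
    """Build the /_cat/ansj URL used in this report for a given type."""
    return f"{base}/_cat/ansj?{urlencode({'text': text, 'type': mode})}"


def tokens(response):
    """Reduce a /_cat/ansj response to (name, offset) pairs."""
    return [(t["name"], t["offe"]) for t in response["result"]]


def same_segmentation(index_response, query_response):
    """True when both modes produced exactly the same tokens."""
    return tokens(index_response) == tokens(query_response)


def fetch(base, text, mode):
    """Call the endpoint on a running node (assumes no authentication)."""
    with urlopen(ansj_url(base, text, mode)) as resp:
        return json.load(resp)


# Response reported above for text=中国, type=index_ansj.
reported = {
    "result": [
        {"name": "中国", "nature": "ns", "offe": 0,
         "realName": "中国", "synonyms": None}
    ]
}
```

Against a live node, `same_segmentation(fetch(base, "中国", "index_ansj"), fetch(base, "中国", "query_ansj"))` returning `True` would reproduce the behavior described in this report.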

tangwang · 2024-04-11T06:52:02Z

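The analyzer configuration does not match the README. The README gives this way to check it:
Run GET /_cat/ansj/config in Kibana; the configuration file contents are returned as follows: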
{
"ambiguity": [
"ambiguity"
],
"stop": [
"stop"
],
"synonyms": [
"synonyms"
],
"crf": [
"crf"
],
"isQuantifierRecognition": "true",
"isRealName": "false",
"isNumRecognition": "true",
"isNameRecognition": "true",
"dic": [
"dic"
]
}

# What is actually displayed:
{
"ambiguity": [],
"stop": [],
"synonyms": [],
"crf": [
"crf"
],
"isQuantifierRecognition": "true",
"isRealName": "false",
"isNumRecognition": "true",
"isNameRecognition": "true",
"dic": [
"dic"
]
}
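
To make the difference between the two configurations explicit, they can be compared key by key. This is a small sketch using the two JSON bodies pasted above; `config_diff` is a hypothetical helper, not part of the plugin.

```python
# Configuration shown in the README.
readme = {
    "ambiguity": ["ambiguity"],
    "stop": ["stop"],
    "synonyms": ["synonyms"],
    "crf": ["crf"],
    "isQuantifierRecognition": "true",
    "isRealName": "false",
    "isNumRecognition": "true",
    "isNameRecognition": "true",
    "dic": ["dic"],
}

# Configuration actually returned by GET /_cat/ansj/config.
actual = {
    "ambiguity": [],
    "stop": [],
    "synonyms": [],
    "crf": ["crf"],
    "isQuantifierRecognition": "true",
    "isRealName": "false",
    "isNumRecognition": "true",
    "isNameRecognition": "true",
    "dic": ["dic"],
}


def config_diff(expected, got):
    """Map each differing key to its (expected, got) pair of values."""
    keys = sorted(set(expected) | set(got))
    return {k: (expected.get(k), got.get(k))
            for k in keys if expected.get(k) != got.get(k)}


for key, (want, have) in config_diff(readme, actual).items():
    print(f"{key}: README={want} actual={have}")
```

Only `ambiguity`, `stop`, and `synonyms` differ: each is empty in the actual output.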

liuxiaochen0625 · 2024-06-19T11:48:37Z

Is there any conclusion on this issue yet?

shi-yuan · 2024-07-16T14:23:00Z

You need to configure the dictionary default.dic.
