Chinese more_like_this query highlights the wrong terms #1077

Open
xuetaofeng opened this issue Oct 21, 2024 · 1 comment

xuetaofeng commented Oct 21, 2024

Description

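For Chinese more_like_this queries, the wrong characters are highlighted. For example, when I query for "项目经理" (project manager), the returned highlight is "高<em>级项目经</em>理(", which is shifted one character to the left of the expected term.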

Steps to reproduce

Create an index that uses ik_smart:

#!/usr/bin/bash
curl -X DELETE "localhost:9201/my_index"
curl -X PUT "localhost:9201/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ik_smart": {
          "type": "custom",
          "tokenizer": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_ik_smart",
        "position_increment_gap": 1,
        "term_vector": "with_positions_offsets_payloads"
      }
    }
  }
}
'

Insert documents:

#!/usr/bin/bash

curl -X POST "localhost:9201/my_index/_doc/1" -H 'Content-Type: application/json' -d @- << 'EOF'
{
  "title": [
    "项目经理",
    "ex Mingyuan - 前任 明源福州 销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)",
    "销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)"
  ]
}
EOF

curl -X POST "localhost:9201/my_index/_doc/2" -H 'Content-Type: application/json' -d @- << 'EOF'
{
  "title": [
    "开发工程师",
    "前任 Google 软件工程师经理",
    "现任 Facebook 高级开发工程师"
  ]
}
EOF

curl -X POST "localhost:9201/my_index/_doc/3" -H 'Content-Type: application/json' -d @- << 'EOF'
{
  "title": [
    "数据分析师",
    "前任 IBM 数据分析师",
    "现任 Amazon 数据科学家",
    "现任 Amazon 项目数据科学家"
  ]
}
EOF

Query with more_like_this and highlight:

#!/usr/bin/bash
curl -X POST "localhost:9201/my_index/_search?pretty" -H 'Content-Type: application/json' -d @- << 'EOF'
{
  "query": {
    "more_like_this": {
      "fields": ["title"],
      "like": "项目经理",
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "analyzer": "my_ik_smart"
    }
  },
  "highlight": {
    "fields": {
      "title": {
        "type": "fvh",
        "fragment_size": 150,
        "number_of_fragments": 3
      }
    }
  }
}
EOF
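
To check whether the misplaced highlight is specific to the `fvh` highlighter, the same query can be rerun with a different highlighter type. The sketch below is only a diagnostic aid, not a fix: `search_body` and `search_with` are hypothetical helper names, and it assumes the cluster and index from the steps above. `unified` is one of Elasticsearch's built-in highlighter types.

```shell
#!/usr/bin/bash
# Hypothetical helper: print the more_like_this search body from this report,
# with the highlighter type taken from the first argument.
search_body() {
  cat << EOF
{
  "query": {
    "more_like_this": {
      "fields": ["title"],
      "like": "项目经理",
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "analyzer": "my_ik_smart"
    }
  },
  "highlight": {
    "fields": {
      "title": {"type": "$1", "fragment_size": 150, "number_of_fragments": 3}
    }
  }
}
EOF
}

# Hypothetical helper: send the body to the same endpoint used in the report.
search_with() {
  search_body "$1" | curl -s -X POST "localhost:9201/my_index/_search?pretty" \
    -H 'Content-Type: application/json' -d @-
}

# Example calls (they need the running cluster, so they are left commented out):
# search_with fvh
# search_with unified
```

Comparing the `highlight` sections of the two responses would show whether the offset shift follows the analyzer or the highlighter.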


Expected behavior

The term 项目经理 should be highlighted as a whole.

Actual behavior

The query returns:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.3648179,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.3648179,
        "_source" : {
          "title" : [
            "项目经理",
            "ex Mingyuan - 前任 明源福州 销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)",
            "销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)"
          ]
        },
        "highlight" : {
          "title" : [
            "项目经理",
            "ex Mingyuan - 前任 明源福州 销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)",
            "销售负责人(till 06/2019)/ 前任 用友 高<em>级项目经</em>理(till 03/2020)"
          ]
        }
      }
    ]
  }
}
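
The mismatch can also be checked mechanically. The sketch below pulls the highlighted span out of a fragment and compares it with the query term. It assumes the default `<em>`/`</em>` highlight tags and a single tag pair per fragment; `highlighted_span` is a hypothetical helper name.

```shell
#!/usr/bin/bash
# Hypothetical helper: print the text between <em> and </em> in a highlight
# fragment. Assumes the fragment contains exactly one <em>...</em> pair.
highlighted_span() {
  printf '%s\n' "$1" | sed -n 's/.*<em>\(.*\)<\/em>.*/\1/p'
}

# Third highlight fragment from the response in this report.
fragment='销售负责人(till 06/2019)/ 前任 用友 高<em>级项目经</em>理(till 03/2020)'
expected='项目经理'
actual="$(highlighted_span "$fragment")"

if [ "$actual" = "$expected" ]; then
  echo "highlight OK: $actual"
else
  echo "highlight mismatch: got '$actual', expected '$expected'"
fi
```

For the fragment above this reports a mismatch: the highlighted span starts one character before the expected term and ends one character early.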

Environment

@xuetaofeng
Author

With the smartcn analyzer there is no problem; only ik_smart and ik_max_word are affected. Note that the highlighting problem occurs on array (multi-valued) fields.
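
Since the problem appears only with the IK analyzers and only on array values, one way to narrow it down is to compare the token offsets each analyzer reports for a multi-valued input. The sketch below uses the `_analyze` API, which accepts an array for `text`. `analyze_body` and `analyze` are hypothetical helper names, and the `smartcn` call assumes the smartcn analysis plugin is installed on the node.

```shell
#!/usr/bin/bash
# Hypothetical helper: print an _analyze request body for the analyzer named
# in the first argument, using two of the title values from document 1.
analyze_body() {
  cat << EOF
{
  "analyzer": "$1",
  "text": [
    "项目经理",
    "销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)"
  ]
}
EOF
}

# Hypothetical helper: send the body to the index-level _analyze endpoint so
# that the custom analyzer my_ik_smart defined on my_index can be resolved.
analyze() {
  analyze_body "$1" | curl -s -X POST "localhost:9201/my_index/_analyze?pretty" \
    -H 'Content-Type: application/json' -d @-
}

# Example calls (they need the running cluster, so they are left commented out):
# analyze my_ik_smart > ik_tokens.json
# analyze smartcn > smartcn_tokens.json
```

Comparing `start_offset` and `end_offset` for the 项目经理 token in the second value across the two outputs may show whether the one-character shift already exists at analysis time.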
