From 19f7a6763d06487ed4daa6ad94eea75e87c165eb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E2=80=98JoinTyang=E2=80=99?=
Date: Tue, 10 Sep 2024 11:17:35 +0800
Subject: [PATCH] update

---
 README.md | 2 +-
 manual/README.md | 2 +-
 manual/api/seasearch_api.md | 575 ++++++++++--------------------
 manual/config/README.md | 54 +--
 manual/deploy/README.md | 26 +-
 manual/setup/README.md | 104 +++---
 manual/setup/compile_seasearch.md | 110 ------
 manual/setup/install_faiss.md | 133 -------
 mkdocs.yml | 4 +-
 9 files changed, 287 insertions(+), 723 deletions(-)
 delete mode 100644 manual/setup/compile_seasearch.md
 delete mode 100644 manual/setup/install_faiss.md

diff --git a/README.md b/README.md
index c159c24..58c37e0 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@

Manual for SeaSearch

-The web site: https://haiwen.github.io/seasearch-docs/
+The website: https://haiwen.github.io/seasearch-docs/

## Serve docs locally

diff --git a/manual/README.md b/manual/README.md
index a7411e5..a385220 100644
--- a/manual/README.md
+++ b/manual/README.md
@@ -1,3 +1,3 @@
# Introduction

-ZincSearch 是一个 Go 语言实现的全文检索服务器,提供了兼容 ElasticSearch DSL 的 API。它采用了 Bluge 作为索引引擎。Bluge 是一个广泛使用的 Go 语言全文索引库 Bleve(由 CouchBase 公司开发)的 fork 版本,对代码进行重构改造,使得它更加现代化和灵活。
+ZincSearch is a full-text search server implemented in Go that provides an API compatible with the ElasticSearch DSL. It uses Bluge as its indexing engine. Bluge is a fork of Bleve (developed by Couchbase), a widely used Go full-text indexing library; the fork refactors the code to make it more modern and flexible.

diff --git a/manual/api/seasearch_api.md b/manual/api/seasearch_api.md
index 4189c29..63b63f8 100644
--- a/manual/api/seasearch_api.md
+++ b/manual/api/seasearch_api.md
@@ -1,325 +1,211 @@
-# API 介绍

# API introduction

SeaSearch uses HTTP Basic Auth for permission verification; API requests need to carry the corresponding token in the header.

You can generate the basic auth credential with this tool: [http://web.chacuo.net/safebasicauth](http://web.chacuo.net/safebasicauth)

-SeaSearch 通过 Http Basic Auth 进行权限校验,API 请求需要在 header 中携带对应的 token。
-生成 basic auth 可以通过这个工具: [http://web.chacuo.net/safebasicauth](http://web.chacuo.net/safebasicauth)

## User management

-## 用户管理

### Administrator user

-### 管理员用户

SeaSearch manages API permissions through accounts. When the program is started for the first time, an administrator account needs to be configured through environment variables.

-SeaSearch 通过账户来管理API权限等,程序在第一次启动时,需要通过环境变量配置一个管理员帐号

The following is an example of an administrator account:

-以下是 管理员帐号示例:

```
set ZINC_FIRST_ADMIN_USER=admin
set ZINC_FIRST_ADMIN_PASSWORD=xxx
```

### Normal users

-### 普通用户

Users can be created or updated via the API:

-可以通过API来创建/更新用户:

```
[POST] /api/user

{
    "_id": "prabhat",
    "name": "Prabhat Sharma",
    "role": "admin", // or user
    "password": "xxx"
}
```

Get all users:

-获取所有用户:

```
[GET] /api/user
```

Delete a user:

-删除用户:

```
[DELETE] /api/user/${userId}
```

## Index management

-## 索引相关

### Create index

-### 创建索引

Create a SeaSearch index; mappings and settings can be set at the same time.

-创建一个 SeaSearch 索引,并且在此时可以同时设置 mappings 以及 settings。

We can also set settings or mappings directly through other requests; if the index does not exist, it will be created automatically (see the sketch below).

-我们也可以直接通过其他请求设置 settings 或者 mapping,如果 index不存在,则会自动创建。
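For reference, a complete create-index request can be sketched as follows. This is only an illustration: the index name my-index is a placeholder, the credentials are the admin/xxx pair from the example above (YWRtaW46eHh4 is simply base64 of "admin:xxx"), and the authoritative request format is in the ZincSearch documentation linked below:

```
[POST] /api/index
Authorization: Basic YWRtaW46eHh4

{
    "name": "my-index",
    "mappings": {
        "properties": {
            "title": { "type": "text" }
        }
    }
}
```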
SeaSearch documentation: [https://zincsearch-docs.zinc.dev/api/index/create/#update-a-exists-index](https://zincsearch-docs.zinc.dev/api/index/create/#update-a-exists-index)

ES documentation: [https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html)

-SeaSearch 文档:[https://zincsearch-docs.zinc.dev/api/index/create/#update-a-exists-index](https://zincsearch-docs.zinc.dev/api/index/create/#update-a-exists-index)
-参考 ES api文档:[https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html)

### Configuring mappings

-### 配置 mappings

Mappings define the rules for the fields in a document, such as their type and format.

-mappings 定义了 document 中,字段的规则,例如类型,格式等。

Mappings can be configured via a separate API:

-可以通过单独的 API 来配置 mapping:

SeaSearch API: [https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-mapping/](https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-mapping/)

ES related instructions: [https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html)

-ES 相关说明:[https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html)

### Configuring settings

-### 配置 settings

Settings control the index's analyzer, sharding, and other related options.

-settings 设置了 index 的 analyzer 分片等相关设置。

SeaSearch API: [https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-settings/](https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-settings/)

ES related instructions:

-ES 相关说明:

 * Analyzer concepts: [https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-concepts.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-concepts.html)
 * How to specify an analyzer: [https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html)

- * analyzer 相关概念:[https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-concepts.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-concepts.html)
- * 如何指定 analyzer:[https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html)

### Analyzer support

An analyzer can be configured as the index default when the index is created, or set for a specific field. (See the ES settings documentation in the previous section for the relevant concepts.)

The analyzers supported by SeaSearch are listed on this page: [https://zincsearch-docs.zinc.dev/api/index/analyze/](https://zincsearch-docs.zinc.dev/api/index/analyze/). Concepts such as tokenizer and token filter are consistent with ES, and most of the analyzers and tokenizers commonly used in ES are supported. You can test an analyzer with the analyze endpoint, as sketched below.
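A minimal analyze request, assuming the `_analyze` endpoint described on the ZincSearch page linked above (the sample text is arbitrary):

```
[POST] /api/_analyze

{
    "analyzer": "standard",
    "text": "SeaSearch is a full-text search server"
}
```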
-analzyer 可以在创建索引索引时配置 default ,也可以针对某个字段进行设置。(参考上一节中 settings ES 的文档了解相关概念。)
-SeaSearch 支持的 analyzer可以在这个页面中找到:[https://zincsearch-docs.zinc.dev/api/index/analyze/](https://zincsearch-docs.zinc.dev/api/index/analyze/) 里面的 tokenize, token filter 等概念和 ES 是一致的,且支持 ES 大部分常用的 analyzer 和 tokenizer 等。
-支持的常规analyzer

Supported general analyzers:

 * standard: the default analyzer, used when none is specified; splits text into words and lowercases them
 * simple: splits on non-letter characters (symbols are filtered out) and lowercases
 * keyword: no tokenization; the input is treated as a single token
 * stop: lowercases and applies a stop-word filter (the, a, is, etc.)
 * web: implemented by Bluge; recognizes email addresses, URLs, etc.; lowercases and applies a stop-word filter
 * regexp/pattern: splits on a regular expression, by default \W+ (split on non-word characters); lowercasing and stop words can be configured
 * whitespace: splits on whitespace and does not lowercase

- * standard 默认的 analyzer,如果没有指定,则采用此 analyzer,按词切分,小写处理
- * simple 按照非字母切分(符号被过滤),小写处理
- * keyword 不分词,直接将输入当作输出
- * stop 小写处理,停用词过滤器 (the、a、is等)
- * web 由 buluge 实现,匹配 邮箱、url 等。处理小写,使用停用词过滤器

### Language analyzers

| Language | Analyzer |
| -------------- | -------------- |
| arabic | ar |
| CJK (Asian languages) | cjk |
| sorani | ckb |
| danish | da |
| german | de |
| english | en |
| spanish | es |
| persian | fa |
| finnish | fi |
| french | fr |
| hindi | hi |
| hungarian | hu |
| italian | it |
| dutch | nl |
| norwegian | no |
| portuguese | pt |
| romanian | ro |
| russian | ru |
| swedish | sv |
| turkish | tr |

Chinese analyzers:

-中文 analzyer:

 * gse_standard: uses a shortest-path algorithm to segment words
 * gse_search: the search-engine segmentation mode, which produces as many keywords as possible

- * gse_standard 使用最短路径算法来分词
- * gse_search 搜索引擎的分词模式,提供尽可能多的关键词

The Chinese analyzers use the [gse](https://github.com/go-ego/gse) library for word segmentation; it is a Golang implementation of the Python jieba library. They are not enabled by default and need to be enabled through environment variables:

-中文 analyzer 使用的是 [gse](https://github.com/go-ego/gse) 这个库实现分词,是 python 结巴库的 Golang 实现,默认是没有启用的,需要通过环境变量来启用

```
ZINC_PLUGIN_GSE_ENABLE=true
# true: enable Chinese word segmentation support; default is false
ZINC_PLUGIN_GSE_DICT_EMBED=BIG
# BIG: use gse's built-in dictionary and stop words; otherwise use SeaSearch's built-in simple dictionary; default is small
ZINC_PLUGIN_GSE_ENABLE_STOP=true
# true: use stop words; default is true
ZINC_PLUGIN_GSE_ENABLE_HMM=true
# use HMM mode for search segmentation; default is true
ZINC_PLUGIN_GSE_DICT_PATH=./plugins/gse/dict
# to use a user-defined dictionary and stop words, put the files under this configured path, naming the dictionary user.txt and the stop words stop.txt
```
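As a hedged sketch of the per-field case: assuming gse is enabled as above, a text field (the field name content is made up for illustration) can be given a Chinese analyzer through the mapping API introduced earlier:

```
[PUT] /es/${indexName}/_mapping

{
    "properties": {
        "content": {
            "type": "text",
            "analyzer": "gse_standard"
        }
    }
}
```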
## Full-text search

-## 全文检索

### Document CRUD

Create a document:

-创建 document:

SeaSearch API: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/create/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/create/)

ES API: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html)

-SeaSearch :[https://zincsearch-docs.zinc.dev/api-es-compatible/document/create/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/create/)
-ES api 说明:[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html)

Update a document:

-更新 document :

SeaSearch API: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/update/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/update/)

ES API: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html)

-SeaSearch:[https://zincsearch-docs.zinc.dev/api-es-compatible/document/update/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/update/)
-ES api 说明:[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html)

Delete a document:

-删除 document:

SeaSearch API: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/delete/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/delete/)

ES API: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html)

-SeaSearch: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/delete/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/delete/)
-ES api 说明:[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html)
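A rough sketch of the three operations against the ES-compatible endpoints linked above (the filename field is a made-up example; the exact request variants are described on the linked pages):

```
[POST] /es/${indexName}/_doc              // create a document; the id is generated
{"filename": "readme.md"}

[PUT] /es/${indexName}/_doc/${docId}      // create or fully replace a document by id
{"filename": "README.md"}

[DELETE] /es/${indexName}/_doc/${docId}   // delete a document by id
```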
Get a document by id:

-根据 id获取 document:

```
[GET] /api/${indexName}/_doc/${docId}
```

### Batch operations

-### 批量进行操作

Batch operations should be used to update indexes whenever possible.

-应该尽量使用批量操作更新索引

SeaSearch API: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/bulk/#request](https://zincsearch-docs.zinc.dev/api-es-compatible/document/bulk/#request)

ES API: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)

-SeaSearch文档: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/bulk/#request](https://zincsearch-docs.zinc.dev/api-es-compatible/document/bulk/#request)
-ES api说明:[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)

### Search

-### 搜索

API examples: [https://zincsearch-docs.zinc.dev/api-es-compatible/search/search/](https://zincsearch-docs.zinc.dev/api-es-compatible/search/search/)

-api示例: [https://zincsearch-docs.zinc.dev/api-es-compatible/search/search/](https://zincsearch-docs.zinc.dev/api-es-compatible/search/search/)

Full-text search uses the DSL; for usage, refer to: [https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html)

-全文搜索使用 DSL,使用方法可以参考: [https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html)

delete-by-query deletes the documents matching a query:

-delete-by-query:根据 query进行删除:

```
[POST] /es/${indexName}/_delete_by_query

{
    "query": {
        "match": {
            "name": "jack"
        }
    }
}
```

ES API: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html)

-ES api 文档:[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html)

multi-search supports executing a different query on each index:

-multi-search,支持对不同 index 执行不同的 query:

SeaSearch API: [https://zincsearch-docs.zinc.dev/api-es-compatible/search/msearch/](https://zincsearch-docs.zinc.dev/api-es-compatible/search/msearch/)

ES API: [https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html)

-SeaSearch 文档:[https://zincsearch-docs.zinc.dev/api-es-compatible/search/msearch/](https://zincsearch-docs.zinc.dev/api-es-compatible/search/msearch/)
-ES api 文档:[https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html)

We have extended multi-search so that it can use the same statistics when searching different indexes, which makes score calculation more precise. Set the query parameter unify_score=true in the request to enable it:

-我们对 multi-search 做了扩展,使它支持在搜索不同的索引时,使用相同的统计信息,以使得得分计算更加精确,在请求中设置 query:unify_score=true 即可开启。
```
[POST] /es/_msearch?unify_score=true

{"index": "t1"}
{"query": {"bool": {"should": [{"match": {"filename": {"query": "test string", "minimum_should_match": "-25%"}}}, {"match": {"filename.ngram": {"query": "test string", "minimum_should_match": "80%"}}}], "minimum_should_match": 1}}, "from": 0, "size": 10, "_source": ["path", "repo_id", "filename", "is_dir"], "sort": ["_score"]}
{"index": "t2"}
{"query": {"bool": {"should": [{"match": {"filename": {"query": "test string", "minimum_should_match": "-25%"}}}, {"match": {"filename.ngram": {"query": "test string", "minimum_should_match": "80%"}}}], "minimum_should_match": 1}}, "from": 0, "size": 10, "_source": ["path", "repo_id", "filename", "is_dir"], "sort": ["_score"]}
```

## Vector search

-## 向量检索

We have extended SeaSearch with a vector search function; the following is an introduction to the relevant APIs.

-我们为 SeaSearch 扩展开发了向量检索的功能,以下是相关API介绍。

### Creating a vector index

-### 创建向量索引

To use vector search, you need to create a vector index in advance, which can be done through the mapping.

-使用向量检索功能,需要提前创建向量索引,可以通过 mapping 的方式建立。

We create an index and declare that the vector field of the documents to be written is called "vec", the index type is flat, and the vector dimension is 768:

-我们创建一个索引,设置写入的文档数据的向量字段叫 "vec",索引类型是 flat, 向量维度是 768

```
[PUT] /es/${indexName}/_mapping

{
    "properties": {
        "vec": {
            "type": "vector",
            "dims": 768,
            "vec_index_type": "flat"
        }
    }
}
```

Parameter description:

-参数说明:

```
${indexName}: the index name
type: fixed to vector, indicating a vector index
dims: the vector dimension
m: a parameter required by the ivf_pq index type; dims must be divisible by m
nbits: a parameter required by the ivf_pq index type; the default is 8
vec_index_type: the index type; flat and ivf_pq are supported
```

### Writing a document containing a vector

-### 写入包含向量的document

At the API level there is no difference between writing a document that contains a vector and writing a normal document; choose whichever method suits you.

-写入包含向量 document 与写入普通document 在 API层面并无差异,可自行选择合适的方式。

The following takes the bulk API as an example:

-下面以 bluk API 为例

```
[POST] /es/_bulk

body:

{ "index" : { "_index" : "${indexName}" } }
{"name": "jack1","vec":[10.2,10.41,9.5,22.2]}
{ "index" : { "_index" : "${indexName}" } }
{"name": "jack2","vec":[10.2,11.41,9.5,22.2]}
{ "index" : { "_index" : "${indexName}" } }
{"name": "jack3","vec":[10.2,12.41,9.5,22.2]}
```

Note that the _bulk API is strict about the format of each line, and a document cannot span more than one line.
For details, please refer to [ES bulk](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).

-注意 _bulk API 严格要求每一行的格式,数据不能超过一行,详细请参考 [ES bulk](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)

Modification and deletion can also be done with bulk. After a document is deleted, its corresponding vector data is deleted as well.

-修改和删除,也可以使用 bulk,删除 document 之后,其对应的向量数据同样会被删除

### Retrieving vectors

-### 检索向量

By passing in a vector, you can search for the N most similar vectors in the system and return the corresponding document information:

-通过传入一个 向量,搜索系统中N个相似的向量,并返回对应文档信息:

```
[POST] /api/${indexName}/_search/vector

body:

{
    "query_field": "vec",
    "k": 5,
    "return_fields": ["name"],
    "vector": [10.2, 12.41, 9.5, 22.2]
}
```

The API response format is the same as for full-text search.

-API 响应格式与 全文检索格式相同。

The following is a description of the parameters:

-以下是参数说明:

```
${indexName}: the index name
query_field: the field of the index to search; the field must be of type vector
k: the number of most similar vectors to return
return_fields: names of fields to return individually
vector: the vector used for the query
nprobe: only effective for the ivf_pq index type; the number of clusters to probe (the higher the number, the more accurate the result)
_source: controls whether the _source field is returned; supports a bool or an array describing which fields to return
```

### Rebuilding the index

-### 重建索引

Rebuild the index immediately; suitable when you do not want to wait for the background automatic detection:

-立即对索引进行重建,适用于不等待后台自动检测的情况

```
[POST] /api/:target/:field/_rebuild
```

### Querying recall

-### 查询 recall

For vectors of type ivf_pq, a recall check can be performed on their data:

-对于 ivf_pq 类型的向量,可以对其数据进行 recall 检查

```
[POST] /api/:target/_recall

{
    "field":"vec_001",    # the field to test
    "k":10,
    "nprobe":5,           # the nprobe value
    "query_count":1000    # the number of test queries to run
}
```

# Vector search usage example

-# 向量检索使用示例

Next, we demonstrate how to index a batch of papers. Each paper may contain multiple vectors that need to be indexed. Through vector retrieval we want to obtain the N most similar vectors, and from them the corresponding paper-ids.

-接下来实际演示如何 索引一批 papers,每个 paper 可能包含多个需要被索引的向量,我们希望通过 向量检索,得到最相似的 N 个向量,从而得到其对应的 paper-id。

## Creating the SeaSearch index and vector index

-## 创建 SeaSearch 索引与向量索引

The first step is to set the mapping of the vector index; when the mapping is set, the index and the vector index are created automatically.

-首先是设定 向量索引的 mapping,在设定mapping时,index 和向量索引 会自动创建

Since paper-id is just an ordinary string that we do not need to analyze, we set its type to keyword:

-由于 paper-id 只是一个普通的字符串,我们无需进行 analyze, 所以我们设置其类型为 keyword:

```
[PUT] /es/paper/_mapping

{
    "properties": {
        "paper-id": {
            "type": "keyword"
        },
        "title-vec": {
            "type": "vector",
            "dims": 768,
            "vec_index_type": "flat"
        }
    }
}
```

Through the above request, we created an index named paper and built a flat vector index for the index's title-vec field.

-通过以上请求,我们创建了一个名为 paper 的 index,并为索引的 title-vec 字段,建立了 flat 类型的向量索引。
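To double-check the result, the mapping can be read back; this sketch assumes the ES-compatible mapping endpoint supports GET as well as PUT:

```
[GET] /es/paper/_mapping
```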
## Index data

-## 索引数据

We write the paper data to SeaSearch in batches through the _bulk API:

-我们通过 _bulk API 批量向 SeaSearch 写入这些 paper 数据

```
[POST] /es/_bulk

{ "index" : {"_index" : "paper" } }
{"paper-id": "001","title-vec":[10.2,10.40,9.5,22.2....]}
{ "index" : {"_index" : "paper" } }
{"paper-id": "002","title-vec":[10.2,11.40,9.5,22.2....]}
{ "index" : {"_index" : "paper" } }
{"paper-id": "003","title-vec":[10.2,12.40,9.5,22.2....]}
....
```

## Retrieving data

-## 检索数据

Now we can retrieve with a vector:

-现在我们可以用向量检索:

```
[POST] /api/paper/_search/vector

{
    "query_field": "title-vec",
    "k": 10,
    "return_fields": ["paper-id"],
    "vector": [10.2, 12.41, 9.5, 22.2....]
}
```

This retrieves the documents corresponding to the most similar vectors, and with them the paper-ids. Since a paper may contain multiple vectors, if several vectors of one paper are all very similar to the query vector, that paper-id may appear multiple times in the results.

-可以检索出最相似的向量对应的 document,并得到 paper-id。由于一个 paper 可能包含多个 向量,如果某个 paper 的多个向量都与查询的 向量 非常相似,那么这个 paper-id 可能出现在结果中多次。

## Maintaining vector data

-## 维护向量数据

### Updating a document directly

-### 直接更新document

After a document is imported successfully, SeaSearch returns its doc id. Based on the doc id, we can update a document directly:

-在一个 document 成功导入之后,SeaSearch会返回其 doc id,我们可以根据 doc id 直接更新一个document:

```
[POST] /es/_bulk

{ "update" : {"_id":"23gZX9eT6QM","_index" : "paper" } }
{"paper-id": "005","vec":[10.2,1.43,9.5,22.2...]}
```

### Querying first, then updating

-### 先查询再更新

If the returned doc id was not saved, you can first use SeaSearch's full-text search to query the documents corresponding to a paper-id:

-如果没有保存返回的 doc id,可以先利用 SeaSearch 的全文检索功能,查询 paper-id 对应的docuemnts:

```
[POST] /es/paper/_search

{
    "query": {
        "match": {
            "paper-id": "001"
        }
    }
}
```

Through the DSL, we can directly retrieve the documents corresponding to the paper-id, together with their doc ids.

-通过 DSL,我们可以直接检索到 paper-id 对应的 document 以及其 doc id。

### Fully updating a paper

-### 全量更新 paper

A paper contains multiple vectors. If a single vector needs to be updated, we can directly update the document corresponding to that vector. In practice, however, it is not easy to distinguish which parts of a paper are newly added and which are updated.

-一个 paper 包含多个向量,如果某个向量需要更新,那么我们直接更新这个向量对应的 document即可,但是在实际应用中,区分一个 paper的内容哪些是新增的,哪些是更新的,是不太容易的。

We can instead perform a full update:

-我们可以采用全量更新的方式:

 * First, query all documents of the paper through the DSL
 * Delete all of those documents
 * Import the latest paper data

- * 首先通过 DSL 查询出一个 paper 所有的 document
- * 删除所有的 document
- * 导入最新的 paper 数据

Steps 2 and 3 can be performed in a single batch operation.

-第2和第3步,可以在一个 批量 操作中进行。

The following example deletes the documents of paper 001 and re-imports it, and at the same time directly updates paper 005 and paper 006, because they only have one vector each:

-下面的例子将演示删除 paper 001 的 document,并重新导入;同时,直接更新 paper 005 和 paper 006,因为它们只有一个向量:

```
[POST] /es/_bulk

{ "delete" : {"_id":"23gZX9eT6Q8","_index" : "paper" } }
{ "delete" : {"_id":"23gZX9eT6Q0","_index" : "paper" } }
{ "delete" : {"_id":"23gZX9eT6Q3","_index" : "paper" } }
{ "index" : {"_index" : "paper" } }
{"paper-id": "001","vec":[10.2,1.41,9.5,22.2...]}
....
{ "update" : {"_id":"23gZX9eT6QM","_index" : "paper" } }
{"paper-id": "005","vec":[10.2,1.43,9.5,22.2...]}
{ "update" : {"_id":"23gZX9eT6QY","_index" : "paper" } }
{"paper-id": "006","vec":[10.2,1.43,9.5,22.2...]}
```
diff --git a/manual/config/README.md b/manual/config/README.md
index 7294c0f..6eb15f5 100644
--- a/manual/config/README.md
+++ b/manual/config/README.md
@@ -1,21 +1,21 @@
-# SeaSearch 配置项目

# SeaSearch configuration

The official configuration options are documented here: [https://zincsearch-docs.zinc.dev/environment-variables/](https://zincsearch-docs.zinc.dev/environment-variables/)

-官方配置可以参考:[https://zincsearch-docs.zinc.dev/environment-variables/](https://zincsearch-docs.zinc.dev/environment-variables/)

The options below are our extended configuration items. All of them are set as environment variables.

-以下配置说明,为我们扩展的配置项,所有配置,都是以环境变量的方式设置的。

## Extended configuration

-## 扩展配置

```
GIN_MODE, log mode of the gin framework; default release
ZINC_WAL_ENABLE, whether to enable the WAL; enabled by default
ZINC_STORAGE_TYPE
ZINC_MAX_OBJ_CACHE_SIZE, the maximum size of locally cached files when S3 or OSS is enabled
ZINC_SHARD_LOAD_OBJS_GOROUTINE_NUM, index-loading parallelism; when S3 or OSS is enabled, it can improve index loading speed
ZINC_SHARD_NUM, the original zincsearch default is 3; since SeaSearch uses one index per library, the default has been changed to 1 to improve loading efficiency
S3 related, only valid when ZINC_STORAGE_TYPE=s3
ZINC_S3_ACCESS_ID
ZINC_S3_USE_V4_SIGNATURE
ZINC_S3_ACCESS_SECRET
ZINC_S3_BUCKET
ZINC_S3_USE_HTTPS
ZINC_S3_PATH_STYLE_REQUEST
ZINC_S3_AWS_REGION

OSS related, only valid when ZINC_STORAGE_TYPE=oss
ZINC_OSS_ACCESS_ID
ZINC_OSS_ACCESS_SECRET
ZINC_OSS_BUCKET
ZINC_OSS_ENDPOINT

Cluster related
ZINC_SERVER_MODE, default none for standalone deployment; must be set to cluster for cluster deployment
ZINC_CLUSTER_ID, cluster id; needs to be globally unique
ZINC_ETCD_ENDPOINTS, etcd address
ZINC_ETCD_PREFIX, etcd key prefix; default /zinc
ZINC_ETCD_USERNAME, etcd username
ZINC_ETCD_PASSWORD, etcd password

Log related
ZINC_LOG_OUTPUT, whether to output logs to files; default yes
ZINC_LOG_DIR, log directory; recommended to configure, defaults to the log subdirectory under the current directory
ZINC_LOG_LEVEL, log level; default debug
```

## Proxy configuration

-## proxy 配置

```
ZINC_CLUSTER_PROXY_LOG_DIR=./log
ZINC_CLUSTER_PROXY_HOST=0.0.0.0
ZINC_CLUSTER_PROXY_PORT=4082
ZINC_SERVER_MODE=proxy # must be proxy
ZINC_ETCD_ENDPOINTS=127.0.0.1:2379
ZINC_ETCD_PREFIX=/zinc
ZINC_MAX_DOCUMENT_SIZE=1m # limit on the maximum size of a single document in bulk and multisearch; default 1m
ZINC_CLUSTER_MANAGER_ADDR=127.0.0.1:4081 # manager address
```

## Cluster-manager configuration

-## cluster-manger 配置

```
ZINC_CLUSTER_MANAGER_LOG_DIR=./log
```

diff --git a/manual/deploy/README.md b/manual/deploy/README.md
index ed2781e..4f435f8 100644
--- a/manual/deploy/README.md
+++ b/manual/deploy/README.md
@@ -1,28 +1,28 @@
-# 启动 SeaSearch

# Launching SeaSearch

## Starting a single machine

-## 启动单机

For a development environment, you only need to follow the official instructions and configure the two environment variables for the initial account and password.

-对于开发环境而言,只需要按照官方说明,配置 启动帐号和启动密码两个 环境变量即可。

For compiling SeaSearch, see: [Setup](../setup/README.md)

-编译 SeaSearch 参考: [Setup](../setup/README.md)

For a development environment, simply configure the environment variables and start the binary.

-对于开发环境,直接配置环境变量,并启动二进制文件即可;

The following commands first create a data folder as the default storage path, then start a SeaSearch program with admin and xxx as the initial user, listening on port 4080 by default:

-以下命令会首先创建一个 data文件夹,作为默认的存储路径,之后以 admin 以及 Complexpass#123作为初始用户,启动一个 SeaSearch 程序,并默认监听4080端口:

```
mkdir data
ZINC_FIRST_ADMIN_USER=admin ZINC_FIRST_ADMIN_PASSWORD=xxx GIN_MODE=release ./SeaSearch
```

If you need to reset the data, just delete the entire data directory and restart; this cleans up all metadata and index data.

-如果需要重置数据,删除整个 data 目录再重启即可,这会清理所有元数据以及索引数据。
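Once the process is running, a quick way to check that it is responding is to query it with the configured credentials. A hedged sketch, assuming the default port and a ZincSearch-style healthz endpoint (adjust host, port, and path if yours differ):

```
curl -u admin:xxx http://localhost:4080/healthz
```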
## Starting the cluster

-## 启动集群

1. Start etcd.

-1. 启动 etcd

2. Start the SeaSearch nodes; each node automatically registers its heartbeat with etcd.

-2. 启动 SeaSearch 节点,节点会自动向 etcd 注册心跳。

3. Start the cluster-manager, then set the cluster-info (the addresses of the SeaSearch nodes) through the API or directly in etcd. At the same time, the cluster-manager starts to allocate shards based on the node heartbeats.

-3. 启动 cluster-manager,然后通过 API 或者 直接向 etcd 设置 cluster-info,设置SeaSearch 节点的地址。并且同时,cluster-manager 开始根据节点心跳对分片进行分配。

4. Start SeaSearch-proxy; the cluster can now serve external requests.

-4. 启动 SeaSearch-proxy,此时就可以对外提供服务了。

diff --git a/manual/setup/README.md b/manual/setup/README.md
index 5e1b5c2..643d28d 100644
--- a/manual/setup/README.md
+++ b/manual/setup/README.md
@@ -1,34 +1,34 @@
-# 安装 SeaSearch

# Installing SeaSearch

The original SeaSearch is written in pure Go and can be compiled directly with the Go toolchain. When we introduced the vector search function, we brought in the faiss library, which has to be called through CGO, so it affects how SeaSearch is compiled.

-原版的 SeaSearch 采用纯 go 语言编写,直接通过 Go 编译工具即可编译。在我们引入向量检索功能时,用到了 faiss 库,这个库需要以 CGO 的方式调用,所以对 SeaSearch 的编译会产生影响。

## Installing faiss

-## 安装 faiss

To compile or run SeaSearch on a machine, the faiss library must be installed on that machine. The following are the specific installation steps for x86 Linux machines; the walkthrough uses Debian 12 with apt as the package manager.

-要在一台机器上编译或者运行 SeaSearch,需要这台机器安装 faiss 库。下面是具体安装步骤,适用于 x86 linux 机器,流程采用的操作系统为 debian 12,使用 apt 作为包管理器

### Prerequisites

-### 前提条件

Install through the package manager. If the connection is slow, you can try switching to a mirror.

-通过包管理器安装,如果连接速度慢,可以尝试更换源

Ubuntu reference: [https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/)

-ubuntu 参考:[https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/)

Debian reference: [https://mirrors.tuna.tsinghua.edu.cn/help/debian/](https://mirrors.tuna.tsinghua.edu.cn/help/debian/)

-debian 参考:[https://mirrors.tuna.tsinghua.edu.cn/help/debian/](https://mirrors.tuna.tsinghua.edu.cn/help/debian/)

After switching the source, run:

-换源之后,执行

```
sudo apt update
```

A C++ compiler with C++17 support or above, which can be installed via apt:

-C++ 编译器,支持C++17及以上
-可以通过 apt 安装

```
sudo apt install -y gcc
```

CMake 3.23.1 or above; if the packaged version is not recent enough, you can install it from a PPA or from source:

-Cmake,3.23.1 以上,如果源不是最新,可以从 ppa 或者源码安装

```
sudo apt install -y cmake
```

wget, swig, gnupg, libomp:

```
sudo apt install -y wget swig gnupg libomp-dev
```

Node.js:

```
sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
NODE_MAJOR=20
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list
sudo apt update && sudo apt install nodejs -y
```

### Installing the Intel MKL library (optional, x86 CPUs only)

-### 安装 Intel MKL库 (可选,仅支持x86 cpu)

faiss depends on BLAS, and Intel MKL is recommended for the best performance:

-faiss 依赖 BLAS,并且推荐使用 intel MKL性能最佳

```
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" |sudo tee /etc/apt/sources.list.d/oneAPI.list

sudo apt update

sudo apt install -y intel-oneapi-mkl-devel
```

-执行完毕之后,MKL库就安装完毕了,再配置一个环境变量:

After these commands finish, the MKL library is installed.
Then configure an environment variable:

```
export MKL_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64
```

### Installing BLAS on non-x86 CPUs

-### 非 x86 cpu 安装BLAS

MKL cannot be installed on non-x86 CPUs; install OpenBLAS instead:

-非x86 cpu无法安装 MKL,可安装 OpenBLAS 实现

```
sudo apt install -y libatlas-base-dev libatlas3-base
```

### Compiling faiss

-### 编译 faiss

Download the faiss source code via SSH:

-下载 faiss 源码,ssh方式:

```
git clone git@github.com:facebookresearch/faiss.git
```

or via HTTPS:

-或者 http方式:

```
git clone https://github.com/facebookresearch/faiss.git
```

Enter the faiss directory. If MKL is installed, run:

-进入 faiss 目录,如果安装了 MKL,执行:

```
cmake -B build -DFAISS_ENABLE_GPU=OFF \
    -DFAISS_ENABLE_C_API=ON \
    -DFAISS_ENABLE_PYTHON=OFF \
    -DBLA_VENDOR=Intel10_64_dyn \
    -DBUILD_SHARED_LIBS=ON \
    "-DMKL_LIBRARIES=-Wl,--start-group;${MKL_PATH}/libmkl_intel_lp64.a;${MKL_PATH}/libmkl_gnu_thread.a;${MKL_PATH}/libmkl_core.a;-Wl,--end-group" \
    .
```

If MKL is not installed, run:

-如果未安装 MKL,执行:

```
cmake -B build -DFAISS_ENABLE_GPU=OFF \
    -DFAISS_ENABLE_C_API=ON \
    -DFAISS_ENABLE_PYTHON=OFF \
    -DBUILD_SHARED_LIBS=ON \
    -DBUILD_TESTING=OFF \
    .
```

Build the C++ library:

-执行编译

```
make -C build
```

Install the library and headers:

-安装头文件:

```
sudo make -C build install
```

Copy the compiled shared library to the system path. Here /tmp/faiss is the faiss source path; replace it with the real path:

-将编译好的动态链接库,拷贝到系统路径,这里的 /tmp/faiss 是faiss源码路径,替换为真正的路径即可:

```
sudo cp /tmp/faiss/build/c_api/libfaiss_c.so /usr/lib
```

For the complete installation script, see /ci/install\_faiss.sh in the SeaSearch project directory.

-完整安装脚本可以参考 SeaSearch 项目目录下的 /ci/install\_faiss.sh

## Compiling SeaSearch

-## 编译 SeaSearch

faiss is now installed; we can start compiling SeaSearch.

-faiss 已经安装完毕,可以开始编译 SeaSearch了

Download the SeaSearch source code via SSH:

-首先下载 SeaSearch源码:

```
git clone git@github.com:seafileltd/seasearch.git
```

or via HTTPS:

-或者 http方式:

```
git clone https://github.com/seafileltd/seasearch.git
```

Compile the frontend static files:

-编译前端静态文件

```
cd web
npm config set registry https://registry.npmmirror.com
npm install
npm run build
```

Install the Go toolchain, Go 1.20 or above; see [https://go.dev/doc/install](https://go.dev/doc/install)

-安装 go 语言环境 Go 1.20 以上
-参考 [https://go.dev/doc/install](https://go.dev/doc/install)

Make sure CGO is enabled:

-需要确保启用了 CGO

```
export CGO_ENABLED=1
```

Optionally, switch the Go module proxy:

-可选,更换 go 源:

```
go env -w GOPROXY=https://goproxy.cn,direct
```

Then run, in the project root directory:

-之后在项目根目录执行:

```
go build -o seasearch ./cmd/zincsearch/
```

After the steps above complete, the final seasearch binary is available in the root directory of the project.

-以上步骤执行完毕,可以在项目的根目录下面得到最终的 seasearch 二进制文件了。

Generally there is no need to specify the locations of header files and shared libraries manually.
If the compiler reports that a header file or a runtime shared library cannot be found, you can specify their locations through environment variables at build time:

-一般来说无需手动指定头文件和动态链接库位置,如果编译提示找不到头文件,或者找不到动态运行库,可以在编译时通过环境变量指定位置:

```
CGO_CFLAGS=-I /usr/local/include # your C header installation path
CGO_LDFLAGS=-L /usr/lib
```

If at runtime the dynamic linker cannot find the shared library, you can set:

-如果运行时,提示找不到 动态链接库,可以通过:

```
LD_LIBRARY_PATH=/usr/lib # specify the shared library directory
```

## Compiling seasearch-proxy and cluster-manager

-## 编译 seasearch proxy 和 cluster manger

For a cluster, you need to compile and deploy seasearch-proxy and cluster-manager as well.

-在集群下,需要编译部署 seasearch proxy 和 cluster manager

Compile the proxy:

-编译 proxy:

```
go build -o seasearch-proxy ./cmd/zinc-proxy/main.go
```

Compile the cluster manager:

-编译 cluster manager:

```
go build -o cluster-manager ./cmd/cluster-manager/main.go
```

## Publishing

-## 发布

There is a Dockerfile in the project root directory, and you can build a docker image from it.

-项目根目录下有 Dokcerfile 文件,可以根据此文件构建 docker 镜像

Note: building this docker image requires normal access to GitHub; otherwise the faiss source cannot be downloaded and the build fails. The image only supports x86 CPUs; on ARM you need to set the platform parameter to emulate x86.

-注意:构建此 docker 镜像,需要确保能正常访问 github,否则无法下载 faiss 源码会导致构建失败, 并且仅支持 x86 cpu,arm 需要设置 platform 参数模拟 x86

```
docker build -f ./Dockerfile .
```

## Installation issues on Mac

-## Mac 中存在的安装问题

### faiss installation

-### faiss 安装

faiss can be installed via brew install faiss.

-faiss 可通过 brew install faiss 安装

### fatal error: 'faiss/c\_api/AutoTune\_c.h' file not found

Execute the commands from the following issue to solve it: [https://github.com/DataIntelligenceCrew/go-faiss/issues/7](https://github.com/DataIntelligenceCrew/go-faiss/issues/7)

-执行如下命令解决:

diff --git a/manual/setup/compile_seasearch.md b/manual/setup/compile_seasearch.md
deleted file mode 100644
index 3eb85d8..0000000
--- a/manual/setup/compile_seasearch.md
+++ /dev/null
@@ -1,110 +0,0 @@
-
-# 编译 SeaSearch
-
-faiss 已经安装完毕,可以开始编译 SeaSearch了
-
-首先下载 SeaSearch源码:
-
-```
-git clone git@github.com:seafileltd/seasearch.git
-```
-
-或者 http方式:
-
-```
-git clone https://github.com/seafileltd/seasearch.git
-```
-
-编译前端静态文件
-
-```
-cd web
-npm config set registry https://registry.npmmirror.com
-npm install
-npm run build
-```
-
-安装 go 语言环境 Go 1.20 以上
-
-参考 [https://go.dev/doc/install](https://go.dev/doc/install)
-
-需要确保启用了 CGO
-
-```
-export CGO_ENABLED=1
-```
-
-可选,更换 go 源:
-
-```
-go env -w GOPROXY=https://goproxy.cn,direct
-```
-
-之后在项目根目录执行:
-
-```
-go build -o seasearch ./cmd/zincsearch/
-```
-
-以上步骤执行完毕,可以在项目的根目录下面得到最终的 seasearch 二进制文件了。
-
-一般来说无需手动指定头文件和动态链接库位置,如果编译提示找不到头文件,或者找不到动态运行库,可以在编译时通过环境变量指定位置:
-
-```
-CGO_CFLAGS=-I /usr/local/include #你的C
-CGO_LDFLAGS=-I /usr/lib
-```
-
-如果运行时,提示找不到 动态链接库,可以通过:
-
-```
-LD_LIBRARY_PATH=/usr/lib #指定动态链接库目录
-```
-
-# 编译 seasearch proxy 和 cluster manger
-
-在集群下,需要编译部署 seasearch proxy 和 cluster manager
-
-编译 proxy:
-
-```
-go build -o seasearch-proxy ./cmd/zinc-proxy/main.go
-```
-
-编译 cluster manager:
-
-```
-go build -o cluster-manager ./cmd/cluster-manager/main.go
-```
-
-
-# 发布
-
-项目根目录下有 Dokcerfile 文件,可以根据此文件构建 docker 镜像
-
-注意:构建此 docker 镜像,需要确保能正常访问 github,否则无法下载 faiss 源码会导致构建失败, 并且仅支持 x86 cpu,arm 需要设置 platform 参数模拟 x86
-
-```
-docker build -f ./Dockerfile .
-``` - -# Mac 中存在的安装问题 - -## faiss 安装 - -faiss 可通过 brew install faiss 安装 - -## fatal error: 'faiss/c\_api/AutoTune\_c.h' file not found - -执行如下命令解决: - -source: [https://github.com/DataIntelligenceCrew/go-faiss/issues/7](https://github.com/DataIntelligenceCrew/go-faiss/issues/7) - -``` -cd faiss -export CMAKE_PREFIX_PATH=/opt/homebrew/opt/openblas:/opt/homebrew/opt/libomp:/opt/homebrew -cmake -B build -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_PYTHON=OFF . -make -C build -sudo make -C build install -sudo cp build/c_api/libfaiss_c.dylib /usr/local/lib/libfaiss_c.dylib -``` diff --git a/manual/setup/install_faiss.md b/manual/setup/install_faiss.md deleted file mode 100644 index 9a1a605..0000000 --- a/manual/setup/install_faiss.md +++ /dev/null @@ -1,133 +0,0 @@ -# 安装 faiss - -要在一台机器上编译或者运行 SeaSearch,需要这台机器安装 faiss 库。下面是具体安装步骤,适用于 x86 linux 机器,流程采用的操作系统为 debian 12,使用 apt 作为包管理器 - -## 前提条件 - -通过包管理器安装,如果连接速度慢,可以尝试更换源 - -ubuntu 参考:[https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/) - -debian 参考:[https://mirrors.tuna.tsinghua.edu.cn/help/debian/](https://mirrors.tuna.tsinghua.edu.cn/help/debian/) - -换源之后,执行 - -``` -sudo apt update -``` - -C++ 编译器,支持C++17及以上 - -可以通过 apt 安装 - -``` -sudo apt install -y gcc -``` - -Cmake,3.23.1 以上,如果源不是最新,可以从 ppa 或者源码安装 - -``` -sudo apt install -y cmake -``` - -wget swig gnupg libomp - -``` -sudo apt install -y wget swig gnupg libomp-dev -``` - -nodeJs; - -``` -sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg -curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg -NODE_MAJOR=20 -echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list -sudo apt update && sudo apt install nodejs -y -``` - -## 安装 Intel MKL库 (可选,仅支持x86 cpu) - -faiss 依赖 BLAS,并且推荐使用 intel MKL性能最佳 - -``` -wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \ -| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null - -echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" |sudo tee /etc/apt/sources.list.d/oneAPI.list - -sudo apt update - -sudo apt install -y intel-oneapi-mkl-devel -``` - -执行完毕之后,MKL库就安装完毕了,再配置一个环境变量: - -``` -export MKL_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64 -``` - -## 非 x86 cpu 安装BLAS - -非x86 cpu无法安装 MKL,可安装 OpenBLAS 实现 - -``` -sudo apt install -y libatlas-base-dev libatlas3-base -``` - -## 编译 faiss - -下载 faiss 源码,ssh方式: - -``` -git clone git@github.com:facebookresearch/faiss.git -``` - -或者 http方式: - -``` -git clone https://github.com/facebookresearch/faiss.git -``` - -进入 faiss 目录,如果安装了 MKL,执行: - -``` -cmake -B build -DFAISS_ENABLE_GPU=OFF \ - -DFAISS_ENABLE_C_API=ON \ - -DFAISS_ENABLE_PYTHON=OFF \ - -DBLA_VENDOR=Intel10_64_dyn \ - -DBUILD_SHARED_LIBS=ON \ - "-DMKL_LIBRARIES=-Wl,--start-group;${MKL_PATH}/libmkl_intel_lp64.a;${MKL_PATH}/libmkl_gnu_thread.a;${MKL_PATH}/libmkl_core.a;-Wl,--end-group" \ - . -``` - -如果未安装 MKL,执行: - -``` -cmake -B build -DFAISS_ENABLE_GPU=OFF \ - -DFAISS_ENABLE_C_API=ON \ - -DFAISS_ENABLE_PYTHON=OFF \ - -DBUILD_SHARED_LIBS=ON=ON \ - -DBUILD_TESTING=OFF \ - . 
-``` - -执行编译 - -``` -make -C build -``` - -安装头文件: - -``` -sudo make -C build install -``` - -将编译好的动态链接库,拷贝到系统路径,这里的 /tmp/faiss 是faiss源码路径,替换为真正的路径即可: - -``` -sudo cp /tmp/faiss/build/c_api/libfaiss_c.so /usr/lib -``` - -完整安装脚本可以参考 SeaSearch 项目目录下的 /ci/install\_faiss.sh diff --git a/mkdocs.yml b/mkdocs.yml index cdcfdaa..80f8bf0 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -45,8 +45,8 @@ markdown_extensions: # Page tree nav: - - Setup: - - Installation of SeaSearch: setup/README.md + - SeaSearch Setup: + - Installation: setup/README.md - Deploy: - Deploy SeaSearch: deploy/README.md - Configuration: config/README.md