Semantic document retrieval with Elasticsearch and sentence-transformers.

uyaseen/elasticsearch-dense-retrieval

elasticsearch-dense-retrieval

This project demonstrates semantic document search (document retrieval) using Elasticsearch and sentence-transformers. Unlike traditional lexical search (e.g. BM25), semantic search can tolerate spelling mistakes and differences in wording, because vector representations capture the similarity between semantically related words (thanks to word embeddings!). Elasticsearch 7.3 introduced a cosineSimilarity function for vector fields, which makes it convenient to retrieve documents by vector similarity.
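
To make the vector-similarity idea concrete, here is a minimal sketch of a search body that ranks documents with cosineSimilarity inside a script_score query. The field name text_vector and the helper names are assumptions for illustration and are not taken from this project's code. The string form of the field argument is what recent 7.x releases accept; earlier releases (7.3 to 7.5) used doc['text_vector'] instead.

```python
import math
from typing import Any, Dict, List


def build_cosine_query(query_vector: List[float],
                       vector_field: str = "text_vector",  # hypothetical field name
                       size: int = 5) -> Dict[str, Any]:
    """Build a search body that scores every document by cosine similarity."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Elasticsearch rejects negative scores, so the
                    # similarity range [-1, 1] is shifted to [0, 2].
                    "source": (
                        f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0"
                    ),
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Plain-Python reference for the value the script computes per document."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

The query vector would come from encoding the user's question with the same sentence-transformers model that was used to encode the documents.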

The demo application included in this project lets the user search health-related questions over data scraped from the NHS website.

Usage

1. Download dataset

Download the NHS website data from here and copy the data directory into the project root.

2. Run Docker containers

docker compose up

3. Run the pipeline script to create the index, then process and index the documents

conda create -n es python=3.8
conda activate es
pip install -r es/requirements.txt
python es/es_pipeline.py
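
For orientation, the sketch below shows roughly what an indexing pipeline of this kind has to produce: an index mapping with a dense_vector field and one bulk action per document. It is not the contents of es/es_pipeline.py; the function names, the document fields (title, text), the text_vector field and the index name are all assumptions.

```python
from typing import Any, Dict, Iterator, List, Sequence


def build_index_body(dims: int) -> Dict[str, Any]:
    """Mapping with two text fields and a dense_vector field for embeddings."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "text": {"type": "text"},
                # dims must equal the embedding size of the chosen model.
                "text_vector": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_bulk_actions(index_name: str,
                       docs: Sequence[Dict[str, str]],
                       vectors: Sequence[List[float]]) -> Iterator[Dict[str, Any]]:
    """Yield one bulk-index action per (document, embedding) pair."""
    for doc_id, (doc, vector) in enumerate(zip(docs, vectors)):
        yield {
            "_index": index_name,
            "_id": doc_id,
            "_source": {
                "title": doc["title"],
                "text": doc["text"],
                "text_vector": list(vector),
            },
        }
```

In a real run, the vectors would typically come from a sentence-transformers model (for example, the output of its encode method converted to lists), the mapping would be passed to the client's index-creation call, and the actions to elasticsearch.helpers.bulk.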

4. Search

Open the search interface at http://localhost:8501/ in your browser.

5. Improvements

If you are not happy with the results, try experimenting with pretrained models relevant to your domain, or consider adapting the "general domain" models to your target domain.

Acknowledgements

I consulted the codebases of bertsearch and pinecone-io/examples for this project.