Vespa Vector Streaming Search

This sample application demonstrates vector streaming search with Vespa, a feature introduced in Vespa 8.181.15. Read the blog post announcing vector streaming search, and see Streaming Search for more details.

The application uses a small synthetic sample of mail documents for two fictitious users. The subject and content of each mail are combined and embedded into a 384-dimensional embedding space using a BERT embedder.

Quick start

The following is a quick recipe for getting started with this application. Requirements:

Docker Desktop installed and running. 4 GB of available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting.
Alternatively, deploy using Vespa Cloud.
Operating system: Linux, macOS or Windows 10 Pro (Docker requirement).
Architecture: x86_64 or arm64.
Homebrew to install Vespa CLI, or download a Vespa CLI release from GitHub releases.

Validate the Docker resource settings; at least 4 GB of memory should be available:

$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"

Install Vespa CLI:

$ brew install vespa-cli

For local deployment using the Docker image:

$ vespa config set target local

Pull and start the Vespa Docker container image:

$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa

Verify that the configuration service (deploy API) is ready:

$ vespa status deploy --wait 300

Download this sample application:

$ vespa clone vector-streaming-search my-app && cd my-app

Deploy the application:

$ vespa deploy --wait 300

Deployment note

It is possible to deploy this app to Vespa Cloud.

Feeding sample mail documents

During feeding, the subject and content of each mail document are embedded using the BERT embedding model. This is computationally expensive on CPU. For production use cases, use Vespa Cloud with GPU instances and autoscaling enabled.

$ vespa feed ext/docs.json

Query and ranking examples

The following examples use the Vespa CLI to execute queries. Add -v to see the equivalent curl command for the HTTP API.
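
Each CLI query in this section corresponds to an HTTP request against the query API. The following is a minimal sketch of how such a request URL could be assembled in Python. It assumes the container from the quick start is reachable at http://localhost:8080 and serves queries at /search/; the helper name build_query_url is hypothetical, and the request itself is not sent.

```python
from urllib.parse import urlencode

# Assumption: local container from the quick start, query API on port 8080.
ENDPOINT = "http://localhost:8080/search/"


def build_query_url(yql: str, group: str, query_text: str) -> str:
    """Return the GET URL for one streaming-search query (hypothetical helper)."""
    params = {
        "yql": yql,
        # The embed(...) expression asks Vespa to embed the text at query time.
        "input.query(qemb)": f"embed({query_text})",
        # Restrict the search to a single user's documents.
        "streaming.groupname": group,
    }
    # urlencode percent-encodes spaces, braces and parentheses.
    return f"{ENDPOINT}?{urlencode(params)}"


url = build_query_url(
    "select * from sources * where {targetHits:10}nearestNeighbor(embedding,qemb)",
    group="1234",
    query_text="events to attend this summer",
)
print(url)
```

The printed URL can be pasted into curl or a browser once the application is deployed and fed.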

Exact nearest neighbor search

$ vespa query 'yql=select * from sources * where {targetHits:10}nearestNeighbor(embedding,qemb)' \
  'input.query(qemb)=embed(events to attend this summer)' \
  'streaming.groupname=1234'

This searches all documents for user 1234, and returns the ten best documents according to the angular distance between the document embedding and the query embedding.
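
To make the ranking concrete, here is a minimal sketch of exact nearest neighbor search by angular distance over one user's documents. It uses made-up 3-dimensional vectors in place of the 384-dimensional embeddings, and the names angular_distance and exact_nearest_neighbors are hypothetical. It illustrates the idea only and is not Vespa's implementation.

```python
import heapq
import math


def angular_distance(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def exact_nearest_neighbors(docs, query, target_hits=10):
    """Scan every document and keep the target_hits closest to the query."""
    return heapq.nsmallest(
        target_hits, docs, key=lambda d: angular_distance(d["embedding"], query)
    )


# Toy data: documents grouped by user id, as with streaming.groupname.
docs_by_group = {
    "1234": [
        {"id": "a", "embedding": [1.0, 0.0, 0.0]},
        {"id": "b", "embedding": [0.0, 1.0, 0.0]},
        {"id": "c", "embedding": [0.9, 0.1, 0.0]},
    ],
    "5678": [{"id": "d", "embedding": [1.0, 0.0, 0.0]}],
}

# Only the selected group's documents are scanned.
hits = exact_nearest_neighbors(docs_by_group["1234"], [1.0, 0.0, 0.0], target_hits=2)
print([h["id"] for h in hits])
```

Because every document in the group is compared against the query, the result is exact rather than approximate; the cost grows with the number of documents per user, not with the total corpus size.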

Exact nearest neighbor search with timestamp filter

$ vespa query 'yql=select * from sources * where {targetHits:10}nearestNeighbor(embedding,qemb) and timestamp >= 1685577600' \
  'streaming.groupname=1234' \
  'input.query(qemb)=embed(events to attend this summer)'

This query only returns documents with a timestamp at or after 2023-06-01 00:00:00 UTC (1685577600 in seconds since the epoch).
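
The filter value is a Unix timestamp. As a quick check, assuming the timestamp field holds seconds since the epoch in UTC, the cutoff used above can be derived and verified like this:

```python
from datetime import datetime, timezone

# Midnight UTC at the start of 2023-06-01.
cutoff = datetime(2023, 6, 1, tzinfo=timezone.utc)
cutoff_seconds = int(cutoff.timestamp())
print(cutoff_seconds)

# Converting back confirms which date a filter value corresponds to.
print(datetime.fromtimestamp(1685577600, tz=timezone.utc).isoformat())
```

The same approach gives the value for any other cutoff date to use in the YQL filter.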

Exact nearest neighbor search with content filter

$ vespa query 'yql=select * from sources * where {targetHits:10}nearestNeighbor(embedding,qemb) and content contains "sofa"' \
  'streaming.groupname=5678' \
  'input.query(qemb)=embed(list all order confirmations)'

This query only returns documents that match "sofa" in the content field.

Cleanup

Tear down the running container:

$ docker rm -f vespa
