A sample Vespa application showing how to evaluate machine-learned models from the application package in Vespa containers.
Please refer to stateless model evaluation for more information.
The directory src/main/application/models
contains two ONNX model files generated
by the PyTorch scripts in the same directory. These two models demonstrate
various ways stateless model evaluation can be used in Vespa:
- Vespa can automatically make models available through a REST API.
- In a request handler, allowing custom code to run before a model is evaluated.
- In searchers and document processors.
- In a post-processing searcher to run a model in batch over the results from the content nodes.
Requirements:
- Docker Desktop installed and running. 6GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting.
- Alternatively, deploy using Vespa Cloud
- Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
- Architecture: x86_64 or arm64
- Minimum 4GB memory dedicated to Docker.
- Homebrew to install the Vespa CLI, or download a Vespa CLI release from GitHub releases.
- Java 17 installed.
- Apache Maven. This sample app uses custom Java components, and Maven is used to build the application.
Validate environment, should be minimum 4G:
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
Install Vespa CLI:
$ brew install vespa-cli
For local deployment using docker image:
$ vespa config set target local
Pull and start the vespa docker container image:
$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa
Download this sample application:
$ vespa clone model-inference myapp && cd myapp
Build the application package:
$ mvn clean package -U
Verify that configuration service (deploy api) is ready:
$ vespa status deploy --wait 300
Deploy the application:
$ vespa deploy --wait 300
It is possible to deploy this app to Vespa Cloud.
Wait for the application endpoint to become available:
$ vespa status --wait 300
Test the application - run the Vespa System Tests, which execute a set of basic tests to verify that the application works as expected:
$ vespa test src/test/application/tests/system-test/model-inference-test.json
Using the REST APIs directly
In the following examples we use the Vespa CLI's curl option, which manages the endpoint URL for both local and cloud deployments.
List the available models:
$ vespa curl /model-evaluation/v1/
Details of the transformer model:
$ vespa curl /model-evaluation/v1/transformer
Evaluating the model:
$ vespa curl -- \
  --data-urlencode "input=[[1,2,3]]" \
  --data-urlencode "format.tensors=short" \
  /model-evaluation/v1/transformer/eval
The input here uses the tensor literal short form.
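The same evaluation can also be requested programmatically. The sketch below is illustrative and not part of the sample app; it assumes a local deployment on localhost:8080 and that the eval endpoint accepts the same parameters as URL query parameters on a GET request:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class EvalTransformerClient {

    public static void main(String[] args) throws Exception {
        // URL-encode the tensor literal short form, as --data-urlencode does above
        String input = URLEncoder.encode("[[1,2,3]]", StandardCharsets.UTF_8);
        URI uri = URI.create("http://localhost:8080/model-evaluation/v1/transformer/eval"
                + "?input=" + input + "&format.tensors=short");

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(HttpRequest.newBuilder(uri).GET().build(),
                      HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());  // the evaluated tensor as JSON
    }
}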
Test the application - Java API in a handler
$ vespa curl -- \
  --data-urlencode "input={{x:0}:1,{x:1}:2,{x:2}:3}" \
  --data-urlencode "model=transformer" \
  /models/
The input here is:
{ {x:0}:1, {x:1}:2, {x:2}:3 }
The model expects the type tensor(d0[],d1[]), but the handler transforms the tensor to the correct shape before evaluating the model.
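As a rough sketch of how such a handler can be written with the stateless model evaluation Java API (class names, tensor types and the reshaping below are simplified and illustrative; see MyHandler in this repository for the actual implementation):

import ai.vespa.models.evaluation.FunctionEvaluator;
import ai.vespa.models.evaluation.ModelsEvaluator;
import com.yahoo.container.jdisc.HttpRequest;
import com.yahoo.container.jdisc.HttpResponse;
import com.yahoo.container.jdisc.ThreadedHttpRequestHandler;
import com.yahoo.tensor.Tensor;
import com.yahoo.tensor.TensorType;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.Executor;

public class SketchHandler extends ThreadedHttpRequestHandler {

    private final ModelsEvaluator modelsEvaluator;

    public SketchHandler(Executor executor, ModelsEvaluator modelsEvaluator) {
        super(executor);
        this.modelsEvaluator = modelsEvaluator;  // injected by the container
    }

    @Override
    public HttpResponse handle(HttpRequest request) {
        String modelName = request.getProperty("model");  // e.g. "transformer"
        String inputSpec = request.getProperty("input");  // e.g. "{{x:0}:1,{x:1}:2,{x:2}:3}"

        // Parse the request tensor - a fixed size of 3 is assumed here for brevity
        Tensor input = Tensor.from(TensorType.fromSpec("tensor(x[3])"), inputSpec);

        // Reshape to the tensor(d0[],d1[]) batch form the model expects,
        // here as a single-row tensor(d0[1],d1[3])
        Tensor.Builder builder = Tensor.Builder.of(TensorType.fromSpec("tensor(d0[1],d1[3])"));
        int i = 0;
        for (Iterator<Tensor.Cell> cells = input.cellIterator(); cells.hasNext(); i++)
            builder.cell(cells.next().getValue(), 0, i);

        FunctionEvaluator evaluator = modelsEvaluator.evaluatorOf(modelName);
        Tensor result = evaluator.bind("input", builder.build()).evaluate();

        return new HttpResponse(200) {
            @Override
            public void render(OutputStream stream) throws IOException {
                stream.write(result.toString().getBytes(StandardCharsets.UTF_8));
            }
        };
    }
}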
Test the document processor
Feed documents:
$ vespa feed feed.json
The MyDocumentProcessor document processor
uses the transformer
model to generate embeddings that are stored in the content cluster.
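A sketch of how such a document processor can look (field names, tensor shapes and error handling are simplified and illustrative; see MyDocumentProcessor in this repository for the actual implementation):

import ai.vespa.models.evaluation.ModelsEvaluator;
import com.yahoo.docproc.DocumentProcessor;
import com.yahoo.docproc.Processing;
import com.yahoo.document.Document;
import com.yahoo.document.DocumentOperation;
import com.yahoo.document.DocumentPut;
import com.yahoo.document.datatypes.TensorFieldValue;
import com.yahoo.tensor.Tensor;

public class SketchDocumentProcessor extends DocumentProcessor {

    private final ModelsEvaluator modelsEvaluator;

    public SketchDocumentProcessor(ModelsEvaluator modelsEvaluator) {
        this.modelsEvaluator = modelsEvaluator;  // injected by the container
    }

    @Override
    public Progress process(Processing processing) {
        for (DocumentOperation op : processing.getDocumentOperations()) {
            if (!(op instanceof DocumentPut)) continue;
            Document document = ((DocumentPut) op).getDocument();

            TensorFieldValue tokens = (TensorFieldValue) document.getFieldValue("tokens");
            if (tokens == null || tokens.getTensor().isEmpty()) continue;

            // Evaluate the transformer model on the token tensor and store the
            // resulting embedding before the document reaches the content cluster.
            // (Any reshaping of model input/output is omitted here.)
            Tensor embedding = modelsEvaluator.evaluatorOf("transformer")
                    .bind("input", tokens.getTensor().get())
                    .evaluate();
            document.setFieldValue("embedding", new TensorFieldValue(embedding));
        }
        return Progress.DONE;
    }
}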
Test the searchers
$ vespa curl -- \
  --data-urlencode "input={{x:0}:1,{x:1}:2,{x:2}:3}" \
  --data-urlencode "searchChain=mychain" \
  /search/
This issues a search for the same input as above:
{ {x:0}:1, {x:1}:2, {x:2}:3 }
The MySearcher searcher
uses the transformer
model to translate the input to an embedding, which is sent to the backend.
A simple dot product between the query embedding and each document embedding is computed,
and the documents are initially ranked by this score in descending order.
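A sketch of a searcher along these lines (parameter, tensor and feature names are illustrative; see MySearcher in this repository for the actual implementation):

import ai.vespa.models.evaluation.ModelsEvaluator;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;
import com.yahoo.tensor.Tensor;
import com.yahoo.tensor.TensorType;

public class SketchSearcher extends Searcher {

    private final ModelsEvaluator modelsEvaluator;

    public SketchSearcher(ModelsEvaluator modelsEvaluator) {
        this.modelsEvaluator = modelsEvaluator;  // injected by the container
    }

    @Override
    public Result search(Query query, Execution execution) {
        String input = query.properties().getString("input");
        if (input != null) {
            // Parse the request tensor - a fixed size of 3 is assumed here for brevity
            Tensor tokens = Tensor.from(TensorType.fromSpec("tensor(x[3])"), input);

            // Evaluate the transformer model to produce the query embedding.
            // (Any reshaping of model input/output is omitted here.)
            Tensor embedding = modelsEvaluator.evaluatorOf("transformer")
                    .bind("input", tokens)
                    .evaluate();

            // Pass the embedding to the content nodes as a query tensor, so the
            // rank profile can compute a dot product against the document embeddings
            query.getRanking().getFeatures().put("query(embedding)", embedding);
        }
        return execution.search(query);
    }
}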
The MyPostProcessingSearcher searcher
uses the pairwise_ranker
model to compare the documents against each other before determining the final
rank order, something that cannot be done on the content nodes.
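A sketch of such a post-processing searcher (the model input names and the win-counting scheme below are illustrative; see MyPostProcessingSearcher in this repository for the actual implementation):

import ai.vespa.models.evaluation.FunctionEvaluator;
import ai.vespa.models.evaluation.ModelsEvaluator;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.result.Hit;
import com.yahoo.search.searchchain.Execution;
import com.yahoo.tensor.Tensor;

public class SketchPostProcessingSearcher extends Searcher {

    private final ModelsEvaluator modelsEvaluator;

    public SketchPostProcessingSearcher(ModelsEvaluator modelsEvaluator) {
        this.modelsEvaluator = modelsEvaluator;  // injected by the container
    }

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);
        execution.fill(result);  // make sure the document embeddings are available

        // Compare every pair of hits with the pairwise model and count "wins" per hit
        for (Hit hit : result.hits()) {
            Tensor embedding = (Tensor) hit.getField("embedding");
            if (embedding == null) continue;
            double wins = 0;
            for (Hit other : result.hits()) {
                if (other == hit || other.getField("embedding") == null) continue;
                FunctionEvaluator evaluator = modelsEvaluator.evaluatorOf("pairwise_ranker");
                Tensor score = evaluator.bind("input1", embedding)
                                        .bind("input2", (Tensor) other.getField("embedding"))
                                        .evaluate();
                if (score.sum().asDouble() > 0.5) wins++;
            }
            hit.setRelevance(wins);  // re-score each hit by its number of wins
        }
        result.hits().sort();  // re-sort by the new relevance to get the final order
        return result;
    }
}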
Shut down and remove the container:
$ docker rm -f vespa