Vespa sample applications - Stateless model evaluation

A sample Vespa application that evaluates models included in the application package in Vespa containers.

Please refer to the stateless model evaluation documentation for more information.

The directory src/main/application/models contains two ONNX model files generated by the PyTorch scripts in the same directory. These two models are used to show various ways stateless model evaluation can be used in Vespa:

  • Automatically, through a REST API provided by Vespa.
  • In a request handler, which allows custom code to run when evaluating a model.
  • In searchers and document processors.
  • In a post-processing searcher, to run a model in batch over the results from the content nodes.
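
All these integration points share the same mechanism: the Vespa container injects a ModelsEvaluator instance into the component's constructor, and the component evaluates a model by name through it. A minimal sketch of the pattern, assuming the transformer model's input function is named input (class and method names are illustrative, not the sample's actual code):

    import ai.vespa.models.evaluation.ModelsEvaluator;
    import com.yahoo.tensor.Tensor;

    public class EvaluationExample {

        private final ModelsEvaluator modelsEvaluator;

        // The Vespa container injects a ModelsEvaluator configured with the
        // models found under src/main/application/models.
        public EvaluationExample(ModelsEvaluator modelsEvaluator) {
            this.modelsEvaluator = modelsEvaluator;
        }

        public Tensor evaluateTransformer(Tensor input) {
            return modelsEvaluator.evaluatorOf("transformer")  // model name = file name
                                  .bind("input", input)        // bind the model input
                                  .evaluate();                 // returns the output tensor
        }
    }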

Quick Start

Requirements:

  • Docker Desktop installed and running. 6GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting.
  • Alternatively, deploy using Vespa Cloud.
  • Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
  • Architecture: x86_64 or arm64
  • Minimum 4GB memory dedicated to Docker.
  • Homebrew to install the Vespa CLI, or download a Vespa CLI release from GitHub releases.
  • Java 17 installed.
  • Apache Maven. This sample app uses custom Java components, and Maven is used to build the application.

Validate the environment; Docker should have a minimum of 4GB memory:

$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"

Install Vespa CLI:

$ brew install vespa-cli

For local deployment using the Docker image:

$ vespa config set target local

Pull and start the Vespa Docker container image:

$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa

Download this sample application:

$ vespa clone model-inference myapp && cd myapp

Build the application package:

$ mvn clean package -U

Verify that the configuration service (deploy API) is ready:

$ vespa status deploy --wait 300

Deploy the application:

$ vespa deploy --wait 300

Deployment note

It is possible to deploy this app to Vespa Cloud.

Wait for the application endpoint to become available:

$ vespa status --wait 300

Test the application - run the Vespa System Tests, which execute a set of basic tests to verify that the application works as expected:

$ vespa test src/test/application/tests/system-test/model-inference-test.json

Using the REST APIs directly

In the following examples we use the Vespa CLI's curl option, which manages the endpoint URL for both local and cloud deployments.

List the available models:

$ vespa curl /model-evaluation/v1/

Details of the transformer model:

$ vespa curl /model-evaluation/v1/transformer

Evaluate the model:

$ vespa curl -- \
  --data-urlencode "input=[[1,2,3]]" \
  --data-urlencode "format.tensors=short" \
  /model-evaluation/v1/transformer/eval

The input here uses the indexed tensor literal short form: [[1,2,3]] is equivalent to { {d0:0,d1:0}:1, {d0:0,d1:1}:2, {d0:0,d1:2}:3 }.

Test the application - Java API in a handler

$ vespa curl -- \
  --data-urlencode "input={{x:0}:1,{x:1}:2,{x:2}:3}" \
  --data-urlencode "model=transformer" \
  /models/

The input here is:

    { {x:0}:1, {x:1}:2, {x:2}:3 }

The model expects type tensor(d0[],d1[]), but the handler transforms the tensor to the correct shape before evaluating the model.
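
A sketch of how such a reshape can be done with the tensor API, assuming the input arrives as a rank-1 indexed tensor (illustrative, not the sample's exact code):

    import ai.vespa.models.evaluation.ModelsEvaluator;
    import com.yahoo.tensor.Tensor;
    import com.yahoo.tensor.TensorType;

    import java.util.Iterator;

    public class ReshapeExample {

        // Copies a rank-1 input tensor into the tensor(d0[1],d1[n]) shape
        // the transformer model expects.
        static Tensor reshape(Tensor input) {
            TensorType type = TensorType.fromSpec("tensor(d0[1],d1[" + input.size() + "])");
            Tensor.Builder builder = Tensor.Builder.of(type);
            int i = 0;
            for (Iterator<Double> values = input.valueIterator(); values.hasNext(); i++)
                builder.cell(values.next(), 0, i);
            return builder.build();
        }

        static Tensor evaluate(ModelsEvaluator models, String modelName, Tensor input) {
            return models.evaluatorOf(modelName)
                         .bind("input", reshape(input))
                         .evaluate();
        }
    }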

Test the document processor

Feed documents:

$ vespa feed feed.json

The MyDocumentProcessor document processor uses the transformer model to generate embeddings that are stored in the content cluster.
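
A sketch of what such a document processor can look like; the field names tokens and embedding are assumptions for illustration, and input shape handling is omitted:

    import ai.vespa.models.evaluation.ModelsEvaluator;
    import com.yahoo.docproc.DocumentProcessor;
    import com.yahoo.docproc.Processing;
    import com.yahoo.document.Document;
    import com.yahoo.document.DocumentOperation;
    import com.yahoo.document.DocumentPut;
    import com.yahoo.document.datatypes.TensorFieldValue;
    import com.yahoo.tensor.Tensor;

    public class EmbeddingProcessor extends DocumentProcessor {

        private final ModelsEvaluator modelsEvaluator;

        public EmbeddingProcessor(ModelsEvaluator modelsEvaluator) {
            this.modelsEvaluator = modelsEvaluator;  // injected by the container
        }

        @Override
        public Progress process(Processing processing) {
            for (DocumentOperation op : processing.getDocumentOperations()) {
                if (!(op instanceof DocumentPut)) continue;
                Document document = ((DocumentPut) op).getDocument();
                // "tokens" and "embedding" are assumed field names
                TensorFieldValue tokens = (TensorFieldValue) document.getFieldValue("tokens");
                if (tokens == null) continue;
                tokens.getTensor().ifPresent(input -> {
                    Tensor embedding = modelsEvaluator.evaluatorOf("transformer")
                                                      .bind("input", input)
                                                      .evaluate();
                    document.setFieldValue("embedding", new TensorFieldValue(embedding));
                });
            }
            return Progress.DONE;
        }
    }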

Test the searchers

$ vespa curl -- \
  --data-urlencode "input={{x:0}:1,{x:1}:2,{x:2}:3}" \
  --data-urlencode "searchChain=mychain" \
  /search/

This issues a search for the same input as above:

    { {x:0}:1, {x:1}:2, {x:2}:3 }

The MySearcher searcher uses the transformer model to translate the input to an embedding, which is sent to the backend. A simple dot product between the query and document embeddings is computed, and the documents are initially sorted by this score in descending order.
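
A sketch of this flow; the rank feature name query(embedding) is an assumption, and input shape handling is omitted:

    import ai.vespa.models.evaluation.ModelsEvaluator;
    import com.yahoo.search.Query;
    import com.yahoo.search.Result;
    import com.yahoo.search.Searcher;
    import com.yahoo.search.searchchain.Execution;
    import com.yahoo.tensor.Tensor;

    public class EmbeddingSearcher extends Searcher {

        private final ModelsEvaluator modelsEvaluator;

        public EmbeddingSearcher(ModelsEvaluator modelsEvaluator) {
            this.modelsEvaluator = modelsEvaluator;  // injected by the container
        }

        @Override
        public Result search(Query query, Execution execution) {
            // Parse the "input" request parameter into a tensor
            Tensor input = Tensor.from(query.properties().getString("input"));

            // Translate the input to an embedding with the transformer model
            Tensor embedding = modelsEvaluator.evaluatorOf("transformer")
                                              .bind("input", input)
                                              .evaluate();

            // Pass the embedding to the content nodes as a query rank feature,
            // where the rank profile computes the dot product
            query.getRanking().getFeatures().put("query(embedding)", embedding);
            return execution.search(query);
        }
    }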

The MyPostProcessingSearcher searcher uses the pairwise_ranker model to compare each document against the others, something that can't be done on the content nodes, before determining the final rank order.
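
A sketch of the pairwise post-processing idea: fill the hits, score every ordered pair with the model, and reorder by the number of comparisons each document wins. The model input names input1/input2 and the embedding field name are assumptions for illustration:

    import ai.vespa.models.evaluation.ModelsEvaluator;
    import com.yahoo.document.datatypes.TensorFieldValue;
    import com.yahoo.search.Query;
    import com.yahoo.search.Result;
    import com.yahoo.search.Searcher;
    import com.yahoo.search.result.Hit;
    import com.yahoo.search.searchchain.Execution;
    import com.yahoo.tensor.Tensor;

    import java.util.ArrayList;
    import java.util.List;

    public class PairwiseSearcher extends Searcher {

        private final ModelsEvaluator modelsEvaluator;

        public PairwiseSearcher(ModelsEvaluator modelsEvaluator) {
            this.modelsEvaluator = modelsEvaluator;  // injected by the container
        }

        @Override
        public Result search(Query query, Execution execution) {
            Result result = execution.search(query);
            execution.fill(result);  // ensure document fields are available

            List<Hit> hits = new ArrayList<>();
            for (Hit hit : result.hits()) hits.add(hit);

            // Score each hit by the number of pairwise comparisons it wins;
            // a FunctionEvaluator is single-use, so create one per evaluation
            for (int i = 0; i < hits.size(); i++) {
                int wins = 0;
                for (int j = 0; j < hits.size(); j++) {
                    if (i == j) continue;
                    Tensor score = modelsEvaluator.evaluatorOf("pairwise_ranker")
                                                  .bind("input1", embedding(hits.get(i)))
                                                  .bind("input2", embedding(hits.get(j)))
                                                  .evaluate();
                    if (score.asDouble() > 0.5) wins++;
                }
                hits.get(i).setRelevance(wins);
            }
            result.hits().sort();  // reorder by the new relevance scores
            return result;
        }

        private static Tensor embedding(Hit hit) {
            Object field = hit.getField("embedding");  // assumed field name
            if (field instanceof Tensor) return (Tensor) field;
            return ((TensorFieldValue) field).getTensor().orElseThrow();
        }
    }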

Shut down and remove the container:

$ docker rm -f vespa