This repository contains setup examples for hosting model inference using NVIDIA Triton Inference Server.
- Set up your model and tokenizer files
  - move model.onnx to hf-embedding-template/onnx_model/1/
  - move any other model files (model and tokenizer config) to hf-embedding-template/preprocessing/1/
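After these steps the model repository should look roughly like the sketch below. The config.pbtxt files and the model.py in the preprocessing model are assumptions based on the usual Triton layout (an ONNX model plus a Python-backend tokenizer); the template's actual files may differ.

```
hf-embedding-template/
├── onnx_model/
│   ├── config.pbtxt          # assumed: standard Triton model configuration
│   └── 1/
│       └── model.onnx
└── preprocessing/
    ├── config.pbtxt          # assumed: standard Triton model configuration
    └── 1/
        ├── model.py          # assumed: Python-backend tokenization script
        └── tokenizer / model config files
```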
- Start Triton Server and attach shell
  docker run --shm-size=16g --gpus all -it --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /hf-embedding-template:/models nvcr.io/nvidia/tritonserver:24.08-py3 bash
- Run inside the Triton container
  pip install transformers
  tritonserver --model-repository=/models
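Once tritonserver reports its models as READY, the deployment can be checked from the host, since the docker run command above publishes the HTTP port. A minimal readiness probe with tritonclient (installed in the client step below) might look like the sketch here; the model names preprocessing and onnx_model are guesses derived from the directory names and may differ in the template.

```python
# Minimal readiness probe (sketch). Assumes the default HTTP port 8000 and
# model names taken from the directory names of the model repository.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
for model_name in ("preprocessing", "onnx_model"):  # assumed model names
    print(f"{model_name} ready:", client.is_model_ready(model_name))
```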
- Run client
  pip install tritonclient[http]
  python client.py
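client.py in this repository implements the actual request; purely as orientation, a request against the HTTP endpoint with tritonclient looks roughly like the sketch below. The model name and the tensor names are placeholders and must be replaced with the names defined in the template's config.pbtxt files and client.py.

```python
# Illustrative sketch only -- the repository's client.py is the reference.
# MODEL_NAME, INPUT_NAME and OUTPUT_NAME are placeholders.
import numpy as np
import tritonclient.http as httpclient

MODEL_NAME = "text-embedder"                   # placeholder model name
INPUT_NAME, OUTPUT_NAME = "TEXT", "EMBEDDING"  # placeholder tensor names

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton transports strings as BYTES tensors, i.e. numpy object arrays.
texts = np.array([["An example sentence to embed."]], dtype=object)

infer_input = httpclient.InferInput(INPUT_NAME, list(texts.shape), "BYTES")
infer_input.set_data_from_numpy(texts)
requested_output = httpclient.InferRequestedOutput(OUTPUT_NAME)

response = client.infer(MODEL_NAME, inputs=[infer_input], outputs=[requested_output])
embedding = response.as_numpy(OUTPUT_NAME)
print("embedding shape:", embedding.shape)
```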
This repo also comes with ready-to-run Helm charts, which can be found under /helm. For example, text-embedder-trion is readily configured to run a Triton embedding server.