Deploying a TensorFlow Model

This README showcases how to deploy a simple ResNet model on Triton Inference Server.

Step 1: Export the model

Export a TensorFlow model as a saved model.

# <xx.xx> is the yy.mm publishing tag of NVIDIA's TensorFlow
# container, e.g. 22.04

docker run -it --gpus all -v ${PWD}:/workspace nvcr.io/nvidia/tensorflow:<xx.xx>-tf2-py3
python export.py

Step 2: Set Up Triton Inference Server

To use Triton, we need to build a model repository. The structure of the repository is as follows:

model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.savedmodel
            |
            +-- saved_model.pb
            +-- variables
                |
                +-- variables.data-00000-of-00001
                +-- variables.index
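The layout above can be checked before starting the server. The following is a minimal sketch, not part of the demo: the helper name missing_entries is hypothetical, and it only looks for the files shown in the tree (it skips the variables.data shard, whose name can vary between exports).

```python
from pathlib import Path


def missing_entries(repo_root, model_name="resnet50", version="1"):
    """Return expected paths that are absent from the model repository.

    Hypothetical helper: it checks only the layout shown in the tree above.
    """
    model_dir = Path(repo_root) / model_name
    saved_model_dir = model_dir / version / "model.savedmodel"
    expected = [
        model_dir / "config.pbtxt",
        saved_model_dir / "saved_model.pb",
        saved_model_dir / "variables" / "variables.index",
    ]
    # Report paths relative to the repository root for readability.
    return [str(p.relative_to(repo_root)) for p in expected if not p.exists()]
```

Calling missing_entries("model_repository") from the directory that contains the repository returns an empty list when all three files are in place.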

A sample model configuration is included with this demo as config.pbtxt. If you are new to Triton, we highly recommend reviewing Part 1 of the conceptual guide.

docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.xx>-py3 tritonserver --model-repository=/models --backend-config=tensorflow,version=2
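Once the container is running, it can help to confirm that the server is ready before sending requests. The sketch below assumes the HTTP endpoint is reachable on port 8000, as mapped in the command above, and queries the readiness route that Triton exposes at /v2/health/ready. The helper names ready_url and server_is_ready are hypothetical.

```python
import urllib.request


def ready_url(host="localhost", port=8000):
    """Build the readiness URL for Triton's HTTP endpoint."""
    return f"http://{host}:{port}/v2/health/ready"


def server_is_ready(host="localhost", port=8000, timeout=2.0):
    """Return True if the readiness probe answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from OSError).
        return False
```

For example, server_is_ready() can be polled in a loop until it returns True after starting the container.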

Step 3: Using a Triton Client to Query the Server

Install dependencies & download an example image to test inference.

docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:<xx.xx>-py3-sdk bash
pip install --upgrade tensorflow
pip install image

wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"

Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server.

import tritonclient.http as httpclient
triton_client = httpclient.InferenceServerClient(url="localhost:8000")

Second, we specify the names of the input and output layers of our model. Here, transformed_img is the preprocessed image as a NumPy array.

inputs = httpclient.InferInput("input_1", transformed_img.shape, datatype="FP32")
inputs.set_data_from_numpy(transformed_img, binary_data=True)

output = httpclient.InferRequestedOutput("predictions", binary_data=True, class_count=1000)

Lastly, we send an inference request to the Triton Inference Server.

# Querying the server
results = triton_client.infer(model_name="resnet50", inputs=[inputs], outputs=[output])
predictions = results.as_numpy('predictions')
print(predictions)

The output should look like the following:

[b'0.301167:90' b'0.169790:14' b'0.161309:92' b'0.093105:94'
 b'0.058743:136' b'0.050185:11' b'0.033802:91' b'0.011760:88'
 b'0.008309:989' b'0.004927:95' b'0.004905:13' b'0.004095:317'
 b'0.004006:96' b'0.003694:12' b'0.003526:42' b'0.003390:313'
 ...
 b'0.000001:751' b'0.000001:685' b'0.000001:408' b'0.000001:116'
 b'0.000001:627' b'0.000001:933' b'0.000000:661' b'0.000000:148']

The output format here is <confidence_score>:<classification_index>. To learn how to map these to the label names and more, refer to our documentation. The client code above is available in client.py.
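As an illustration of the format described above, the sketch below splits each entry into a numeric score and a class index. The function name parse_classification is hypothetical and is not part of client.py. It keeps only the first two colon-separated fields and ignores anything after them.

```python
def parse_classification(entries):
    """Split '<confidence_score>:<classification_index>' entries into pairs.

    Accepts bytes (as returned by as_numpy) or str items and returns a
    list of (score, index) tuples in the original order.
    """
    parsed = []
    for entry in entries:
        text = entry.decode("utf-8") if isinstance(entry, bytes) else str(entry)
        # Keep the first two fields; any trailing fields are ignored.
        score, index = text.split(":")[:2]
        parsed.append((float(score), int(index)))
    return parsed
```

With the predictions array from the query above, parse_classification(predictions)[0] would give the highest-scoring pair, which is (0.301167, 90) for the sample output shown.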
