
TorchServe inference with torch._export.aot_compile

This example shows how to run TorchServe inference with a Torch exported model using AOTInductor

To understand when to use torch._export.aot_compile, please refer to this section

Pre-requisites

  • PyTorch >= 2.3.0
  • CUDA >= 11.8

Change directory to this example's folder, e.g. cd examples/pt2/torch_export_aot_compile

Create a Torch exported model with AOTInductor

The model is saved with a .so extension. Here we torch-export the model with AOTInductor in max_autotune mode, and use dynamic_shapes to support batch sizes from 1 to 32. In the code, the minimum batch_size is specified as 2 instead of 1; this is by design, and the exported model still works for batch size 1. You can find an explanation for this here

python resnet18_torch_export.py
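
Below is a minimal sketch of what an export script like resnet18_torch_export.py could contain, using torch.export.Dim for the dynamic batch dimension and torch._export.aot_compile with max_autotune. The exact script shipped with the example may differ (e.g. in output path or option flags):

```python
import os

import torch
from torchvision.models import ResNet18_Weights, resnet18

MAX_BATCH_SIZE = 32

model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.eval()

with torch.no_grad():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device=device)
    example_inputs = (torch.randn(2, 3, 224, 224, device=device),)

    # The minimum batch size is set to 2 (not 1) by design; the compiled
    # model still handles batch size 1 at inference time.
    batch_dim = torch.export.Dim("batch", min=2, max=MAX_BATCH_SIZE)
    so_path = torch._export.aot_compile(
        model,
        example_inputs,
        # Mark the batch dimension of the input "x" as dynamic.
        dynamic_shapes={"x": {0: batch_dim}},
        options={
            # Write the compiled shared library next to this script.
            "aot_inductor.output_path": os.path.join(os.getcwd(), "resnet18_pt2.so"),
            "max_autotune": True,
        },
    )
    print(f"Saved AOTInductor-compiled model to {so_path}")
```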

Create model archive

torch-model-archiver --model-name res18-pt2 --handler image_classifier --version 1.0 --serialized-file resnet18_pt2.so --config-file model-config.yaml --extra-files ../../image_classifier/index_to_name.json
mkdir model_store
mv res18-pt2.mar model_store/.
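
The archiver command above references a model-config.yaml that ships alongside this example. Its exact contents are not reproduced here, but a minimal config using TorchServe's standard model-config keys might look like the following (values are illustrative only):

```yaml
# Illustrative values; see the example's model-config.yaml for the actual settings.
minWorkers: 1
maxWorkers: 2
maxBatchDelay: 100
responseTimeout: 120
```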

Start TorchServe

torchserve --start --model-store model_store --models res18-pt2=res18-pt2.mar --ncs --disable-token-auth  --enable-model-api
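
Optionally (not part of the original steps), you can confirm the model registered successfully via TorchServe's management API, which listens on port 8081 by default:

```bash
curl http://127.0.0.1:8081/models/res18-pt2
```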

Run Inference

curl http://127.0.0.1:8080/predictions/res18-pt2 -T ../../image_classifier/kitten.jpg

This produces the following output:

{
  "tabby": 0.4087875485420227,
  "tiger_cat": 0.34661102294921875,
  "Egyptian_cat": 0.13007202744483948,
  "lynx": 0.024034621194005013,
  "bucket": 0.011633828282356262
}