Highlights Include
- GenAI updates
  - No-code LLM deployments with TorchServe + vLLM & TensorRT-LLM using the ts.llm_launcher script
  - OpenAI API support for TorchServe + vLLM
  - Integration of the TensorRT-LLM engine
  - Stateful inference on AWS SageMaker (see blog)
- Support for linux-aarch64
  - CI & nightly regression added
  - Docker & KServe images published
- PyTorch updates
  - Support for PyTorch 2.4
  - Deprecation of TorchText
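As a quick illustration of the no-code deployment path, ts.llm_launcher can stand up a vLLM-backed endpoint with a single command. The model id below is illustrative; substitute any Hugging Face model you have access to, and note that `--disable_token_auth` turns off token authorization and is suitable for local testing only.

```shell
# Sketch of a no-code LLM deployment via ts.llm_launcher (vLLM engine by default).
# The model id is an example -- use any model you have access to.
python -m ts.llm_launcher \
    --model_id meta-llama/Meta-Llama-3-8B-Instruct \
    --disable_token_auth
```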
PyTorch Updates
- upgrade to PyTorch 2.4 & deprecation of TorchText by @agunapal in #3289
- Resnet152 batch inference torch.compile example by @andrius-meta in #3259
- squeezenet torch.compile example by @wdvr in #3277
GenAI
- Implement stateful inference session timeout by @namannandan in #3263
- Use Case: Enhancing LLM Serving with Torch Compiled RAG on AWS Graviton by @agunapal in #3276
- Feature add openai api for vllm integration by @mreso in #3287
- Set vllm multiproc method to spawn by @mreso in #3310
- TRT LLM Integration with LORA by @agunapal in #3305
- Bump vllm from 0.5.0 to 0.5.5 in /examples/large_models/vllm by @dependabot in #3321
- Use startup time in async worker thread instead of worker timeout by @mreso in #3315
- Rename vllm dockerfile by @mreso in #3330
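With the OpenAI API support added in #3287, a deployed vLLM model can be queried with an OpenAI-style completions request. The model name (`model`) and URL path below are assumptions based on a default ts.llm_launcher registration and TorchServe's default inference port 8080; adjust them to match your deployment.

```shell
# Query the OpenAI-compatible completions endpoint of the vLLM integration.
# Model name and path are assumptions -- adapt to your registered model.
curl -X POST http://localhost:8080/predictions/model/1.0/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello world", "max_tokens": 16}'
```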
Support for linux-aarch64
- Adding Graviton Regression test CI by @udaij12 in #3273
- adding graviton docker image release by @udaij12 in #3313
- Fixing kserve nightly for arm64 by @udaij12 in #3319
- Docker aarch by @udaij12 in #3323
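With the aarch64 image release, the published image can be pulled directly on an arm64 host. This sketch assumes the official `pytorch/torchserve` Docker Hub repository publishes a multi-arch manifest, so Docker resolves the arm64 variant automatically.

```shell
# Pull and sanity-check TorchServe on a linux-aarch64 host.
docker pull pytorch/torchserve:latest
docker run --rm pytorch/torchserve:latest torchserve --version
```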
Documentation
- Security doc update by @udaij12 in #3256
- Remove compile note for hpu by @RafLit in #3271
- doc update of the rag usecase blog by @agunapal in #3280
- Add some hints for java devs by @mreso in #3282
- add TorchServe with Intel® Extension for PyTorch* guidance by @jingxu10 in #3285
- Update quickstart llm docker in serve/readme; added ts.llm_launcher example by @mreso in #3300
- typo fixes in HF Transformers example by @EFord36 in #3307
- docs: update WaveGlow links by @emmanuel-ferdman in #3317
- Fix typo: "a asynchronous" -> "an asynchronous" by @tadayosi in #3314
- Fix typo: vesion -> version, succsesfully -> successfully by @tadayosi in #3322
Improvements and Bug Fixes
- Bump torchserve from 0.10.0 to 0.11.0 in /examples/large_models/ipex_llm_int8 by @dependabot in #3257
- add JDK17 compatible groovy dependency for frontend log4j ScriptFilter by @lanxih in #3235
- Leave response and sendError when request is canceled by @slashvar in #3267
- add kserve gpu tests by @rohithkrn in #3283
- Configurable startup time by @Isalia20 in #3262
- Add REPO_URL in Dockerfile to allow docker builds from contributor repos by @mreso in #3291
- Fix docker repo url in github action workflow by @mreso in #3293
- Fix docker ci repo_url by @mreso in #3294
- Fix/docker repo url3 by @mreso in #3297
- Remove debug step in docker ci by @mreso in #3298
- Fix wild card in extra files by @mreso in #3304
- Example to demonstrate building a custom endpoint plugin by @namannandan in #3306
- Benchmark fix by @udaij12 in #3316
- Update TS version to 0.12.0 by @agunapal in #3318
- Clear up neuron cache by @chen3933 in #3326
- Fix Dockerfile for renamed forks by @mreso in #3327
- Load all models including targz by @m10an in #3329
- fix for snapshot variables missing/null by @udaij12 in #3328
New Contributors
- @andrius-meta made their first contribution in #3259
- @slashvar made their first contribution in #3267
- @RafLit made their first contribution in #3271
- @wdvr made their first contribution in #3277
- @Isalia20 made their first contribution in #3262
- @jingxu10 made their first contribution in #3285
- @EFord36 made their first contribution in #3307
- @emmanuel-ferdman made their first contribution in #3317
- @tadayosi made their first contribution in #3314
- @m10an made their first contribution in #3329
Platform Support
Ubuntu 20.04, macOS 10.14+, Windows 10 Pro, Windows Server 2019, and Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe requires Python >= 3.8 and JDK 17.
GPU Support Matrix
TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
---|---|---|---|---|
0.12.0 | 2.4.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.11.1 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
Inferentia2 Support Matrix
TorchServe version | PyTorch version | Python | Neuron SDK |
---|---|---|---|
0.12.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.11.1 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |
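The support matrices above can be encoded as a small lookup table, for example to sanity-check an environment before upgrading. The data below mirrors the most recent rows of the GPU support matrix; the helper function itself is illustrative and not part of TorchServe.

```python
import sys

# Python ranges from the GPU support matrix above (inclusive bounds).
# Illustrative helper -- not part of TorchServe itself.
SUPPORT_MATRIX = {
    "0.12.0": {"pytorch": "2.4.0", "python": ((3, 8), (3, 11))},
    "0.11.1": {"pytorch": "2.3.0", "python": ((3, 8), (3, 11))},
    "0.11.0": {"pytorch": "2.3.0", "python": ((3, 8), (3, 11))},
}

def python_supported(ts_version: str, py_version=None) -> bool:
    """Return True if the given Python version is in the supported range
    for the given TorchServe release."""
    py = tuple(py_version or sys.version_info[:2])
    lo, hi = SUPPORT_MATRIX[ts_version]["python"]
    return lo <= py <= hi

print(python_supported("0.12.0", (3, 10)))  # True: 3.8 <= 3.10 <= 3.11
print(python_supported("0.12.0", (3, 12)))  # False: above the supported range
```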