Concurrent inference failure of TensorRT 8.6.1 when running the open_clip visual model TensorRT engine on GPU A100 (#3967)
Comments
What is your TRT infer code?
The infer code is like the following TensorRTModel class. I didn't set up any multi-process or multi-threaded inference operations in the code.
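The reporter's actual TensorRTModel snippet was not captured in this page. Below is a minimal sketch of what single-context TensorRT 8.6 Python inference commonly looks like, assuming the pycuda pattern; the class shape, tensor shapes, and output dtype are assumptions, not the reporter's code:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a default CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

class TensorRTModel:
    """Hypothetical single-context wrapper around a serialized TRT engine."""

    def __init__(self, engine_path):
        with open(engine_path, "rb") as f:
            runtime = trt.Runtime(TRT_LOGGER)
            self.engine = runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.stream = cuda.Stream()

    def infer(self, image):
        # image: contiguous float32 ndarray matching binding 0's shape.
        # Output dtype/shape assumed here; binding-index API is TRT 8.x.
        out_shape = tuple(self.context.get_binding_shape(1))
        output = np.empty(out_shape, dtype=np.float32)
        d_in = cuda.mem_alloc(image.nbytes)
        d_out = cuda.mem_alloc(output.nbytes)
        cuda.memcpy_htod_async(d_in, image, self.stream)
        # execute_async_v2 takes a flat list of device pointers plus a stream
        self.context.execute_async_v2([int(d_in), int(d_out)], self.stream.handle)
        cuda.memcpy_dtoh_async(output, d_out, self.stream)
        self.stream.synchronize()
        d_in.free()
        d_out.free()
        return output
```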
If all of your requests are sent to one process (including the TRT inference), there is no problem.
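This advice matches TensorRT's documented threading rules: an IExecutionContext must not be used by multiple threads at the same time, while many Python HTTP frameworks dispatch request handlers on a thread pool. A sketch of the simplest workaround, serializing all inference through one lock (names and the engine path are illustrative):

```python
import threading

# One engine/context shared by all request handlers. Because an
# IExecutionContext is not thread-safe, concurrent handler threads must
# not call infer() on it simultaneously; this lock serializes them.
_model = TensorRTModel("open_clip_visual.engine")  # illustrative path
_infer_lock = threading.Lock()

def run_inference(image):
    with _infer_lock:
        return _model.infer(image)
```

An alternative that keeps some parallelism is one execution context and one CUDA stream per worker thread, all created from the same deserialized engine.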
Description
I compiled the image part of the open_clip model (a PyTorch model, https://github.com/mlfoundations/open_clip) in a Python environment using TensorRT 8.6.1 and obtained an engine. I then built a service that loads the TensorRT engine, accepts HTTP POST requests, performs inference, and returns results. The service is written in Python, not C++. The phenomenon I observed, as the title states, is that inference fails once the service handles concurrent requests.
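A minimal sketch of such a service, assuming a Flask app, a fixed 1x3x224x224 float32 input (the usual open_clip ViT-B image size), and the hypothetical TensorRTModel wrapper sketched in the comments above; endpoint name and paths are illustrative:

```python
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
model = TensorRTModel("open_clip_visual.engine")  # illustrative path

@app.route("/embed", methods=["POST"])
def embed():
    # Assumes the client posts a raw float32 tensor of shape (1, 3, 224, 224)
    image = np.frombuffer(request.data, dtype=np.float32).reshape(1, 3, 224, 224)
    features = model.infer(np.ascontiguousarray(image))
    return jsonify({"embedding": features.tolist()})

if __name__ == "__main__":
    # threaded=False keeps all inference on one thread, which is effectively
    # the single-process/single-thread setup the maintainer recommends above.
    app.run(host="0.0.0.0", port=8000, threaded=False)
```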
Environment
TensorRT Version: 8.6.1 (tensorrt==8.6.1 Python wheel)
NVIDIA GPU: A100 (80 GB)
NVIDIA Driver Version: 535.54.03
CUDA Version: 12.2
CUDNN Version:
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.13.1+cu116
Baremetal or Container (if so, version):
Relevant Files
Model link: https://github.com/mlfoundations/open_clip
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?: No
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (
polygraphy run <model.onnx> --onnxrt
):
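For reference, a typical Polygraphy cross-check runs the exported ONNX model under both ONNX-Runtime and TensorRT and compares the outputs; the file name below is a placeholder for the reporter's exported model:

```
# Compare ONNX-Runtime and TensorRT outputs for the exported visual tower
polygraphy run open_clip_visual.onnx --onnxrt --trt
```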