BERT inference examples and benchmarks for A100 #7350
vadimkantorov started this conversation in General
I'm looking for a modern, basic example/benchmark of BERT inference on Triton Inference Server on an A100 GPU, similar to the older https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/BERT/triton and https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/triton/large/README.md#deployment-process — but those do not include torch.compile with all the bells and whistles.
Some variants would be interesting to compare. Does anybody know if such an example exists? Even the most basic comparison of a torch.compile config against a modern TensorRT build would be useful.
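In case it helps anyone rolling their own numbers in the meantime: here is a minimal, framework-agnostic latency-harness sketch. The torch.compile / transformers usage at the bottom is a commented assumption (model name, batch size, and compile mode are illustrative, not a tested config):

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100):
    """Call fn() `warmup` times untimed, then time `iters` calls.

    Returns mean and p99 latency in milliseconds.
    """
    for _ in range(warmup):
        fn()
    times_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - t0) * 1e3)
    times_ms.sort()
    return {
        "mean_ms": statistics.mean(times_ms),
        "p99_ms": times_ms[int(0.99 * (iters - 1))],
    }


# Hypothetical usage against a compiled BERT (assumes torch and
# transformers are installed; untested sketch, not a reference config):
#
#   import torch
#   from transformers import AutoModel, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("bert-base-uncased")
#   model = AutoModel.from_pretrained("bert-base-uncased").cuda().eval()
#   model = torch.compile(model, mode="max-autotune")
#   batch = tok(["hello world"] * 32, return_tensors="pt",
#               padding=True).to("cuda")
#   with torch.inference_mode():
#       print(benchmark(lambda: model(**batch)))
#
# Caveat: CUDA launches are asynchronous, so for GPU timing the timed
# callable should end with torch.cuda.synchronize().
```

For the server-side numbers, Triton's bundled perf_analyzer tool is the usual way to sweep batch sizes and concurrency against a deployed model.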
Thanks :)