sfc-gh-zhwang/llm_benchmark

llm_benchmark on llama2-70B

Run on 8x A100 (40 GiB) GPUs with tensor parallelism = 8.

| Framework | batch_size: 32, input_len: 1, output_len: 2048 | batch_size: 24, input_len: 1024, output_len: 1024 |
| --- | --- | --- |
| Huggingface | 948.26 s | 439.02 s |
| Triton | 70.19 s | 38.97 s |
| vLLM | 133.70 s | 76.12 s |
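
To compare the frameworks on a common scale, the timings above can be converted into approximate output-token throughput and speedup over the Huggingface baseline. This is only a rough sketch: it assumes each reported time is the total wall-clock time for the whole batch and that every sequence generates the full `output_len` tokens, neither of which the README states. The config keys and function names are hypothetical, chosen for illustration.

```python
# Reported times in seconds, copied from the benchmark table.
# Assumption: each value is the wall-clock time for the entire batch.
CONFIGS = {
    "bs32_in1_out2048": {"batch_size": 32, "output_len": 2048},
    "bs24_in1024_out1024": {"batch_size": 24, "output_len": 1024},
}

TIMES_S = {
    "Huggingface": {"bs32_in1_out2048": 948.26, "bs24_in1024_out1024": 439.02},
    "Triton": {"bs32_in1_out2048": 70.19, "bs24_in1024_out1024": 38.97},
    "vLLM": {"bs32_in1_out2048": 133.70, "bs24_in1024_out1024": 76.12},
}


def output_tokens_per_second(framework: str, config: str) -> float:
    """Generated tokens per second, assuming every sequence emits output_len tokens."""
    cfg = CONFIGS[config]
    total_output_tokens = cfg["batch_size"] * cfg["output_len"]
    return total_output_tokens / TIMES_S[framework][config]


def speedup_over(baseline: str, framework: str, config: str) -> float:
    """Ratio of the baseline time to the framework time for one config."""
    return TIMES_S[baseline][config] / TIMES_S[framework][config]


if __name__ == "__main__":
    for config in CONFIGS:
        for framework in TIMES_S:
            tps = output_tokens_per_second(framework, config)
            ratio = speedup_over("Huggingface", framework, config)
            print(f"{config:22s} {framework:12s} {tps:8.1f} tok/s  {ratio:5.2f}x vs Huggingface")
```

Under these assumptions, the ratios depend only on the reported times, so the speedup figures hold regardless of how tokens are counted; the tokens-per-second values are more sensitive to the stated assumptions.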
