- Add `hf_tokenizer_model_id` parameter to automatically download tokenizers from Hugging Face.
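For illustration, the new parameter could be set in a config file roughly like this (the surrounding keys, nesting, and model id are assumptions for the sketch, not FMBench's documented schema):

```yaml
# hypothetical config fragment; section placement is an assumption
experiments:
  - name: llama3-8b-instruct-g5
    # when set, the tokenizer for this model id is downloaded from
    # Hugging Face instead of being read from a local tokenizer directory
    hf_tokenizer_model_id: meta-llama/Meta-Llama-3-8B-Instruct
```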
- Config files for `Llama3.1-1b` on AMD/Intel CPU instance types.
- Bug fixes for token counting for vLLM.
- Delete SageMaker endpoint as soon as the run finishes.
- Add support for embedding models through SageMaker JumpStart.
- Add support for Llama 3.2 11B Vision Instruct benchmarking through FMBench.
- Fix DJL inference when deploying DJL on EC2 (424 inference bug).
- Update to torch 2.4 for compatibility with SageMaker Notebooks.
- `Llama3.1-70b` config files and more.
- Support for `fmbench-orchestrator`.
- Update `pricing.yml`, add additional config files.
- `Llama3.2-1b` and `Llama3.2-3b` support on EC2 `g5`.
- `Llama3-8b` on EC2 `g6e` instances.
- Triton-djl support for AWS Chips.
- Tokenizer files are now downloaded directly from Hugging Face (unless provided manually, as before).
- Support Triton-TensorRT for GPU instances and Triton-vllm for AWS Chips.
- Misc. bug fixes.
- Run multiple model copies with the DJL serving container and an Nginx load balancer on Amazon EC2.
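A minimal sketch of what such an Nginx round-robin setup could look like (the ports, upstream name, and two-copy layout are illustrative assumptions, not the config FMBench generates):

```nginx
# illustrative nginx.conf fragment: load-balance across two DJL model copies
upstream djl_model_copies {
    server 127.0.0.1:8080;  # model copy 1
    server 127.0.0.1:8081;  # model copy 2
}

server {
    listen 80;
    location / {
        # requests are distributed round-robin across the upstream servers
        proxy_pass http://djl_model_copies;
    }
}
```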
- Config files for `Llama3.1-8b` on `g5`, `p4de` and `p5` Amazon EC2 instance types.
- Better analytics for creating internal leaderboards.
- Support for Intel CPU-based instances such as `c5.18xlarge` and `m5.16xlarge`.
- Support for AMD CPU-based instances such as `m7a`.
- Support for an EFS directory for benchmarking on EC2.
- Code cleanup, minor bug fixes and report improvements.
- 🚨 Model evaluations done by a Panel of LLM Evaluators[1] 🚨
- Compile for AWS Chips (Trainium, Inferentia) and deploy to SageMaker directly through `FMBench`.
- `Llama3.1-8b` and `Llama3.1-70b` config files for AWS Chips (Trainium, Inferentia).
- Misc. bug fixes.
- `FMBench` has a website now. Reworked the README file to make it lightweight.
- `Llama3.1` config files for Bedrock.
- `Llama3-8b` on Amazon EC2 `inf2.48xlarge` config file.
- Update to new version of DJL LMI (0.28.0).
- Streaming support for Amazon SageMaker and Amazon Bedrock.
- Per-token latency metrics such as time to first token (TTFT) and mean time per output token (TPOT).
- Misc. bug fixes.
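The TTFT and TPOT metrics above can be derived from the arrival timestamps of streamed tokens. A minimal illustrative sketch (the function name and inputs are assumptions for the example, not FMBench's actual implementation):

```python
from typing import List, Tuple

def latency_metrics(token_times: List[float], start_time: float) -> Tuple[float, float]:
    """Compute per-token latency metrics from a streamed response.

    token_times: wall-clock timestamps at which each output token arrived.
    start_time: timestamp at which the request was sent.
    """
    # time to first token: delay between sending the request and the first token
    ttft = token_times[0] - start_time
    # mean time per output token, measured over the inter-token gaps
    if len(token_times) > 1:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0  # a single token has no inter-token gaps
    return ttft, tpot
```

For example, a request sent at t=1.0 whose tokens arrive at t=1.5, 1.6, 1.7, 1.8 has a TTFT of 0.5s and a TPOT of 0.1s.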
- Faster result file download at the end of a test run.
- `Phi-3-mini-4k-instruct` configuration file.
- Tokenizer and misc. bug fixes.
- Run `FMBench` as a Docker container.
- Bug fixes for GovCloud support.
- Updated README for EKS cluster creation.
- Native model deployment support for EC2 and EKS (i.e. you can now deploy and benchmark models on EC2 and EKS).
- FMBench is now available in GovCloud.
- Update to latest version of several packages.
- Analytics for results across multiple runs.
- `Llama3-70b` config files for `g5.48xlarge` instances.
- Endpoint metrics (CPU/GPU utilization, memory utilization, model latency) and invocation metrics (including errors) for SageMaker endpoints.
- `Llama3-8b` config files for `g6` instances.
- Config file for running `Llama3-8b` on all instance types except `p5`.
- Fix bug with business summary chart.
- Fix bug with deploying model using a DJL DeepSpeed container in the no S3 dependency mode.
- Make it easy to run on Amazon EC2 without any dependency on Amazon S3 (the no S3 dependency mode).
- Add an internal `FMBench` website.
- Support for running `FMBench` on Amazon EC2 without any dependency on Amazon S3.
- `Llama3-8b-Instruct` config file for `ml.p5.48xlarge`.
- `g5`/`p4d`/`inf2`/`trn1` specific config files for `Llama3-8b-Instruct`.
- `p4d` config file for both `vllm` and `lmi-dist`.
- Fix bug at higher concurrency levels (20 and above).
- Support for instance count > 1.
- Support for Open-Orca dataset and corresponding prompts for Llama3, Llama2 and Mistral.
- Don't delete endpoints for the bring your own endpoint case.
- Fix bug with business summary chart.
- Report enhancements: new business summary chart, config file embedded in the report, version numbering, and others.
- Additional config files: Meta Llama3 on Inf2, Mistral instruct with `lmi-dist` on `p4d` and `p5` instances.