LLMeter is a pure-Python library for simple latency and throughput testing of large language models (LLMs). It's designed to be lightweight to install, straightforward for running standard tests, and versatile to integrate into notebooks, CI/CD pipelines, and other workflows.
LLMeter requires Python 3.10 or later, so please make sure your current version of Python is compatible.
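If you're not sure which version you're running, you can check directly from Python using the standard library:

import sys
assert sys.version_info >= (3, 10), f"LLMeter needs Python 3.10+, found {sys.version}"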
To install the basic metering functionality, install the minimal package with pip:
pip install llmeter
LLMeter also offers extra features that require additional dependencies. Currently these extras include:
- plotting: Add methods to generate charts and heatmaps to summarize the results
- openai: Enable testing endpoints offered by OpenAI
- litellm: Enable testing a range of different models through LiteLLM
- mlflow: Enable logging LLMeter experiments to MLflow
You can install one or more of these extra options using pip:
pip install 'llmeter[plotting,openai,litellm,mlflow]'
At a high level, you'll start by configuring an LLMeter "Endpoint" for whatever type of LLM you're connecting to:
# For example with Amazon Bedrock...
from llmeter.endpoints import BedrockConverse
endpoint = BedrockConverse(model_id="...")
# ...or OpenAI...
from llmeter.endpoints import OpenAIEndpoint
endpoint = OpenAIEndpoint(model_id="...", api_key="...")
# ...or via LiteLLM...
from llmeter.endpoints import LiteLLM
endpoint = LiteLLM("{provider}/{model_id}")
# ...and so on
You can then run the high-level "experiments" offered by LLMeter:
# For example, a heatmap of latency by input & output token count:
from llmeter.experiments import LatencyHeatmap
latency_heatmap = LatencyHeatmap(
    endpoint=endpoint,
    clients=10,
    source_file="examples/MaryShelleyFrankenstein.txt",
    ...
)
heatmap_results = await latency_heatmap.run()
latency_heatmap.plot_heatmap()
# ...or testing how throughput varies with concurrent request count:
from llmeter.experiments import LoadTest
sweep_test = LoadTest(
    endpoint=endpoint,
    payload={...},
    sequence_of_clients=[1, 5, 20, 50, 100, 500],
)
sweep_results = await sweep_test.run()
sweep_test.plot_sweep_results()
Alternatively, you can use the low-level llmeter.runner.Runner class to run and analyze request batches, and to build your own custom experiments on top of it.
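For illustration, here's a minimal sketch of a custom run. The Runner constructor and run() arguments below (n_requests, clients) are assumptions mirroring the experiments above, so check the Runner API in your installed version:

from llmeter.runner import Runner

# Assumed arguments for illustration; consult the Runner API docs.
runner = Runner(endpoint)
results = await runner.run(
    payload={...},   # same request payload shape as in LoadTest above
    n_requests=100,  # assumed: total number of requests in the batch
    clients=10,      # assumed: number of concurrent clients
)
print(results.stats)  # assumed: aggregate latency/throughput statistics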
Additional functionality like cost modelling and MLflow experiment tracking is enabled through llmeter.callbacks, and you can write your own callbacks to hook other custom logic into LLMeter test runs.
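As a sketch of what a custom callback could look like (the import path, base class, and hook name below are assumptions for illustration, so check the llmeter.callbacks source for the actual interface):

from llmeter.callbacks.base import Callback  # assumed import path

class ResponseLogger(Callback):
    """Hypothetical callback that logs each model response."""

    # Hook name is an assumption; the real base class may differ.
    async def after_invoke(self, response):
        print(f"Response received: {response}")

# Assumed wiring: pass callback instances into a run, e.g.
# results = await runner.run(payload, callbacks=[ResponseLogger()])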
For more details, check out our selection of end-to-end code examples in the examples folder!
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.