
Standardize object (de)-serialization in LLMeter #7

Draft
wants to merge 1 commit into base: main
Conversation

athewsey
Contributor

Issue #, if available: N/A

While exploring possible cost model implementations, I was slowed and frustrated by the lack of standardisation in how LLMeter serializes objects to file and loads them back.

There are various objects we'd like to be able to save to file and reconstruct, including data like Run Results, RunConfigs, and individual InvocationResponses - but also more complex, implementation-dependent objects like an Endpoint, a Tokenizer, and perhaps one day a cost model.

Description of changes:

This change proposes to define a standard from_dict(), from_json(), from_file(), to_dict(), to_json(), to_file() interface (see serde.py), stick to it as closely as possible, and share implementations between serializable objects in LLMeter wherever practical. At a high level, it proposes:

  1. To annotate dicts created by this interface with a _type attribute, normally corresponding to the class's __class__.__name__.
  2. Class mappings as a method for users to inject their custom classes when loading something whose type is not known up-front. For example: Endpoint.from_file("endpoint.json", alt_classes={"MyCoolGeminiEndpoint": MyCoolGeminiEndpoint}), where endpoint.json could also refer to one of the LLMeter built-in types.
  3. To standardize Tokenizer on the same unknown-implementation-loading pattern as Endpoint, because they solve the same problem (defining a few built-in implementations, while wanting flexible support for custom ones too).
  4. To standardize on JSONable classes nesting their JSONable fields directly in their output JSON by default, and treat separating things out as an override behaviour. For example, a cost model would serialize to one JSON file even though it's composed of multiple (built-in or custom-defined) cost dimensions... but some cases (like RunConfig.responses) might continue to be extracted to separate files to avoid undue bulk.

This is an early draft: I haven't narrowed down the exact extent of breaking changes in file output formats yet, fixed the existing tests, or added tests for the new components... but I'm raising it for visibility so we can discuss 😄
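Point 4 above (nesting JSONable fields directly by default) could look something like the sketch below. The CostModel and CostDimension classes and their fields are hypothetical examples, not part of this PR:

```python
import json


class CostDimension:
    """Hypothetical per-dimension cost component (e.g. price per token)."""

    def __init__(self, name: str, rate: float):
        self.name = name
        self.rate = rate

    def to_dict(self) -> dict:
        return {"_type": type(self).__name__, "name": self.name, "rate": self.rate}


class CostModel:
    """Hypothetical composite: serializes to ONE JSON document by default."""

    def __init__(self, dimensions: list):
        self.dimensions = dimensions

    def to_dict(self) -> dict:
        # Nest each dimension's dict directly, rather than extracting each
        # component to its own file (extraction stays an override behaviour):
        return {
            "_type": type(self).__name__,
            "dimensions": [d.to_dict() for d in self.dimensions],
        }


model = CostModel(
    [CostDimension("input_tokens", 3e-6), CostDimension("output_tokens", 1.5e-5)]
)
print(json.dumps(model.to_dict(), indent=2))
```

Bulky fields like RunConfig.responses would override this default and write to separate files instead.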


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Initial draft to standardise how we de/serialise objects across
LLMeter: From endpoint configurations, to tokenizers, and test
results.

BREAKING CHANGES to various save & load methods to drive consistency.
Not yet updated tests or worked through full scope of breaking change
to communicate in release note.
@athewsey athewsey requested a review from acere November 22, 2024 10:29