Add support for non-AWS models - OpenAI + Gemini #206
base: main
Conversation
I think we also have to make a change in the metrics calculation notebook because it also does some pricing related calculations.
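To illustrate the kind of change the comment above is asking for, here is a hedged sketch of a per-token pricing lookup extended to the external models this PR adds. The table structure, function name, and the rates themselves are illustrative assumptions, not the notebook's actual code or authoritative prices.

```python
# Illustrative sketch only: how the metrics notebook's pricing calculation
# could be extended to external (OpenAI/Gemini) models. Rates are example
# values in USD per 1K tokens, NOT authoritative pricing.
EXTERNAL_PRICING = {
    "gpt-4o":           {"input": 0.0025,   "output": 0.01},
    "gpt-4o-mini":      {"input": 0.00015,  "output": 0.0006},
    "gemini-1.5-pro":   {"input": 0.00125,  "output": 0.005},
    "gemini-1.5-flash": {"input": 0.000075, "output": 0.0003},
}

def token_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of a single request for an external model."""
    rates = EXTERNAL_PRICING[model_id]
    return (input_tokens / 1000) * rates["input"] + \
           (output_tokens / 1000) * rates["output"]
```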
docs/benchmarking_non_aws_models.md (outdated)
@@ -0,0 +1,27 @@
# Benchmark non AWS models on FMBench
Since this is specific to OpenAI and Gemini, I would mention that directly instead of saying "non AWS", because any 3P or open-source model is non-AWS in that sense. Change to "Benchmark OpenAI and Gemini models".
docs/benchmarking_non_aws_models.md (outdated)
@@ -0,0 +1,27 @@
# Benchmark non AWS models on FMBench

This feature enables users to benchmark non AWS models on FMBench, such as OpenAI and Gemini models. Current models that are tested with this feature are: `gpt-4o`, `gpt-4o-mini`, `gemini-1.5-pro` and `gemini-1.5-flash`.
FMBench -> `FMBench`.
docs/benchmarking_non_aws_models.md (outdated)
@@ -0,0 +1,27 @@
# Benchmark non AWS models on FMBench

This feature enables users to benchmark non AWS models on FMBench, such as OpenAI and Gemini models. Current models that are tested with this feature are: `gpt-4o`, `gpt-4o-mini`, `gemini-1.5-pro` and `gemini-1.5-flash`.
"...users to benchmark non AWS..." -> "users to benchmark external models such as OpenAI and Gemini models on FMBench".
docs/benchmarking_non_aws_models.md (outdated)

### Prerequisites

To benchmark a non AWS model, the configuration file requires an **API Key**. Mention your custom API key within the `inference_spec` section in the `experiments` within the configuration file. View an example below:
API Key -> a model provider provided API Key (such as an OpenAI key or a Gemini key)
We should not configure the key directly, but rather the path to the API key. This should be handled in the same way as we handle `hf_token.txt` to read the HF token.
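As a minimal sketch of the suggestion above, a key file could be read and exported as an environment variable, mirroring how `hf_token.txt` is handled. The helper name, file names, and env var names here are assumptions for illustration, not FMBench's actual implementation.

```python
# Hedged sketch: read a provider API key from a file (the way hf_token.txt
# is read) instead of putting the key itself in the config file.
import os
from pathlib import Path

def load_api_key(key_file: str, env_var: str) -> bool:
    """If key_file exists, export its (stripped) contents via env_var."""
    path = Path(key_file)
    if path.is_file():
        os.environ[env_var] = path.read_text().strip()
        return True
    return False

# e.g. load_api_key("openai_key.txt", "OPENAI_API_KEY")
#      load_api_key("gemini_key.txt", "GEMINI_API_KEY")
```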
read_bucket: {read_bucket}
scripts_prefix: scripts ## add your own scripts in case you are using anything that is not on jumpstart
script_files:
- hf_token.txt ## add your scripts files you have in s3 (including inference files, serving stacks, if any)
I would add the path to `openai_key.txt` and `gemini_key.txt` in this list.
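Applying that suggestion, the `script_files` list might look like the following sketch. The `openai_key.txt` and `gemini_key.txt` entries are assumed names mirroring `hf_token.txt`, not settled configuration.

```yaml
# Illustrative sketch of the suggested change; key file names are assumptions.
read_bucket: {read_bucket}
scripts_prefix: scripts
script_files:
- hf_token.txt     ## existing HF token file
- openai_key.txt   ## OpenAI provider key, read the same way as the HF token
- gemini_key.txt   ## Gemini provider key
```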
max_length_in_tokens: 6000
payload_file: payload_en_5000-6000.jsonl
- language: en
min_length_in_tokens: 305
remove the 305 to 3997
metrics:
dataset_of_interest: en_500-1000 # en_5000-6000
change to 3000-4000
inference_script: external_predictor.py
inference_spec:
split_input_and_parameters: no
api_key: <your-api-key>
Remove this parameter. If an external predictor is being used, it should automatically check whether `openai_key.txt` or `gemini_key.txt` is present and, if so, set the key into env vars; we do not need to have this parameter here.
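A minimal sketch of that behavior: the predictor scans for known key files and exports the matching environment variables, so no `api_key` entry is needed in `inference_spec`. The file names, env var names, and function name are assumptions for illustration.

```python
# Hedged sketch of an external predictor auto-configuring provider keys
# from key files, instead of an api_key parameter in the config.
import os
from pathlib import Path

# Assumed mapping of key files to the env vars provider clients read.
KEY_FILES = {
    "openai_key.txt": "OPENAI_API_KEY",
    "gemini_key.txt": "GEMINI_API_KEY",
}

def configure_provider_keys(search_dir: str = ".") -> list:
    """Export an env var for every key file found; return the vars set."""
    configured = []
    for fname, env_var in KEY_FILES.items():
        path = Path(search_dir) / fname
        if path.is_file():
            os.environ[env_var] = path.read_text().strip()
            configured.append(env_var)
    return configured
```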
src/fmbench/globals.py (outdated)
@@ -213,7 +213,10 @@
# token counting logic on the client side (does not impact the tokenizer the model uses)
# NOTE: if tokenizer files are provided in the tokenizer directory then they take precedence
# if the files are not present then we load the tokenizer for this model id from Hugging Face
TOKENIZER_MODEL_ID = config['experiments'][0]['model_id']
if config['experiments'][0].get('model_id', None) is not None:
rebaseline from main
# The inference format for each option (OpenAI/Gemini) is the same using LiteLLM
# for streaming/non-streaming
# set the environment for the specific model
if 'gemini' in self.endpoint_name:
We should just do this based on the presence of a file and not rely on endpoint name.
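A sketch of what that file-based check could look like, replacing the substring match on the endpoint name. The key file names and the function are assumptions; the actual provider-specific call (e.g. via LiteLLM) would then use the detected provider.

```python
# Hedged sketch: pick the provider from which key file exists, rather than
# from 'gemini' appearing in the endpoint name. File names are assumptions.
from pathlib import Path

def detect_provider(key_dir: str = ".") -> str:
    """Return 'gemini' or 'openai' based on which key file is present."""
    if (Path(key_dir) / "gemini_key.txt").is_file():
        return "gemini"
    if (Path(key_dir) / "openai_key.txt").is_file():
        return "openai"
    raise FileNotFoundError("no provider key file found")
```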
Force-pushed 90950d4 to daf7a2b
Force-pushed 1a00eb6 to 542179b
Force-pushed aa98452 to eefa311
This PR contains the following: