RSpec::Llama is a versatile testing framework designed to integrate AI model testing seamlessly into the RSpec ecosystem. Whether you're working with OpenAI's GPT models, Llama, or other AI models, RSpec::Llama simplifies the process of configuring, running, and validating your models' outputs.
- Model Configurations: Easily set up and customize configurations for various AI models like OpenAI, Llama, and Ollama. Configure models with parameters such as temperature, token limits, and stop sequences.
- Model Runners: Seamlessly run AI models using predefined configurations, allowing you to execute prompts and capture their outputs in a simple and consistent way.
- Comprehensive Assertions: Validate model outputs against expected results using advanced matchers, such as `match`, `match_all`, `match_any`, and `match_none`, to ensure your models behave as expected in different scenarios.
- RSpec Integration: Fully integrated with RSpec, enabling AI model testing to fit naturally into your existing test suite.
Add this line to your application's Gemfile:

```ruby
gem 'rspec-llama'
```

And then execute:

```
$ bundle install
```

Or install it yourself as:

```
$ gem install rspec-llama
```
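If you only call AI models from your test suite, a common Gemfile pattern (a convention, not a requirement of this gem) is to scope it to the test group:

```ruby
# Gemfile
group :test do
  gem 'rspec-llama'
end
```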
RSpec::Llama provides a set of helpers that simplify configuring and interacting with AI models during testing. These helpers let you define model configurations, runners, and prompts, making it straightforward to integrate AI models like OpenAI, Llama, and Ollama into your RSpec tests.
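A minimal setup sketch, assuming (as the full example at the end of this README does) that requiring the gem is what makes the helpers and matchers available:

```ruby
# spec/spec_helper.rb
require 'rspec/llama'
```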
The `build_model_configuration` helper is used to create a model configuration object for various AI models, allowing you to specify and customize the parameters that will be used during the model's execution. This helper supports different types of model configurations, such as OpenAI's GPT models and Llama models.

Parameters:

- `configuration_type`: Symbol representing the type of model configuration. Possible values are `:openai`, `:llama_cpp`, and `:ollama`.
- `options`: Hash of configuration options specific to the selected model type.

Supported Options (OpenAI configuration):

- `model`: The model to use (e.g., `'gpt-3.5-turbo'`, `'gpt-4'`).
- `temperature`: Sampling temperature between 0 and 2. Higher values make output more random.
- `stop`: Up to 4 sequences where the API will stop generating further tokens.
- `seed`: A seed for controlling the randomness in generation. This ensures consistent outputs between runs when the same seed and model configuration are used. Useful for debugging and testing.
```ruby
config = build_model_configuration(:openai, model: 'gpt-4', temperature: 0.7)

# Example usage
runner = build_model_runner(:openai, access_token: ENV['OPENAI_ACCESS_TOKEN'])
prompt = 'What gems does the Rails gem depend on?'
result = runner.call(config, prompt)

expect(result).to match_all(
  'activesupport', 'activerecord', 'actionpack', 'actionview',
  'actionmailer', 'actioncable', 'railties'
)
```
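For reproducible assertions, the `stop` and `seed` options above can be combined. A minimal sketch, assuming `stop` accepts up to 4 sequences in the same shape as OpenAI's API:

```ruby
config = build_model_configuration(
  :openai,
  model: 'gpt-4',
  temperature: 0.0, # low temperature for more deterministic output
  stop: ['###'],    # assumed: up to 4 stop sequences, mirroring OpenAI's API
  seed: 1234        # same seed + same configuration should yield consistent outputs
)
```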
The LlamaCpp model configuration is used to set parameters for models running with the llama.cpp implementation.
```ruby
config = build_model_configuration(:llama_cpp, model: '/path/to/model', temperature: 0.5, predict: 500)

# Example usage
runner = build_model_runner(:llama_cpp, cli_path: '/path/to/llama-cli')
prompt = 'What are the most popular Ruby frameworks?'
result = runner.call(config, prompt)

expect(result).to match_all('Ruby on Rails', 'Sinatra', 'Hanami')
```
Supported Options:

- `model`: The path to the model file.
- `temperature`: Sampling temperature between 0 and 2.
- `predict`: The number of tokens to predict.
- `stop`: Regular expression to define where the model should stop generating text.
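Unlike the OpenAI configuration, `stop` is described here as a regular expression. A hedged sketch; whether the gem expects a `Regexp` object or a pattern string is an assumption:

```ruby
config = build_model_configuration(
  :llama_cpp,
  model: '/path/to/model',
  temperature: 0.2,
  predict: 128,
  stop: /\n\n/ # assumed: a Regexp marking where generation should stop
)
```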
The Ollama model configuration is similar to the OpenAI and LlamaCpp configurations but tailored to Ollama models.
```ruby
config = build_model_configuration(:ollama, model: 'llama3.1')

# Example usage
runner = build_model_runner(:ollama)
prompt = 'Who created the Ruby programming language?'
result = runner.call(config, prompt)

expect(result).to match_all('Yukihiro', 'Matz', 'Matsumoto')
```
Supported Options:

- `model`: The model to use (e.g., `'llama3.1'`).
The `build_model_runner` helper is used to create a model runner object that interacts with various AI models, allowing you to execute prompts and retrieve results. This helper supports different types of model runners, such as those for OpenAI's GPT models, Llama models, and others.

Parameters:

- `runner_type`: Symbol representing the type of model runner. Possible values are `:openai`, `:llama_cpp`, and `:ollama`.
- `options`: Hash of options specific to the selected runner type, such as API credentials or executable paths.
The OpenAI model runner interacts with OpenAI's API to execute prompts and retrieve responses from models like GPT-3.5 or GPT-4.
```ruby
runner = build_model_runner(:openai, access_token: ENV['OPENAI_ACCESS_TOKEN'], organization_id: ENV['OPENAI_ORGANIZATION_ID'])

# Example usage
config = build_model_configuration(:openai, model: 'gpt-4', temperature: 0.7)
prompt = 'What is the capital of France?'
result = runner.call(config, prompt)

puts result.to_s
```
Supported Options:

- `access_token`: The API key for accessing OpenAI's API.
- `organization_id`: (Optional) The ID of your OpenAI organization.
- `project_id`: (Optional) The ID of your OpenAI project.
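A sketch combining all three documented runner options; `ENV.fetch` is used for the access token so a missing credential fails fast:

```ruby
runner = build_model_runner(
  :openai,
  access_token: ENV.fetch('OPENAI_ACCESS_TOKEN'),
  organization_id: ENV['OPENAI_ORGANIZATION_ID'], # optional
  project_id: ENV['OPENAI_PROJECT_ID']            # optional
)
```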
LlamaCpp model runner: documentation coming soon.

Ollama model runner: documentation coming soon.
This gem provides RSpec matchers for comparing model outputs, focusing on language models like those from OpenAI and Llama. These matchers help assert the presence or absence of specific strings or patterns in the output, and they are designed to work seamlessly with RSpec's syntax.
The `match` matcher is used to check if a string or pattern is present in the actual output.

```ruby
expect(result).to match('Ruby on Rails')
expect(result).to match(/Rails/)
```

- Passes if: The output contains the exact string or matches the given regular expression.
- Fails if: The string or pattern is not present in the output.
The `match_all` matcher checks if all the provided strings or patterns are present in the actual output.

```ruby
expect(result).to match_all('Ruby on Rails', 'Sinatra', 'Hanami')
expect(result).to match_all(/Rails/, /Sinatra/)
```

- Passes if: All strings or patterns are found in the output.
- Fails if: Any one of the strings or patterns is missing from the output.
The `match_any` matcher checks if any of the provided strings or patterns are present in the actual output.

```ruby
expect(result).to match_any('RoR', 'Ruby on Rails')
expect(result).to match_any(/Rails/, /Sinatra/)
```

- Passes if: At least one string or pattern is found in the output.
- Fails if: None of the strings or patterns are found in the output.
The `match_none` matcher ensures that none of the provided strings or patterns are present in the actual output.

```ruby
expect(result).to match_none('Django', 'Flask', 'Symfony')
expect(result).to match_none(/Django/, /Flask/)
```

- Passes if: None of the strings or patterns are found in the output.
- Fails if: Any one of the strings or patterns is found in the output.
Here’s a full example that demonstrates how to use helpers and matchers to test various models with different configurations.
```ruby
require 'rspec/llama'

RSpec.shared_examples 'application frameworks' do
  describe 'popular Ruby frameworks' do
    let(:prompt) { 'What are the most popular Ruby frameworks?' }

    it 'matches Ruby frameworks', :aggregate_failures do
      result = run_model!

      expect(result).to match_all('Ruby on Rails', 'Sinatra', 'Hanami')
      expect(result).to match_none('Django', 'Flask', 'Symfony', 'Laravel', 'Yii')
    end
  end

  describe 'dependencies for the Rails gem' do
    let(:prompt) { 'What gems does the Rails gem depend on?' }

    it 'matches Rails dependencies' do
      result = run_model!

      expect(result).to match_all(
        'activesupport', 'activerecord', 'actionpack', 'actionview',
        'actionmailer', 'actioncable', 'railties'
      )
    end
  end

  describe 'popular Python frameworks' do
    let(:prompt) { 'What are the most popular Python frameworks?' }

    it 'matches Python frameworks', :aggregate_failures do
      result = run_model!

      expect(result).to match_all('Django', 'Flask')
      expect(result).to match_none('Ruby on Rails', 'Sinatra', 'Hanami', 'Symfony', 'Laravel', 'Yii')
    end
  end
end

RSpec.describe 'Popular Application Frameworks' do
  subject(:run_model!) { runner.call(config, prompt) }

  context 'with OpenAI model runner' do
    let(:runner) { build_model_runner(:openai, access_token: ENV.fetch('OPENAI_ACCESS_TOKEN')) }
    let(:config) { build_model_configuration(:openai, model:, temperature:, seed: RSpec.configuration.seed) }
    let(:temperature) { 0.5 }

    context 'with gpt-4o-mini model' do
      let(:model) { 'gpt-4o-mini' }

      include_examples 'application frameworks'
    end

    context 'with gpt-4o model' do
      let(:model) { 'gpt-4o' }

      include_examples 'application frameworks'
    end

    context 'with gpt-4-turbo model' do
      let(:model) { 'gpt-4-turbo' }

      include_examples 'application frameworks'

      context 'with different temperature' do
        let(:temperature) { 0.1 }

        include_examples 'application frameworks'
      end
    end
  end
end
```
To contribute to the development of this gem, follow the steps below:

- Clone the repository:

  ```
  git clone https://github.com/aifoundry-org/rspec-llama.git
  cd rspec-llama
  ```

- Install dependencies. Make sure you have Bundler installed, then run:

  ```
  bundle install
  ```

- Run the tests. This gem uses RSpec for testing; to run the suite:

  ```
  bundle exec rspec
  ```

- Run RuboCop for code linting. Ensure your code follows the community Ruby style guide by running:

  ```
  bundle exec rubocop
  ```
Bug reports and pull requests are welcome on GitHub at https://github.com/aifoundry-org/rspec-llama. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Everyone interacting in the RSpec::Llama project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the code of conduct.