Releases: Azure-Samples/ai-rag-chat-evaluator

2024-06-05: Update to new AI Chat Protocol, increase flexibility

05 Jun 18:28
3aa7f95

This release updates the evaluator tool to assume that the chat backend conforms to the new Microsoft AI Chat Protocol, and it also adds two new properties to the config JSON so the tool can still be used with backends that follow the older protocol or other protocols.

Just add these fields and customize the JMESPath expressions as needed:

    "target_response_answer_jmespath": "choices[0].message.content",
    "target_response_context_jmespath": "choices[0].context.data_points.text"

What's Changed

  • Bump promptflow-evals from 0.2.0.dev0 to 0.3.0 by @dependabot in #88
  • Avoid rate-limiting, improve --changed by @pamelafox in #91
  • Update to new Protocol response format, use JMESPath expressions by @pamelafox in #92

Full Changelog: 2024-05-13...2024-06-05

2024-05-13: Updated underlying SDK, new metrics

13 May 20:16
1bbec13

This release ports the tool to the promptflow-evals SDK for the evaluation functionality, since the evaluate functionality in azure-ai-generative is being deprecated. The Q&A generation still uses azure-ai-generative for now.

Some user-facing changes:

  • I renamed the custom metrics to "mygroundedness", "myrelevance", "mycoherence" to make it clear that they're not the built-in metrics. If you previously generated custom metrics, rename the keys in evalresults.json to the keys above and rename the requested metrics in your config.json file (see the migration sketch after this list).
  • I added more built-in metrics from the promptflow-evals SDK: fluency, similarity, f1score.
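If you have existing results to migrate, a rough sketch like the following could do the renaming. The old key names ("groundedness", "relevance", "coherence"), the file path, and the file layout (a list of per-question result dicts) are all assumptions here, so adjust them to match your actual evalresults.json:

    # Rough migration sketch; key names, path, and file layout are assumptions.
    import json
    from pathlib import Path

    RENAMES = {"groundedness": "mygroundedness",
               "relevance": "myrelevance",
               "coherence": "mycoherence"}

    path = Path("example_results/baseline/evalresults.json")  # hypothetical path
    results = json.loads(path.read_text())
    for row in results:
        for old, new in RENAMES.items():
            if old in row:
                row[new] = row.pop(old)
    path.write_text(json.dumps(results, indent=4))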

What's Changed

New Contributors

Full Changelog: 2024-03-05...2024-05-13

2024-03-05: Diff tool for single run

05 Mar 19:51
e94223b

This release makes the diff tool more flexible: you can now specify a single directory, and it will diff that run against the ground truth answers.

For example:

python -m review_tools diff example_results/baseline/

What's Changed

Full Changelog: 2024-03-04...2024-03-05

2024-03-04: Evaluate "I don't know" situations

04 Mar 23:41
25b06de

The tools now support evaluating your app's ability to say "I don't know". See README:
https://github.com/Azure-Samples/ai-rag-chat-evaluator?tab=readme-ov-file#measuring-apps-ability-to-say-i-dont-know

There's also a new metric, citationmatch, which checks whether an answer's citations contain the original citation from the ground truth answer.
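As a rough illustration of the idea (not necessarily the tool's implementation), a citation-match check can pull the bracketed citations out of both answers and test whether the ground truth's citation appears among the app's; the [bracketed] citation format here is an assumption:

    # Rough illustration only; assumes citations appear as [bracketed] source names.
    import re

    def citation_match(ground_truth: str, answer: str) -> bool:
        truth_citations = set(re.findall(r"\[([^\]]+)\]", ground_truth))
        answer_citations = set(re.findall(r"\[([^\]]+)\]", answer))
        return bool(truth_citations) and truth_citations <= answer_citations

    print(citation_match(
        "Employees get 10 days off [benefits.pdf#page=3].",
        "You get 10 days of PTO [benefits.pdf#page=3][handbook.pdf].",
    ))  # True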

What's Changed

  • Handle failures with numeric ratings by @pamelafox in #53
  • Added citationmatch metric and tools for evaluating "I don't know" answers by @pamelafox in #54

Full Changelog: 2024-02-15...2024-03-04

2024-02-15: Upgrade azure-ai-generative SDK, custom prompt metrics

16 Feb 06:32
a7e5717

This week's release upgraded the azure-ai-generative SDK version; that upgrade introduced a regression, which has now been fixed.

The evaluator tool also now has the ability to run custom prompt metrics, which is particularly helpful if you need to localize the built-in prompts. See documentation here:
https://github.com/Azure-Samples/ai-rag-chat-evaluator/tree/main?tab=readme-ov-file#custom-metrics
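As a generic illustration of the idea (this is not the repo's actual prompt-metric format, which the documentation above describes), a localized prompt metric amounts to supplying your own grading prompt in the target language and asking the GPT model for a numeric score:

    # Generic illustration only; see the documentation above for the real format.
    # The Spanish relevance prompt below is a made-up example.
    RELEVANCE_PROMPT_ES = (
        "Eres un evaluador. Dada la pregunta y la respuesta, califica del 1 al 5 "
        "la relevancia de la respuesta para la pregunta. Responde solo con el numero.\n\n"
        "Pregunta: {question}\nRespuesta: {answer}\nCalificacion:"
    )

    def build_metric_messages(question: str, answer: str) -> list[dict]:
        # Messages you could send to a chat completions model to obtain the score.
        return [{"role": "user",
                 "content": RELEVANCE_PROMPT_ES.format(question=question, answer=answer)}]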

What's Changed

New Contributors

Full Changelog: https://github.com/Azure-Samples/ai-rag-chat-evaluator/commits/2024-02-15