GITBOOK-171: bulk runner revisions
Ambika Joshi authored and gitbook-bot committed Sep 11, 2024
1 parent afb7777 commit c01efa1
Showing 3 changed files with 7 additions and 3 deletions.
6 changes: 3 additions & 3 deletions guides/understanding-bulk-runner-and-evaluation/README.md
@@ -91,9 +91,9 @@ The iterative bulk runs and systematic comparison provide a framework for improv

## When do you use Bulk vs Bulk+Eval vs Eval? <a href="#z8de5cdl1xxq" id="z8de5cdl1xxq"></a>

-* **Bulk workflow only** - If you want to test your Copilot’s functionality for regression tests, monitoring and observability, and bugs.
-* **Bulk and Eval** - If you are testing improvements on your prompts, or updating your documents, want to consider A/B testing.
-* **Eval** **workflow only**- If you already have test data and want to use “LLM as Judge” to evaluate it
+* [**Bulk workflow only**](https://gooey.ai/bulk/farmerchat-bulk-evaluator-regression-only-ggzy9gld1eae/) - If you want to test your Copilot’s functionality for regression tests, monitoring and observability, and bugs.
+* [**Bulk and Eval** ](https://gooey.ai/bulk/farmerchat-bulk-evaluator-gpt-4o-mixtral-claude-vs-gemini-pro-15-b0o8aos3rj8y/)- If you are testing improvements on your prompts, or updating your documents, want to consider A/B testing.
+* [**Eval** **workflow only**](https://gooey.ai/eval/copilot-evaluator-artpuhzwvily/)- If you already have test data and want to use “LLM as Judge” to evaluate it

### Common terms <a href="#id-3yvzoyislzdo" id="id-3yvzoyislzdo"></a>

@@ -2,6 +2,8 @@

In this example scenario, we are setting up a simple bulk run to check regression for an AI Copilot in production.

<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>Check out the example run here: BULK RUNNER (Regression Only)</strong></td><td><a href="https://gooey.ai/bulk/farmerchat-bulk-evaluator-regression-only-ggzy9gld1eae/">https://gooey.ai/bulk/farmerchat-bulk-evaluator-regression-only-ggzy9gld1eae/</a></td></tr><tr><td><strong>Check out the example run here: BULK RUNNER (Bulk and Evaluation)</strong></td><td><a href="https://gooey.ai/bulk/farmerchat-bulk-evaluator-gpt-4o-mixtral-claude-vs-gemini-pro-15-b0o8aos3rj8y/">https://gooey.ai/bulk/farmerchat-bulk-evaluator-gpt-4o-mixtral-claude-vs-gemini-pro-15-b0o8aos3rj8y/</a></td></tr></tbody></table>

### Step 1: Select Gooey Workflows <a href="#jmvc9vjmbif9" id="jmvc9vjmbif9"></a>

Choose the “SAVED” run from Gooey.AI Workflows that you would like to use.
@@ -2,6 +2,8 @@

In this example scenario, we are comparing and evaluating the quality of the answers of various AI Copilots that have all the same settings and functionalities except for different LLMs.&#x20;

<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>Check out the example run here: Evaluation only</strong> </td><td><a href="https://gooey.ai/eval/copilot-evaluator-artpuhzwvily/">https://gooey.ai/eval/copilot-evaluator-artpuhzwvily/</a></td></tr></tbody></table>

### Step 1: Select Gooey Workflows to evaluate <a href="#mj1hmvoaayxg" id="mj1hmvoaayxg"></a>

Choose the “SAVED” run from Gooey.AI Workflows that you would like to use.