GITBOOK-154: Ambika's Aug 22 changes

GooeyAI · Aug 27, 2024 · 99cc9f5
1 parent c0bc0f6
commit 99cc9f5
Showing 61 changed files with 246 additions and 62 deletions.
diff --git a/.gitbook/assets/0 (1) (1) (1).png b/.gitbook/assets/0 (1) (1) (1).png
diff --git a/.gitbook/assets/0 (1) (1).png b/.gitbook/assets/0 (1) (1).png
diff --git a/.gitbook/assets/0 (1).png b/.gitbook/assets/0 (1).png
diff --git a/.gitbook/assets/0.png b/.gitbook/assets/0.png
diff --git a/.gitbook/assets/1 (1) (1).png b/.gitbook/assets/1 (1) (1).png
diff --git a/.gitbook/assets/1 (1).png b/.gitbook/assets/1 (1).png
diff --git a/.gitbook/assets/1.png b/.gitbook/assets/1.png
diff --git a/.gitbook/assets/10 (1).png b/.gitbook/assets/10 (1).png
diff --git a/.gitbook/assets/10.png b/.gitbook/assets/10.png
diff --git a/.gitbook/assets/11 (1).png b/.gitbook/assets/11 (1).png
diff --git a/.gitbook/assets/11.png b/.gitbook/assets/11.png
diff --git a/.gitbook/assets/12 (1).png b/.gitbook/assets/12 (1).png
diff --git a/.gitbook/assets/12.png b/.gitbook/assets/12.png
diff --git a/.gitbook/assets/13 (1).png b/.gitbook/assets/13 (1).png
diff --git a/.gitbook/assets/13.png b/.gitbook/assets/13.png
diff --git a/.gitbook/assets/14 (1).png b/.gitbook/assets/14 (1).png
diff --git a/.gitbook/assets/14.png b/.gitbook/assets/14.png
diff --git a/.gitbook/assets/15 (1).png b/.gitbook/assets/15 (1).png
diff --git a/.gitbook/assets/15.png b/.gitbook/assets/15.png
diff --git a/.gitbook/assets/16 (1).png b/.gitbook/assets/16 (1).png
diff --git a/.gitbook/assets/16.png b/.gitbook/assets/16.png
diff --git a/.gitbook/assets/2 (1) (1).png b/.gitbook/assets/2 (1) (1).png
diff --git a/.gitbook/assets/2 (1).png b/.gitbook/assets/2 (1).png
diff --git a/.gitbook/assets/2.png b/.gitbook/assets/2.png
diff --git a/.gitbook/assets/3 (1) (1).png b/.gitbook/assets/3 (1) (1).png
diff --git a/.gitbook/assets/3 (1).png b/.gitbook/assets/3 (1).png
diff --git a/.gitbook/assets/3.png b/.gitbook/assets/3.png
diff --git a/.gitbook/assets/4 (1) (1).png b/.gitbook/assets/4 (1) (1).png
diff --git a/.gitbook/assets/4 (1).png b/.gitbook/assets/4 (1).png
diff --git a/.gitbook/assets/4.png b/.gitbook/assets/4.png
diff --git a/.gitbook/assets/5 (1) (1).png b/.gitbook/assets/5 (1) (1).png
diff --git a/.gitbook/assets/5 (1).png b/.gitbook/assets/5 (1).png
diff --git a/.gitbook/assets/5.png b/.gitbook/assets/5.png
diff --git a/.gitbook/assets/6 (1) (1).png b/.gitbook/assets/6 (1) (1).png
diff --git a/.gitbook/assets/6 (1).png b/.gitbook/assets/6 (1).png
diff --git a/.gitbook/assets/6.png b/.gitbook/assets/6.png
diff --git a/.gitbook/assets/7 (1) (1).png b/.gitbook/assets/7 (1) (1).png
diff --git a/.gitbook/assets/7 (1).png b/.gitbook/assets/7 (1).png
diff --git a/.gitbook/assets/7.png b/.gitbook/assets/7.png
diff --git a/.gitbook/assets/8 (1).png b/.gitbook/assets/8 (1).png
diff --git a/.gitbook/assets/8.png b/.gitbook/assets/8.png
diff --git a/.gitbook/assets/9 (1).png b/.gitbook/assets/9 (1).png
diff --git a/.gitbook/assets/9.png b/.gitbook/assets/9.png
diff --git a/.gitbook/assets/Bulk Runner.png_400x400.png b/.gitbook/assets/Bulk Runner.png_400x400.png
diff --git a/.gitbook/assets/Screenshot 2024-08-19 at 11.30.43 PM (1).png b/.gitbook/assets/Screenshot 2024-08-19 at 11.30.43 PM (1).png
diff --git a/.gitbook/assets/Screenshot 2024-08-19 at 11.30.43 PM.png b/.gitbook/assets/Screenshot 2024-08-19 at 11.30.43 PM.png
diff --git a/.gitbook/assets/Screenshot 2024-08-19 at 4.53.02 PM.png b/.gitbook/assets/Screenshot 2024-08-19 at 4.53.02 PM.png
diff --git a/.gitbook/assets/Understanding Bulk run and Evaluations (1).jpg b/.gitbook/assets/Understanding Bulk run and Evaluations (1).jpg
diff --git a/.gitbook/assets/Understanding Bulk run and Evaluations.jpg b/.gitbook/assets/Understanding Bulk run and Evaluations.jpg
diff --git a/.gitbook/assets/W.I.9.png_400x400.png b/.gitbook/assets/W.I.9.png_400x400.png
diff --git a/SUMMARY.md b/SUMMARY.md
@@ -11,7 +11,6 @@
   * [Prepare Synthetic Data](guides/copilot/prepare-synthetic-data.md)
   * [Craft your AI Copilot's personality](guides/copilot/craft-your-ai-copilots-personality.md)
   * [Advanced Settings](guides/copilot/advanced-settings.md)
-  * [Bulk Evaluation](guides/copilot/bulk-evaluation.md)
   * [Conversation Analysis](guides/copilot/conversation-analysis.md)
   * [Deploy to Slack](guides/copilot/deploy-to-slack.md)
   * [Deploy to Facebook](guides/copilot/deploy-to-facebook.md)
@@ -21,7 +20,7 @@
   * [📢 Broadcast Messages (via web or API)](guides/copilot/broadcast-messages-via-web-or-api.md)
 * [🎞️ How to create AI Animations?](guides/how-to-create-ai-animations.md)
 * [🖼️ Create an AI Image with text](guides/create-an-ai-image-with-text/README.md)
-  * [AI Image Prompting ](https://docs.google.com/presentation/d/1RaoMP0l7FnBZovDAR42zVmrUND9W5DW6eWet-pi6kiE/edit#slide=id.p)
+  * [AI Image Prompting](https://docs.google.com/presentation/d/1RaoMP0l7FnBZovDAR42zVmrUND9W5DW6eWet-pi6kiE/edit#slide=id.p)
   * [API Tips for AI Image Generator](guides/create-an-ai-image-with-text/api-tips-for-ai-image-generator.md)
 * [👄 How to use AI Lip Sync Generator?](guides/how-to-use-ai-lip-sync-generator/README.md)
   * [Lip Sync Animation Generator (WITH AUDIO FILES)](guides/how-to-use-ai-lip-sync-generator/lip-sync-animation-generator-with-audio-files.md)
@@ -39,6 +38,9 @@
 * [🧩 How to use Gooey Functions?](guides/how-to-use-gooey-functions.md)
 * [How to use Compare AI Translations?](guides/how-to-use-compare-ai-translations/README.md)
   * [Google Translate Glossary](guides/how-to-use-compare-ai-translations/google-translate-glossary.md)
+* [⚖️ Understanding Bulk Runner and Evaluation](guides/understanding-bulk-runner-and-evaluation/README.md)
+  * [💪 How to set up Bulk Runner?](guides/understanding-bulk-runner-and-evaluation/how-to-set-up-bulk-runner.md)
+  * [🕵️‍♀️ How to set up Evaluations?](guides/understanding-bulk-runner-and-evaluation/how-to-set-up-evaluations.md)
 
 ## 😇 CONTRIBUTING
 

diff --git a/guides/copilot/bulk-evaluation.md b/guides/copilot/bulk-evaluation.md
diff --git a/guides/copilot/deploy-on-whatsapp.md b/guides/copilot/deploy-on-whatsapp.md
@@ -15,15 +15,15 @@ description: One-click integration for your AI Copilot
 
 Click on the [Integrations tab](https://gooey.ai/copilot/integrations/) in the copilot workflow
 
-![](<../../.gitbook/assets/0 (1).png>)
+![](<../../.gitbook/assets/0 (1) (1).png>)
 
 * Use the “WhatsApp” button
 * You’ll be redirected to Facebook Login Page
 * Follow the instructions on the Facebook page
 
 ### Step 1 - Fill Business Information
 
-<figure><img src="../../.gitbook/assets/1.png" alt=""><figcaption></figcaption></figure>
+<figure><img src="../../.gitbook/assets/1 (1).png" alt=""><figcaption></figcaption></figure>
 
 ### Step 2 - Choose your business account (or create a new one)
 

diff --git a/guides/generate-people-also-ask-seo-content.md b/guides/generate-people-also-ask-seo-content.md
@@ -16,7 +16,7 @@ With this approach, you can generate well-cited, authoritative content that boos
 
 Link: [https://gooey.ai/related-qna-maker/](https://gooey.ai/related-qna-maker/)
 
-![](<../.gitbook/assets/1 (1).png>)
+![](<../.gitbook/assets/1 (1) (1).png>)
 
 1. Add your Google search query. In this example, we have searched for “jon snow”, the character from “Game of Thrones”. You can optionally search specific sites; here we have used “[https://fandom.com](https://fandom.com/)”.
 2. Hit submit!

diff --git a/guides/how-to-create-seo-optimized-content-with-ai.md b/guides/how-to-create-seo-optimized-content-with-ai.md
@@ -28,7 +28,7 @@ Gooey.AI provides pre-filled instructions, but you can tailor them to your speci
 
 You can use the “web search tools” based on your target demographic’s geography.
 
-![](../.gitbook/assets/2.png)
+![](<../.gitbook/assets/2 (1).png>)
 
 ### 3. Review and Adjust SEO Content
 

diff --git a/guides/how-to-use-ai-lip-sync-generator/README.md b/guides/how-to-use-ai-lip-sync-generator/README.md
@@ -88,7 +88,7 @@ Try it here:
 
 You can use the “Face Padding” settings to improve the accuracy of the detected face in the image/video. This ensures that the Lip Sync video looks more realistic.
 
-![](<../../.gitbook/assets/2 (1).png>)
+![](<../../.gitbook/assets/2 (1) (1).png>)
 
 #### &#x20;<a href="#id-5272lwq3flrn" id="id-5272lwq3flrn"></a>
 
@@ -108,7 +108,7 @@ How to use the settings:
 
 Note: If you are looking for consistent, long-form speech across many languages, then Google is an excellent choice. However, the voice will sound a little robotic and may not work for uses that require expressive and emotional speech synthesis.
 
-![](../../.gitbook/assets/3.png)
+![](<../../.gitbook/assets/3 (1).png>)
 
 **ElevenLabs**
 
@@ -128,21 +128,21 @@ You can learn more about custom voice settings here
 
 {% embed url="https://gooey.ai/docs/guides/lipsync-videos-with-custom-voices" %}
 
-![](../../.gitbook/assets/4.png)
+![](<../../.gitbook/assets/4 (1).png>)
 
-![](../../.gitbook/assets/5.png)
+![](<../../.gitbook/assets/5 (1).png>)
 
 **UberDuck.AI**
 
 UberDuck offers low-latency text-to-speech generation.&#x20;
 
-![](../../.gitbook/assets/6.png)
+![](<../../.gitbook/assets/6 (1).png>)
 
 **Bark (Suno.AI)**
 
 Bark is also a great service with several voice options. You can find all the various voice samples [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c).
 
-![](../../.gitbook/assets/7.png)
+![](<../../.gitbook/assets/7 (1).png>)
 
 #### Speech Provider Samples <a href="#nvegkpa38hjm" id="nvegkpa38hjm"></a>
 

diff --git a/...w-to-use-ai-lip-sync-generator/lip-sync-animation-generator-with-audio-files.md b/...w-to-use-ai-lip-sync-generator/lip-sync-animation-generator-with-audio-files.md
@@ -78,4 +78,4 @@ Try it here:
 
 You can use the “Face Padding” settings to improve the accuracy of the detected face in the image/video. This ensures that the Lip Sync video looks more realistic.
 
-![](<../../.gitbook/assets/2 (1).png>)
+![](<../../.gitbook/assets/2 (1) (1).png>)
diff --git a/guides/how-to-use-compare-ai-translations/google-translate-glossary.md b/guides/how-to-use-compare-ai-translations/google-translate-glossary.md
@@ -12,7 +12,7 @@ These sheets contain the glossary information for Google Translate translations
 
 This sheet contains the glossary terms. The first row should be [ISO-639](https://wikipedia.org/wiki/ISO\_639) or [BCP-47](https://tools.ietf.org/html/bcp47) language codes. Two extra columns are allowed: “pos” to specify part of speech and “description” (these columns are currently ignored by the Google Translate API but may be used in the future):
 
-<figure><img src="../../.gitbook/assets/0.png" alt=""><figcaption></figcaption></figure>
+<figure><img src="../../.gitbook/assets/0 (1).png" alt=""><figcaption></figcaption></figure>
 
 Each subsequent row is then one glossary term in multiple languages. Read more [here](https://cloud.google.com/translate/docs/advanced/glossary#translate\_v3\_translate\_text\_with\_glossary-drest). Changes made to this file are automatically uploaded to a Google Cloud Bucket to be used as a translation Glossary in gooey-server Google Translate requests.
 

diff --git a/guides/understanding-bulk-runner-and-evaluation/README.md b/guides/understanding-bulk-runner-and-evaluation/README.md
@@ -0,0 +1,104 @@
+# ⚖️ Understanding Bulk Runner and Evaluation
+
+## Why do you need bulk runner and evaluations? <a href="#id-4zynvpxsa8kj" id="id-4zynvpxsa8kj"></a>
+
+When building your Gooey.AI workflows, you will have to tweak the settings often to ensure the responses show parity and are grounded and verifiable.
+
+**There are several components to test:**
+
+* testing prompts
+* ensuring the synthetic data retrieval works
+* checking the suitability of the language model and its advanced settings
+* checking the latency of generated answers
+* evaluating whether the final AI Copilot produces the Golden Answers
+* evaluating the price per run
+* running regression tests
+
+How can you do this at scale?
+
+**This is where Gooey.AI’s Bulk Runner and Evaluation features shine!**
+
+## Features of Bulk Runner and Evaluation <a href="#eheq9i411cm3" id="eheq9i411cm3"></a>
+
+* Run several models in one click
+* Run several iterations of your workflows at scale
+* Choose any of the API Response Outputs to populate your test
+* Get output in CSV for further data analysis
+* Built-in evaluation tool for quick analysis
+* Use CSV or Google Sheets as input
+
+## Quickstart
+
+Here are the **quickstart guides** for Bulk Runner and Evaluation:
+
+<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td>How to set up Bulk Runner?</td><td><a href="how-to-set-up-bulk-runner.md">how-to-set-up-bulk-runner.md</a></td><td><a href="../../.gitbook/assets/Bulk Runner.png_400x400.png">Bulk Runner.png_400x400.png</a></td></tr><tr><td>How to set up Evaluation?</td><td><a href="how-to-set-up-evaluations.md">how-to-set-up-evaluations.md</a></td><td><a href="../../.gitbook/assets/W.I.9.png_400x400.png">W.I.9.png_400x400.png</a></td></tr><tr><td></td><td></td><td></td></tr></tbody></table>
+
+## How Does Bulk Runner Work? <a href="#r7so22ymyyn2" id="r7so22ymyyn2"></a>
+
+### Bulk Run Overview <a href="#id-2v61ngoeupi4" id="id-2v61ngoeupi4"></a>
+
+This diagram details the process of generating AI-driven answers to a set of test questions using a Language Model (LLM) with Retrieval-Augmented Generation (RAG) capabilities.&#x20;
+
+![](<../../.gitbook/assets/Understanding Bulk run and Evaluations.jpg>)
+
+**1. Test Question Set**
+
+The process begins with a curated set of questions. Examples of such questions include:
+
+\- "What is the lipsync tool's API?"
+
+\- "What is the step-by-step method to make a good animation?"
+
+**2. Bulk Run**
+
+Your “Saved” AI Copilot run processes the entire question set. Each question is individually processed to generate the corresponding answers.
+
+**3. Generated Output Texts**
+
+The generated answers are compiled into an output table. Each question is paired with its respective AI-generated response. For example, the answer to "What is the lipsync tool's API?" provides detailed information regarding the API's functionality and integration methods.
+
+## How Does Evaluation Work? <a href="#id-5anc46np4cur" id="id-5anc46np4cur"></a>
+
+### Comparison and Evaluation Overview <a href="#id-6yuy9sd29g76" id="id-6yuy9sd29g76"></a>
+
+This section details the process for comparing and evaluating generated answers against a set of golden answers to assess their semantic and technical accuracy.
+
+![](<../../.gitbook/assets/Understanding Bulk run and Evaluations (1).jpg>)
+
+1. **Input: Questions and Golden Answers**
+   1.  **Test Question Set**: A curated set of questions to be answered, such as:
+
+       \- "What is the lipsync tool's API?"\
+       \- "What is the step-by-step method to make a good animation?"
+   2. **Golden Answers Set**: Expert-generated answers serve as the benchmark for evaluation.
+2. **Bulk Runs**\
+   The test question set undergoes multiple bulk runs with differently configured Copilot Runs. For example, you can set up separate runs with tweaked prompts, or test which LLM answers your questions best.\
+   \
+   The test questions are processed to produce a corresponding set of Generated Answers.
+3.  **Compare and Evaluate**\
+    The generated answer sets from each bulk run are compared against the golden answers to evaluate their accuracy.
+
+    \
+    **Scoring**: Each generated answer set is scored based on its semantic and technical accuracy relative to the golden answers.
+
+In this example:
+
+* **Generated Answer Set 1**: Scores 0.8, indicating 80% closeness to the golden answers.
+* **Generated Answer Set 2**: Scores 1.0, indicating perfect alignment with the golden answers.
+* **Generated Answer Set 3**: Scores 0.6, indicating 60% accuracy.
+
+The iterative bulk runs and systematic comparison provide a framework for improving AI-driven answer generation.
+
+## When do you use Bulk vs Bulk+Eval vs Eval? <a href="#z8de5cdl1xxq" id="z8de5cdl1xxq"></a>
+
+* **Bulk workflow only** - If you want to test your Copilot’s functionality for regression tests, monitoring and observability, and bugs.
+* **Bulk and Eval** - If you are testing improvements to your prompts, updating your documents, or considering A/B testing.
+* **Eval workflow only** - If you already have test data and want to use “LLM as Judge” to evaluate it.
+
+### Common terms <a href="#id-3yvzoyislzdo" id="id-3yvzoyislzdo"></a>
+
+* **Golden Answer**: Most suitable and accurate answers provided by humans with expertise on the subject
+* **Semantic Closeness**: Since an LLM will not output the same answer every time, the evaluation checks how semantically close the LLM's output is to your “Golden Answer”.
+* **Score and Rank**: For each generated answer, the Evaluation workflow gives a “score” between 0 and 1 and ranks the best answer.
+* **Reasoning**: The Evaluation LLM shares a short "reasoning" explaining how the score was given.
+* **Chart**: Based on the aggregate score, the Evaluation workflow will create a compare chart that
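The "Score and Rank" idea in the README above can be illustrated with a minimal Python sketch. It assumes that the aggregate score of an answer set is the plain mean of its per-answer scores; the source does not state how Gooey.AI aggregates scores, so `rank_answer_sets` and the per-answer values below are hypothetical and only reuse the 0.8 / 1.0 / 0.6 figures from the example.

```python
from statistics import mean


def rank_answer_sets(scores_by_set):
    """Average per-answer scores (each in [0, 1]) and rank sets best-first.

    Assumption: the aggregate is a simple mean. This is an illustration,
    not the Evaluation workflow's actual aggregation method.
    """
    for name, scores in scores_by_set.items():
        if not scores:
            raise ValueError(f"{name} has no scores")
        if any(not 0.0 <= s <= 1.0 for s in scores):
            raise ValueError(f"{name} has a score outside [0, 1]")
    aggregates = {name: mean(scores) for name, scores in scores_by_set.items()}
    # Highest aggregate score first.
    return sorted(aggregates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-answer scores chosen to match the aggregates in the example.
example = {
    "Generated Answer Set 1": [0.8, 0.8],
    "Generated Answer Set 2": [1.0, 1.0],
    "Generated Answer Set 3": [0.6, 0.6],
}

ranking = rank_answer_sets(example)
for position, (name, score) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {score:.2f}")
```

With these inputs, Generated Answer Set 2 ranks first, which matches the README's description of it as perfectly aligned with the golden answers.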
diff --git a/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-bulk-runner.md b/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-bulk-runner.md
@@ -0,0 +1,56 @@
+# 💪 How to set up Bulk Runner?
+
+In this example scenario, we are setting up a simple bulk run to check regression for an AI Copilot in production.
+
+### Step 1: Select Gooey Workflows <a href="#jmvc9vjmbif9" id="jmvc9vjmbif9"></a>
+
+Choose the “SAVED” run from Gooey.AI Workflows that you would like to use.
+
+![](../../.gitbook/assets/2.png)
+
+### Step 2: Input Data Spreadsheet <a href="#s6plmddmwaiq" id="s6plmddmwaiq"></a>
+
+Prepare your test question set:
+
+1. Create a list of the most frequently asked questions for your AI Copilot (we recommend about 25 for optimum observability and regression testing; you can do more if you prefer)
+2. Make sure the Excel sheet/Google Sheets table has a “header” section
+3. Add all your questions in the column below it
+
+<figure><img src="../../.gitbook/assets/3.png" alt="" width="369"><figcaption></figcaption></figure>
+
+4. Paste the link to your Google Sheet or upload your data
+
+&#x20;
+
+<figure><img src="../../.gitbook/assets/4.png" alt="" width="563"><figcaption></figcaption></figure>
+
+### Step 3: Select your input columns <a href="#yayrw51txj9z" id="yayrw51txj9z"></a>
+
+In the current scenario, we want to use the Gooey Copilot to answer all the questions in the Google Sheet, so the questions are the “input” for the Bulk Workflow.
+
+Select the “questions” column in the “Input Prompt” section.
+
+![](../../.gitbook/assets/5.png)
+
+### Step 4: Hit Submit <a href="#pqej8inj371s" id="pqej8inj371s"></a>
+
+As this is a “Bulk only” scenario, you can “delete” the Copilot Evaluator option in the section. After that, hit the “Submit” button.
+
+&#x20;
+
+<figure><img src="../../.gitbook/assets/6.png" alt="" width="563"><figcaption></figcaption></figure>
+
+### Output <a href="#id-6n9vkbjh3n11" id="id-6n9vkbjh3n11"></a>
+
+The workflow will create a new CSV with a few added columns based on the run, including “Output Text”, “Run URL”, and “Run Time”.
+
+**Your output will be on the right side of the page.**
+
+<figure><img src="../../.gitbook/assets/7.png" alt="" width="563"><figcaption></figcaption></figure>
+
+#### Additional Note <a href="#kfu0hmigziyi" id="kfu0hmigziyi"></a>
+
+If you want more details in the Output section, use the checkboxes in the “Show All Columns” section on the right. This is useful if you want to keep track of Price, Error Messages, and other details.
+
+![](../../.gitbook/assets/8.png)
+
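The input and output sheets described in how-to-set-up-bulk-runner.md can be sketched in Python with the standard `csv` module. This is an illustration under stated assumptions, not part of the Gooey.AI product: the helper names `build_question_sheet` and `find_empty_outputs` are hypothetical, the sample output rows (answer text, URLs, run times) are made-up placeholders, and only the “questions” header and the “Output Text”, “Run URL”, and “Run Time” column names come from the guide.

```python
import csv
import io


def build_question_sheet(questions, header="questions"):
    """Return CSV text with a header row followed by one question per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    for question in questions:
        writer.writerow([question])
    return buffer.getvalue()


def find_empty_outputs(output_csv_text, question_col="questions",
                       output_col="Output Text"):
    """List the questions whose output column is empty in a bulk-run CSV."""
    reader = csv.DictReader(io.StringIO(output_csv_text))
    return [
        row[question_col]
        for row in reader
        if not (row.get(output_col) or "").strip()
    ]


# Step 2: the input sheet, using the example questions from the guide.
sheet = build_question_sheet([
    "What is the lipsync tool's API?",
    "What is the step-by-step method to make a good animation?",
])
print(sheet)

# Output: placeholder rows shaped like the columns the guide mentions.
sample_output = (
    "questions,Output Text,Run URL,Run Time\n"
    "What is the lipsync tool's API?,Some answer,https://example.com/run/1,1.2\n"
    "What is the step-by-step method to make a good animation?,,"
    "https://example.com/run/2,0.9\n"
)
missing = find_empty_outputs(sample_output)
print(missing)
```

A check like `find_empty_outputs` is one possible way to spot regressions after a bulk run: any question that comes back without an answer is flagged for a closer look at its run.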