GITBOOK-180: No subject
Ambika Joshi authored and gitbook-bot committed Oct 30, 2024
1 parent 82ecbf3 commit 3034e41
Showing 7 changed files with 108 additions and 22 deletions.
27 changes: 14 additions & 13 deletions SUMMARY.md
@@ -19,30 +19,31 @@
* [Building a Multi-Modal Copilot](guides/copilot/building-a-multi-modal-copilot.md)
* [📢 Broadcast Messages (via web or API)](guides/copilot/broadcast-messages-via-web-or-api.md)
* [Frequently Asked Questions about AI Copilot](guides/copilot/frequently-asked-questions-about-ai-copilot.md)
* [🎞️ How to create AI Animations?](guides/how-to-create-ai-animations.md)
* [🖼️ Create an AI Image with text](guides/create-an-ai-image-with-text/README.md)
* [AI Image Prompting](https://docs.google.com/presentation/d/1RaoMP0l7FnBZovDAR42zVmrUND9W5DW6eWet-pi6kiE/edit#slide=id.p)
* [API Tips for AI Image Generator](guides/create-an-ai-image-with-text/api-tips-for-ai-image-generator.md)
* [👄 How to use AI Lip Sync Generator?](guides/how-to-use-ai-lip-sync-generator/README.md)
* [Lip Sync Animation Generator (WITH AUDIO FILES)](guides/how-to-use-ai-lip-sync-generator/lip-sync-animation-generator-with-audio-files.md)
* [LipSync videos with Custom Voices](guides/how-to-use-ai-lip-sync-generator/lipsync-videos-with-custom-voices.md)
* [Set up your API for Lipsync with Local Folders](guides/how-to-use-ai-lip-sync-generator/set-up-your-api-for-lipsync-with-local-folders.md)
* [Tips to create great HD lipsync output](guides/how-to-use-ai-lip-sync-generator/tips-to-create-great-hd-lipsync-output.md)
* [🤳 How to make amazing AI Art QR Codes?](guides/how-to-make-amazing-ai-art-qr-codes/README.md)
* [API tips on AI Art QR Codes](guides/how-to-make-amazing-ai-art-qr-codes/api-tips-on-ai-art-qr-codes.md)
* [📸 AI Photo Editor](guides/ai-photo-editor/README.md)
* [Build your avatar with AI](guides/ai-photo-editor/build-your-avatar-with-ai.md)
* [🌐 How to create SEO-Optimized content with AI?](guides/how-to-create-seo-optimized-content-with-ai.md)
* [🔍 Generate “People Also Ask” SEO Content](guides/generate-people-also-ask-seo-content.md)
* [📊 How to create language evaluation for ASR?](guides/how-to-create-language-evaluation-for-asr.md)
* [🗣️ How to use ASR?](guides/how-to-use-asr/README.md)
* [📊 How to create language evaluation for ASR?](guides/how-to-use-asr/how-to-create-language-evaluation-for-asr.md)
* [How to use Compare AI Translations?](guides/how-to-use-compare-ai-translations/README.md)
* [Google Translate Glossary](guides/how-to-use-compare-ai-translations/google-translate-glossary.md)
* [How does RAG-based document search work?](guides/how-does-rag-based-document-search-work.md)
* [🧩 How to use Gooey Functions?](guides/how-to-use-gooey-functions/README.md)
* [✨ LLM-enabled Functions](guides/how-to-use-gooey-functions/llm-enabled-functions.md)
* [How to use Compare AI Translations?](guides/how-to-use-compare-ai-translations/README.md)
* [Google Translate Glossary](guides/how-to-use-compare-ai-translations/google-translate-glossary.md)
* [⚖️ Understanding Bulk Runner and Evaluation](guides/understanding-bulk-runner-and-evaluation/README.md)
* [💪 How to set up Bulk Runner?](guides/understanding-bulk-runner-and-evaluation/how-to-set-up-bulk-runner.md)
* [🕵️‍♀️ How to set up Evaluations?](guides/understanding-bulk-runner-and-evaluation/how-to-set-up-evaluations.md)
* [🎞️ How to create AI Animations?](guides/how-to-create-ai-animations.md)
* [🤳 How to make amazing AI Art QR Codes?](guides/how-to-make-amazing-ai-art-qr-codes/README.md)
* [API tips on AI Art QR Codes](guides/how-to-make-amazing-ai-art-qr-codes/api-tips-on-ai-art-qr-codes.md)
* [🖼️ Create an AI Image with text](guides/create-an-ai-image-with-text/README.md)
* [AI Image Prompting](https://docs.google.com/presentation/d/1RaoMP0l7FnBZovDAR42zVmrUND9W5DW6eWet-pi6kiE/edit#slide=id.p)
* [API Tips for AI Image Generator](guides/create-an-ai-image-with-text/api-tips-for-ai-image-generator.md)
* [📸 AI Photo Editor](guides/ai-photo-editor/README.md)
* [Build your avatar with AI](guides/ai-photo-editor/build-your-avatar-with-ai.md)
* [🔍 Generate “People Also Ask” SEO Content](guides/generate-people-also-ask-seo-content.md)
* [🌐 How to create SEO-Optimized content with AI?](guides/how-to-create-seo-optimized-content-with-ai.md)

## 😇 CONTRIBUTING

85 changes: 85 additions & 0 deletions guides/how-to-use-asr/README.md
@@ -0,0 +1,85 @@
---
description: A simple guide on how to use Automatic Speech Recognition
---

# 🗣️ How to use ASR?

We find speech recognition and translations important when we create AI workflows for frontline workers, traders, and impact organizations. 

In AI copilot scenarios, we found that users prefer to send queries by voice rather than text. This could mean: 

* Voice notes sent to WhatsApp and web copilots
* Voice-based IVR in areas with low internet coverage

So as part of our [AI Workflow Standards](https://blog.gooey.ai/workflow-standards), we now host over 15 ASR models that can be chained into the AI Copilot workflows. 

Here is a simple guide to using the ASR workflow:

### Step 1: Add your audio samples

* Head to the Gooey.AI Speech Recognition and Translation Workflow: [https://gooey.ai/speech/](https://gooey.ai/speech/)
* Click on the "Clear all" button, then upload your audio file from a local folder or add it via a link 

{% hint style="info" %}
If you are using a link, you can use:  

* a hosted media link 
* a Google Drive link to the audio file 
* a YouTube video link
{% endhint %}



<div>

<figure><img src="../../.gitbook/assets/Screenshot 2024-10-30 at 1.27.20 PM.png" alt="" width="375"><figcaption></figcaption></figure>



<figure><img src="../../.gitbook/assets/Screenshot 2024-10-30 at 1.28.00 PM.png" alt="" width="375"><figcaption></figcaption></figure>

</div>

### Step 2: Select the Speech to Text provider&#x20;

* Select the model you want to use from the "Speech-to-Text Provider" dropdown.

{% hint style="info" %}
Use the "Filter by Language" dropdown if you are unsure which models will work with your source language.
{% endhint %}

<figure><img src="../../.gitbook/assets/Screenshot 2024-10-30 at 1.35.22 PM.png" alt=""><figcaption></figcaption></figure>

### Step 3: Select your translation model

* Click on the "Translate" checkbox
* Select the translation model of your choice

### Step 4: Hit "Run"

* Click on the "Run" button&#x20;
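
The four steps above map onto a single request, so they can also be scripted. The following is a minimal sketch, not the documented API: the endpoint URL, the payload field names (`documents`, `selected_model`, `translation_model`), the model values, and the `GOOEY_API_KEY` environment variable are all assumptions for illustration. Check the API tab of the workflow for the real names before relying on any of them. The sketch only builds the request and does not send it.

```python
import json
import os
import urllib.request

# Assumption: the endpoint and payload field names below are illustrative
# placeholders. Confirm them in the workflow's API tab before use.
ASR_ENDPOINT = "https://api.gooey.ai/v2/asr"


def build_asr_request(audio_urls, model, translation_model=None, api_key=""):
    """Build (but do not send) a POST request mirroring Steps 1 to 4."""
    payload = {
        "documents": list(audio_urls),  # Step 1: audio samples
        "selected_model": model,        # Step 2: speech-to-text provider
    }
    if translation_model is not None:
        payload["translation_model"] = translation_model  # Step 3: translation
    return urllib.request.Request(
        ASR_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_asr_request(
    ["https://example.com/sample.wav"],  # placeholder audio link
    model="example_asr_model",           # hypothetical model identifier
    translation_model="example_translator",  # hypothetical value
    api_key=os.environ.get("GOOEY_API_KEY", ""),
)

# Step 4 ("Run") would correspond to: urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

Keeping the request construction separate from sending it makes it easy to inspect the payload first and to reuse the same builder across many audio files.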

### FAQs

Q: How do I test the ASR models that can transcribe Swahili?

A: Use the "Filter by Language" dropdown and pick Swahili to see which models work with that source language.

Q: In the translate section I can see "Google Translate" and "GhanaNLP". Which model should I use?

A: If you are translating an African language, test whether GhanaNLP gives better results than Google Translate. GhanaNLP Machine Translation supports: Twi, Ewe, Ga, Fanti, Yoruba, Dagbani, Kikuyu, Fra fra, Luo (Kenya, Tanzania), Meru, and Kusaal.
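
The choice described in this answer can be sketched as a small lookup. This is an illustrative helper, not part of Gooey.AI: the function name is hypothetical, the language list is copied from the answer above, and treating Google Translate as always worth testing is an assumption (check that it covers your language).

```python
# Languages this guide lists as supported by GhanaNLP Machine Translation.
GHANANLP_LANGUAGES = {
    "twi", "ewe", "ga", "fanti", "yoruba", "dagbani",
    "kikuyu", "fra fra", "luo", "meru", "kusaal",
}


def translation_models_to_test(language):
    """Return the translation models worth comparing for a source language.

    Hypothetical helper: assumes Google Translate is always a candidate.
    """
    candidates = ["Google Translate"]
    if language.strip().lower() in GHANANLP_LANGUAGES:
        candidates.append("GhanaNLP")
    return candidates


print(translation_models_to_test("Twi"))    # ['Google Translate', 'GhanaNLP']
print(translation_models_to_test("Hindi"))  # ['Google Translate']
```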

Q: I have tested a few models, but I want to evaluate a larger dataset without using the API. Is that possible?

A: Yes! It is very easy to set up large-scale ASR evaluations in Gooey! Here is the guide:&#x20;

<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>How to create language evaluation for ASR?</strong></td><td><a href="how-to-create-language-evaluation-for-asr.md">how-to-create-language-evaluation-for-asr.md</a></td></tr></tbody></table>



Q: What is the "Prompt" section when I choose GPT4o-Audio?

A: GPT4o-Audio is an LLM-based transcription model. The prompt section lets you shape how the transcribed audio is output. For example, if you input a Hindi audio sample, you can say "Translate the Hindi recording as accurately as possible", and the LLM will produce the translation directly. You can also use it in other ways, such as "Summarize the Hindi recording to English in bullet points", which could give you the salient points of the recording directly. See the example here:&#x20;

{% embed url="https://gooey.ai/speech/?run_id=1rnoo9r71o3q&uid=fm165fOmucZlpa5YHupPBdcvDR02" %}

@@ -28,7 +28,7 @@ There are several components to test:&#x20;

#### Also see:

<table data-view="cards" data-full-width="true"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td>🏎️ Global Language Understanding for AIs</td><td><a href="https://app.gitbook.com/s/leYcqBx5FRZcVr3wI4f4/global-language-understanding-for-ais">Global Language Understanding for AIs</a></td><td><a href="../.gitbook/assets/gooey.ai - cute robots racing vintage painted magazine advertisement muted colorful illustration.png">gooey.ai - cute robots racing vintage painted magazine advertisement muted colorful illustration.png</a></td></tr><tr><td>🗣️ Check out our Hindi ASR Evaluation</td><td><a href="https://gooey.ai/bulk/compare-hindi-speech-recognition-hkgs8120p11t/">https://gooey.ai/bulk/compare-hindi-speech-recognition-hkgs8120p11t/</a></td><td><a href="../.gitbook/assets/gooey.ai - cute vintage poster style illustration of a small group of indian kids learning hindi (1).png">gooey.ai - cute vintage poster style illustration of a small group of indian kids learning hindi (1).png</a></td></tr></tbody></table>
<table data-view="cards" data-full-width="true"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td>🏎️ Global Language Understanding for AIs</td><td><a href="https://app.gitbook.com/s/leYcqBx5FRZcVr3wI4f4/global-language-understanding-for-ais">Global Language Understanding for AIs</a></td><td><a href="../../.gitbook/assets/gooey.ai - cute robots racing vintage painted magazine advertisement muted colorful illustration.png">gooey.ai - cute robots racing vintage painted magazine advertisement muted colorful illustration.png</a></td></tr><tr><td>🗣️ Check out our Hindi ASR Evaluation</td><td><a href="https://gooey.ai/bulk/compare-hindi-speech-recognition-hkgs8120p11t/">https://gooey.ai/bulk/compare-hindi-speech-recognition-hkgs8120p11t/</a></td><td><a href="../../.gitbook/assets/gooey.ai - cute vintage poster style illustration of a small group of indian kids learning hindi (1).png">gooey.ai - cute vintage poster style illustration of a small group of indian kids learning hindi (1).png</a></td></tr></tbody></table>

## Getting Started

@@ -39,7 +39,7 @@ There are several components to test:&#x20;
3. Add the human-created transcription for each sample
4. Add the English translation for each sample

<figure><img src="../.gitbook/assets/Screenshot 2024-05-07 at 12.27.38 AM.png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/Screenshot 2024-05-07 at 12.27.38 AM.png" alt=""><figcaption></figcaption></figure>
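
The data preparation above can also be done programmatically. Here is a minimal sketch that writes one row per audio sample using Python's standard `csv` module. The `audios` column name comes from this guide. The `transcription` and `translation` column names and the sample values are hypothetical placeholders.

```python
import csv
import io

# "audios" is the input column used in this guide. The other two column
# names are hypothetical placeholders.
COLUMNS = ["audios", "transcription", "translation"]

rows = [
    {
        "audios": "https://example.com/sample-1.wav",  # placeholder link
        "transcription": "human-written transcript in the source language",
        "translation": "human-written English translation",
    },
]


def to_csv(samples):
    """Serialise samples to CSV text, one audio sample per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(samples)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

The resulting text can be saved as a `.csv` file and uploaded, or pasted into a Google Sheet.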

### Step 1 - Select the ASR Models

@@ -49,37 +49,37 @@ Head to our bulk and eval workflow.&#x20;

In the example, we have already pre-filled the various models that can be tested. You can choose the ones you want to run by selecting them in the dropdown.

<figure><img src="../.gitbook/assets/Screenshot 2024-05-07 at 12.31.01 AM.png" alt=""><figcaption><p>Select the models you want to evaluate your audio samples on</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/Screenshot 2024-05-07 at 12.31.01 AM.png" alt=""><figcaption><p>Select the models you want to evaluate your audio samples on</p></figcaption></figure>

### Step 2 - Add your CSV/Google Sheets

Upload your CSV/Google Sheet from [Step 0](how-to-create-language-evaluation-for-asr.md#step-0-prepare-your-data). In this example, we have used a Google Sheet of 10 Audio Samples with transcripts and translations. A preview of your sheet will appear once it is correctly uploaded.&#x20;

<figure><img src="../.gitbook/assets/Screenshot 2024-05-07 at 12.34.15 AM.png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/Screenshot 2024-05-07 at 12.34.15 AM.png" alt=""><figcaption></figcaption></figure>

### Step 3 - Select the input column

Select the input column from the dropdown box. In this example, it is the "audios" column. The outputs will appear as new columns.

<figure><img src="../.gitbook/assets/Screenshot 2024-05-07 at 12.36.40 AM.png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/Screenshot 2024-05-07 at 12.36.40 AM.png" alt=""><figcaption></figcaption></figure>

### Step 4 - Select the pre-built evaluator

In the "Evaluation Workflows" section select the "Speech Recognition Model Evaluator".

<figure><img src="../.gitbook/assets/Screenshot 2024-05-07 at 12.38.13 AM.png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/Screenshot 2024-05-07 at 12.38.13 AM.png" alt=""><figcaption></figcaption></figure>

### Step 5 - Hit Submit

Once you hit submit, the selected ASR model workflows (see [Step 1](how-to-create-language-evaluation-for-asr.md#step-1-select-the-asr-models)) will run for each audio file in the sheet (see [Step 2](how-to-create-language-evaluation-for-asr.md#step-2-add-your-csv-google-sheets)). An output CSV will be generated on the right-hand side of the page.&#x20;

<figure><img src="../.gitbook/assets/Screenshot 2024-05-07 at 12.47.58 AM.png" alt=""><figcaption><p>The outputs will appear in a table on the right of the page, in new columns after your originally populated columns</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/Screenshot 2024-05-07 at 12.47.58 AM.png" alt=""><figcaption><p>The outputs will appear in a table on the right of the page, in new columns after your originally populated columns</p></figcaption></figure>

After the runs are complete, the selected Evaluator (see [Step 4](how-to-create-language-evaluation-for-asr.md#step-4-select-the-pre-built-evaluator)) will compare the ASR model outputs to the human-generated translations. It will assess and rate how accurately each model has translated the audio sample.&#x20;

A bar graph with the performance will appear once the entire evaluation is complete.

<figure><img src="../.gitbook/assets/Screenshot 2024-05-07 at 12.50.05 AM.png" alt=""><figcaption><p>After the evaluation is complete, a table and a bar graph will show the evaluation scores</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/Screenshot 2024-05-07 at 12.50.05 AM.png" alt=""><figcaption><p>After the evaluation is complete, a table and a bar graph will show the evaluation scores</p></figcaption></figure>
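
If you want to sanity-check the exported output CSV offline, one common metric for comparing a model transcript with a human transcript is word error rate (WER). The sketch below is a swapped-in illustration, not a description of how the "Speech Recognition Model Evaluator" computes its scores. The function name and the simple lowercase, whitespace-based tokenisation are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Illustrative metric only; not the scoring used by the Gooey evaluator.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


print(word_error_rate("the cat sat", "the cat sat"))  # 0.0
print(word_error_rate("the cat sat", "the dog sat"))  # one substitution in three words
```

Lower is better: 0.0 means the model transcript matches the human transcript word for word.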



@@ -89,7 +89,7 @@ A bar graph with the performance will appear once the entire evaluation is compl

A: Put the audio sample links in the first column. For each audio link, add its transcription and translation in the same row.&#x20;

<figure><img src="../.gitbook/assets/Screenshot 2024-05-08 at 11.08.26 PM.png" alt=""><figcaption><p>Screenshot of audio sample links with transcriptions and English translations</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/Screenshot 2024-05-08 at 11.08.26 PM.png" alt=""><figcaption><p>Screenshot of audio sample links with transcriptions and English translations</p></figcaption></figure>

#### Q: What is the ideal length of the recording?
