Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
82ecbf3
commit 3034e41
Showing
7 changed files
with
108 additions
and
22 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
--- | ||
description: A simple guide on how to use Automatic Speech Recognition | ||
--- | ||
|
||
# 🗣️ How to use ASR? | ||
|
||
Speech recognition and translation are important when we build AI workflows for frontline workers, traders, and impact organizations. | ||
|
||
In AI copilot scenarios, we found that users prefer to send queries by voice rather than text. This could mean:  | ||
|
||
* Voice notes to WhatsApp and web copilots | ||
* Voice-based IVR in low internet coverage areas | ||
|
||
So, as part of our [AI Workflow Standards](https://blog.gooey.ai/workflow-standards), we now host over 15 ASR models that can be chained into AI Copilot workflows. | ||
|
||
Here is a simple guide to using the ASR Workflow: | ||
|
||
### Step 1: Add your audio samples | ||
|
||
* Head to the Gooey.AI Speech Recognition and Translation Workflow: [https://gooey.ai/speech/](https://gooey.ai/speech/) | ||
* Click the "Clear all" button, then upload your audio file from a local folder or via a link | ||
|
||
{% hint style="info" %} | ||
If you are using a link, you can use: | ||
|
||
* a hosted media link | ||
* a Google Drive link to the audio file | ||
* a YouTube video link | ||
{% endhint %} | ||
|
||
|
||
|
||
<div> | ||
|
||
<figure><img src="../../.gitbook/assets/Screenshot 2024-10-30 at 1.27.20 PM.png" alt="" width="375"><figcaption></figcaption></figure> | ||
|
||
|
||
|
||
<figure><img src="../../.gitbook/assets/Screenshot 2024-10-30 at 1.28.00 PM.png" alt="" width="375"><figcaption></figcaption></figure> | ||
|
||
</div> | ||
|
||
### Step 2: Select the Speech to Text provider  | ||
|
||
* Select a model from the "Speech-to-Text Provider" dropdown. | ||
|
||
{% hint style="info" %} | ||
Use the "Filter by Language" dropdown if you are unsure which models will work with your source language. | ||
{% endhint %} | ||
|
||
<figure><img src="../../.gitbook/assets/Screenshot 2024-10-30 at 1.35.22 PM.png" alt=""><figcaption></figcaption></figure> | ||
|
||
### Step 3: Select your translation model | ||
|
||
* Click on the "Translate" checkbox | ||
* Select the translation model of your choice | ||
|
||
### Step 4: Hit "Run" | ||
|
||
* Click on the "Run" button  | ||
|
||
### FAQs | ||
|
||
Q: How do I test the ASR models that can transcribe Swahili? | ||
|
||
A: Use the "Filter by Language" dropdown and select Swahili to see which models work with it. | ||
|
||
Q: In the translate section, I can see "Google Translate" and "GhanaNLP". Which model should I use? | ||
|
||
A: If you are translating an African language, test whether GhanaNLP is a better choice. GhanaNLP Machine Translation supports Twi, Ewe, Ga, Fanti, Yoruba, Dagbani, Kikuyu, Fra fra, Luo (Kenya, Tanzania), Meru, and Kusaal. | ||
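As a rough illustration of that choice, here is a minimal Python sketch that checks a source language against the GhanaNLP language list given above. The helper name `suggest_translation_models` is hypothetical and is not part of Gooey.AI. The sketch also assumes Google Translate is the fallback for every other language, which this guide does not state.

```python
# Languages this guide lists for GhanaNLP Machine Translation.
# "Fra fra" is stored without the space so both spellings match.
GHANANLP_LANGUAGES = {
    "twi", "ewe", "ga", "fanti", "yoruba", "dagbani",
    "kikuyu", "frafra", "luo", "meru", "kusaal",
}


def suggest_translation_models(source_language: str) -> list[str]:
    """Hypothetical helper: list the translation models worth testing."""
    # Normalize case and drop spaces/hyphens ("Fra fra" -> "frafra").
    normalized = source_language.lower().replace(" ", "").replace("-", "")
    if normalized in GHANANLP_LANGUAGES:
        # Both models are candidates; compare their output on your own audio.
        return ["GhanaNLP", "Google Translate"]
    # Assumption: Google Translate is the fallback for everything else.
    return ["Google Translate"]


print(suggest_translation_models("Twi"))
print(suggest_translation_models("Hindi"))
```

The list only narrows down which models to try. Comparing the actual translations on your own samples is still the deciding step.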
|
||
Q: I have tested a few models, but I want to evaluate a larger dataset without using the API. Is that possible? | ||
|
||
A: Yes! It is easy to set up large-scale ASR evaluations in Gooey. Here is the guide: | ||
|
||
<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>How to create language evaluation for ASR?</strong></td><td><a href="how-to-create-language-evaluation-for-asr.md">how-to-create-language-evaluation-for-asr.md</a></td></tr></tbody></table> | ||
|
||
|
||
|
||
Q: What is the "Prompt" section when I choose GPT4o-Audio? | ||
|
||
A: GPT4o-Audio is an LLM-based transcription model, and the prompt section lets you shape the transcription output in more specific ways. For example, if you input a Hindi audio sample, you can say "Translate the Hindi recording as accurately as possible". This uses the LLM directly to produce the translation. You could also use it in other ways, such as "Summarize the Hindi recording to English in bullet points", which could give you the salient points of the recording directly, as in this example: | ||
|
||
{% embed url="https://gooey.ai/speech/?run_id=1rnoo9r71o3q&uid=fm165fOmucZlpa5YHupPBdcvDR02" %} | ||
|