This project implements ASR (Automatic Speech Recognition) inside Unity using the Vosk libraries. It builds on and improves the sample linked on the official website.
Vosk is a speech recognition toolkit. Some advantages of Vosk include:
- Supports 20+ languages and dialects
- Works offline, even on lightweight devices
- Models come in a ~50 MB size for mobile and a ~2 GB size for servers
- Allows manual configuration of keywords for better accuracy

To set up the project:
- Download a language model from the Official List
- Extract the model to the `Application.streamingAssetsPath` of the project
  - A larger model is more accurate but takes longer to load
- (Optional) Install the Newtonsoft Json Unity Package
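
A quick way to catch a misplaced model early is to check that the extracted folder exists before starting the ASR. The following is a minimal sketch, not part of this project: `VoskModelLocator` and `ResolveModelPath` are hypothetical names, and the helper takes the base path as a plain string so it can be tried outside Unity.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of this project): locates an extracted model folder.
public static class VoskModelLocator
{
    // Returns the full path of the model folder, or throws if it is missing.
    public static string ResolveModelPath(string streamingAssetsPath, string modelName)
    {
        if (string.IsNullOrWhiteSpace(modelName))
            throw new ArgumentException("Model folder name must not be empty.", nameof(modelName));

        // The model is expected as a folder directly under StreamingAssets.
        string modelPath = Path.Combine(streamingAssetsPath, modelName);
        if (!Directory.Exists(modelPath))
            throw new DirectoryNotFoundException($"Vosk model folder not found: {modelPath}");

        return modelPath;
    }
}
```

Inside Unity, this could be called as `VoskModelLocator.ResolveModelPath(Application.streamingAssetsPath, "vosk-model-small-en-us-0.15")`, where the folder name is only an example. On Android, StreamingAssets content is packed inside the APK, so a plain `Directory.Exists` check is not expected to work there.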

To use the ASR in a script:
- Inside the script that uses the ASR, add `using Vosk.APIs;`
- Call `VoskASR.Init` with the necessary arguments
- Subscribe to `VoskASR.OnTranscriptionResult` to process the results
  - Results are sent as JSON, which is why `Newtonsoft.Json` is recommended
- Refer to `Demo.cs` for examples
- Use `LoudnessMeter` to visualize the input volume
- Use `ChineseUtil` to convert between Simplified Chinese and Traditional Chinese
  - Chinese models are often trained on Simplified Chinese
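
Because results arrive as JSON, a small parsing helper keeps the handler code short. The following is a minimal sketch, assuming the string passed to the result handler is the recognizer output in the shape Vosk itself produces: an object with a `text` field, or an `alternatives` array of objects with `text` fields when alternatives are requested. `VoskResultParser` and `ExtractTexts` are hypothetical names, not part of this project, so check `Demo.cs` for the actual handling.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

// Hypothetical helper (not part of this project): pulls transcriptions out of a result string.
public static class VoskResultParser
{
    // Returns the non-empty transcriptions in the order they appear in the JSON.
    public static List<string> ExtractTexts(string json)
    {
        var texts = new List<string>();
        JObject root = JObject.Parse(json);

        if (root["alternatives"] is JArray alternatives)
        {
            // Assumed shape: {"alternatives": [{"confidence": ..., "text": "..."}, ...]}
            foreach (JToken alternative in alternatives)
            {
                string text = (string)alternative["text"];
                if (!string.IsNullOrWhiteSpace(text)) texts.Add(text.Trim());
            }
        }
        else
        {
            // Assumed shape: {"text": "..."}
            string text = (string)root["text"];
            if (!string.IsNullOrWhiteSpace(text)) texts.Add(text.Trim());
        }

        return texts;
    }
}
```

Empty transcriptions are skipped because a recognizer can report silence as an empty `text` value, which is usually not worth forwarding to game logic.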

`VoskASR.Init` takes the following arguments:
- `caller`: The `MonoBehaviour` on which Unity calls `StartCoroutine`
- `modelName`: The folder name of the model
- `autoStart`: Whether the ASR should start right after `Init`
- `maxAlternatives`: How many alternative results the ASR generates
- `microphoneIndex`: The index of the microphone to use
- `keyPhrases`: Manually configured keywords to detect
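
Put together, a call using the arguments above might look like the following. This is a rough sketch under several assumptions that this document does not confirm: that `Init` and `OnTranscriptionResult` are static members, that the handler receives a single JSON string, and that `keyPhrases` accepts a string array. The class name, model folder name, and key phrases are placeholders. Treat `Demo.cs` as the authoritative example.

```csharp
using UnityEngine;
using Vosk.APIs;

// Hypothetical component: argument types and the handler signature are assumptions.
public class VoskExample : MonoBehaviour
{
    private void Start()
    {
        // Subscribe before starting so that no result is missed.
        VoskASR.OnTranscriptionResult += HandleResult;

        VoskASR.Init(
            caller: this,                              // MonoBehaviour used for StartCoroutine
            modelName: "vosk-model-small-en-us-0.15",  // example folder under StreamingAssets
            autoStart: true,                           // begin recognition right after Init
            maxAlternatives: 3,                        // request up to three candidate results
            microphoneIndex: 0,                        // first available microphone
            keyPhrases: new[] { "start", "stop" });    // assumed to accept a string array
    }

    private void OnDestroy()
    {
        // Unsubscribe so the handler does not outlive this component.
        VoskASR.OnTranscriptionResult -= HandleResult;
    }

    // Assumed signature: one JSON string per result.
    private void HandleResult(string json)
    {
        Debug.Log(json);
    }
}
```

Named arguments are used here so that the call reads the same as the parameter list, and so that a mismatch with the real signature shows up as a compile error instead of a silently misordered argument.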