AI Voice Agents: Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧

Yuan-ManX/ai-voice-agents

AI Voice Agents

AI Voice Agents - Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧

Table of Contents

Project List

Full Stack

| Source | Description | Code | Paper | Model |
| --- | --- | --- | --- | --- |
| GPT-4o | GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction: it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. | API | | |
| Retell AI | Retell AI: Build Advanced Voice AI, Powered by LLM. | API | | |

^ Back to Contents ^

Text To Speech

| Source | Description | Code | Paper | Model |
| --- | --- | --- | --- | --- |
| ChatTTS | ChatTTS is a text-to-speech model designed specifically for dialogue scenarios such as LLM assistants. | GitHub | | Hugging Face |
| CosyVoice | Multilingual large voice generation model with full-stack support for inference, training, and deployment. | GitHub | | |
| ElevenLabs | ElevenLabs: Text to Speech & AI Voice Generator. | API | | |
| Matcha-TTS | Matcha-TTS: A fast TTS architecture with conditional flow matching. | GitHub | arXiv | |
| StyleTTS 2 | Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. | GitHub | arXiv | |
| XTTS | 🐸TTS is a library for advanced Text-to-Speech generation. | GitHub | | |

^ Back to Contents ^

Automatic Speech Recognition

| Source | Description | Code | Paper | Model |
| --- | --- | --- | --- | --- |
| SenseVoice | SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). | GitHub | | Hugging Face |
| TeleSpeech-ASR | A large speech model for multi-dialect ASR. | GitHub | | Hugging Face |
| Whisper | Whisper is a general-purpose speech recognition model. | GitHub | arXiv | Hugging Face |

^ Back to Contents ^

Audio Generation

| Source | Description | Code | Paper | Model |
| --- | --- | --- | --- | --- |
| Make-An-Audio 3 | Transforming Text into Audio via Flow-based Large Diffusion Transformers. | GitHub | arXiv | Hugging Face |

^ Back to Contents ^
