Great Deep Learning Tutorials & Resources for Speech Processing

A Great Collection of Deep Learning Tutorials and Repositories for Speech Processing

General (Spoken Language Processing / Speech Processing):

Text to Speech (TTS):

Automatic Speech Recognition (ASR) & Speech to Text (STT):

AudioLLM:

Speech to Speech Models:

ASR with LLMs:

Speech Language Modeling:

Persian ASR Repos:

Great Resources for Persian ASR Normalization:

Persian-based Raw Text Datasets for LM Training:

Adapter Methods Instead of Fine-Tuning for Large-Scale ASR Models:

Diffusion-based Methods:

Audio Generation:

Speech Translation:

G2P (Grapheme2Phoneme):

Fundamental Notes in Speech Processing & Courses:

Great Kaldi Tutorials:

ASR Error Correction:

Source Separation:

Sound & Audio Classification:

Voice Activity Detection (VAD) & Speech Activity Detection (SAD):

Audio Segmentation:

Extract & Remove Vocals from Songs in Audio Files:

Audio Summarization:

Spoken Language Recognition:

Keyword Spotting & Speech Command Recognition:

Active Learning in ASR:

Audio Pretraining, Representation Learning, and Self-Supervised Pretraining:

Audio Augmentation:

Speech Emotion Recognition:

Annotation Tools:

Audio Compression:

Audio Variational Autoencoder (VAE):

Speaker Anonymization:

Some ASR & Speech Datasets:

Voice Conversion:

Interesting Ideas about Startups with ASR:

It is interesting how quickly people implement ideas, such as transcribing podcasts with Whisper. Here is a selection:

Other: