---
layout: hub_detail
background-class: hub-background
body-class: hub
category: researchers
title: Silero Voice Activity Detector
summary: Pre-trained Voice Activity Detector
image: silero_logo.jpg
author: Silero AI Team
tags: [audio, scriptable]
github-link: https://github.com/snakers4/silero-vad
github-id: snakers4/silero-vad
featured_image_1: silero_vad_performance.png
featured_image_2: no-image
accelerator: cuda-optional
demo-model-link: https://colab.research.google.com/drive/11bhiuFdZ8B2imtEtlHzeU7t-_B59rJxn#scrollTo=udksZuZw0G0i
---

# this assumes that you have a proper version of PyTorch already installed
pip install -q torchaudio soundfile

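from pprint import pprint

import torch

# The model is optimized for a single CPU thread
torch.set_num_threads(1)

# Fetch the repository through torch.hub and load the pre-trained VAD model
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad',
                              force_reload=True)

# Only two of the returned helper functions are needed here
(get_speech_ts,
 _, _, read_audio,
 _, _, _) = utils

# Example audio files ship with the repository checkout in the hub cache
files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'

# Read an example file and print the detected speech timestamps
wav = read_audio(f'{files_dir}/en.wav')
speech_timestamps = get_speech_ts(wav, model,
                                  num_steps=4)
pprint(speech_timestamps)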
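
The printed timestamps are easier to read once converted to seconds. The following is a minimal sketch, not part of the Silero utilities. It assumes that each detected segment is a dict with `start` and `end` keys holding sample indices and that the audio is sampled at 16 kHz. Both are assumptions to verify against the actual output. The helper `timestamps_to_seconds` and the example values are hypothetical.

```python
def timestamps_to_seconds(timestamps, sample_rate=16000):
    """Convert segments given as sample indices into (start, end) pairs in seconds.

    Assumes each item is a dict with 'start' and 'end' sample indices
    (an assumption about the output format, not a documented guarantee).
    """
    return [
        (round(ts['start'] / sample_rate, 3), round(ts['end'] / sample_rate, 3))
        for ts in timestamps
    ]


# Hypothetical values for illustration only; these were not produced by the model
example_timestamps = [{'start': 8000, 'end': 40000},
                      {'start': 56000, 'end': 64000}]

segments = timestamps_to_seconds(example_timestamps)
total_speech = sum(end - start for start, end in segments)

print(segments)
print(f'total speech: {total_speech:.2f} s')
```

With real model output, `example_timestamps` would be replaced by `speech_timestamps`, and `sample_rate` by the rate the audio was actually read at.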

Model Description

Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier. Enterprise-grade Speech Products made refreshingly simple (see our STT models). Each model is published separately.

Currently, there are hardly any high-quality, modern, free, public voice activity detectors other than the WebRTC Voice Activity Detector (link). WebRTC, however, is starting to show its age and suffers from many false positives.

In some cases it is also crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically, personal data is considered private or sensitive if it contains (i) a name or (ii) some private ID. Name recognition is a highly subjective matter that depends on locale and business case, but Voice Activity Detection and Number Detection are quite general tasks.
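
One way to picture the anonymization step is to silence the flagged regions of a waveform. The following is a rough sketch under stated assumptions and is not the Silero pipeline. It assumes that a detector has already produced segments as dicts with `start` and `end` sample indices. The function `silence_segments` and the example data are hypothetical, and plain Python lists stand in for audio tensors.

```python
def silence_segments(samples, segments):
    """Return a copy of `samples` with every [start, end) segment set to zero.

    `segments` is assumed to be a list of dicts with 'start' and 'end'
    sample indices; indices outside the waveform are clamped.
    """
    cleaned = list(samples)
    for seg in segments:
        start = max(0, seg['start'])
        end = min(len(cleaned), seg['end'])
        for i in range(start, end):
            cleaned[i] = 0.0
    return cleaned


# Hypothetical waveform and flagged regions, for illustration only
audio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
flagged = [{'start': 1, 'end': 3}, {'start': 5, 'end': 10}]

anonymized = silence_segments(audio, flagged)
print(anonymized)
```

Zeroing samples keeps the recording length and the timing of the remaining speech intact. Cutting the segments out instead would shorten the audio and shift every later timestamp.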

(!!!) Important Notice (!!!) - the models are intended to run on CPU only and were optimized for performance on 1 CPU thread. Note that the model is quantized.

Supported Languages

As of this page update, the following languages are supported:

Russian
English
German
Spanish

Please note that, in theory, the VAD should also work well with related languages (e.g. other Germanic, Slavic or Romance languages). For the up-to-date language list, please visit our repo.

Additional Examples and Benchmarks

For additional examples and other model formats, please visit this link. See also the extensive examples in Colab format (including the streaming examples).

References

VAD model architectures are based on similar STT architectures.

Silero VAD
Alexander Veysov, "Towards an ImageNet Moment for Speech-to-Text", The Gradient, 2020
Alexander Veysov, "A Speech-To-Text Practitioner’s Criticisms of Industry and Academia", The Gradient, 2020
