CUDA is highly recommended for this! CPU is about 3x slower. Read more about the speed comparison here: https://github.com/guillaumekln/faster-whisper#benchmark
- Create a Python virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # Linux, or...
  venv/Scripts/activate     # for Windows
  ```
- Install the pip packages:

  ```bash
  pip install -r requirements.txt
  ```
- Head over to pytorch.org and select:

  | PyTorch Build    | Stable                  |
  | ---------------- | ----------------------- |
  | Your OS          | xxxx                    |
  | Package          | Pip                     |
  | Language         | Python                  |
  | Compute Platform | CUDA `<latest version>` |

  Then run the given command to install PyTorch (an example is shown after this list).
- Copy `SAMPLE_config.json` to `config.json` and change the API endpoints (see the example after this list).

- Make sure to have `ffmpeg` installed (install examples are shown after this list).
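For the PyTorch step, the command generated on pytorch.org typically looks like the sketch below. The index URL and the `cu121` (CUDA 12.1) suffix are only examples and depend on the options you selected, so copy the exact command from the site. The second command is an optional sanity check that the installed build can see your GPU.

```bash
# Example only: the CUDA suffix (cu121) depends on your selection on pytorch.org.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Optional check: should print "True" if the CUDA build sees your GPU.
python -c "import torch; print(torch.cuda.is_available())"
```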
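The config step is a plain file copy; afterwards open `config.json` and point the API endpoints at your own instances:

```bash
cp SAMPLE_config.json config.json   # Linux/macOS; on Windows use "copy" instead
```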
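`ffmpeg` must be available on your `PATH`. How you install it depends on your platform; two common examples:

```bash
sudo apt install ffmpeg   # Debian/Ubuntu
brew install ffmpeg       # macOS (Homebrew)
```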
Just running the main script will do most of what you need. Transcripts will be saved in `transcripts/`.

```bash
python main.py -e prod transcribe
```
Using the large Whisper model (the default) gives the best speech-to-text results and requires ~6 GB of GPU memory. Run `python main.py transcribe -h` to see all available models.
```
usage: Wubbl0rz Archiv Transcribe [-h] [-c CONFIG] -e {prod,dev} [-o OUTPUT]
                                  {transcribe,post} ...

positional arguments:
  {transcribe,post}     Available commands
    transcribe          Run whisper to transcribe vods to text
    post                Post available transcriptions

options:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path to config.json
  -e {prod,dev}, --environment {prod,dev}
                        Target environment
  -o OUTPUT, --output OUTPUT
                        Output directory for transcripts
```
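The global options from the help output can be combined with either subcommand. For illustration (the paths and environment below are examples, not required values):

```bash
# Transcribe against the dev environment with an explicit config and output directory.
python main.py -c config.json -e dev -o transcripts transcribe

# Post the available transcriptions to the prod environment.
python main.py -e prod post
```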