CUDA is highly recommended for this! CPU is about 3x slower. Read more about the speed comparison here: https://github.com/guillaumekln/faster-whisper#benchmark
- Create a Python virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # Linux, or...
  venv/Scripts/activate     # for Windows
  ```
- Install the pip packages:

  ```bash
  pip install -r requirements.txt
  ```
- Head over to pytorch.org and select:

  | PyTorch Build    | Stable                  |
  | ---------------- | ----------------------- |
  | Your OS          | xxxx                    |
  | Package          | Pip                     |
  | Language         | Python                  |
  | Compute Platform | CUDA `<latest version>` |

  Then run the given command to install PyTorch (an example is shown after this list).
- Copy `SAMPLE_config.json` to `config.json` and change the API endpoints (see the example after this list).

- Make sure to have `ffmpeg` installed (install examples are shown after this list).
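For the PyTorch step, the command generated on pytorch.org typically looks like the sketch below. The index URL and the `cu121` (CUDA 12.1) suffix are only examples and depend on the options you selected, so copy the exact command from the site. The second command is an optional sanity check that the installed build can see your GPU.

```bash
# Example only: the CUDA suffix (cu121) depends on your selection on pytorch.org.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Optional check: should print "True" if the CUDA build sees your GPU.
python -c "import torch; print(torch.cuda.is_available())"
```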
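The config step is a plain file copy; afterwards open `config.json` and point the API endpoints at your own instances:

```bash
cp SAMPLE_config.json config.json   # Linux/macOS; on Windows use "copy" instead
```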
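`ffmpeg` must be available on your `PATH`. How you install it depends on your platform; two common examples:

```bash
sudo apt install ffmpeg   # Debian/Ubuntu
brew install ffmpeg       # macOS (Homebrew)
```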
Just running the main script will do most of what you need. Transcripts will be saved in `transcripts/`.

```bash
python main.py -e prod transcribe
```
Using the large Whisper model (the default) gives the best speech-to-text results and requires ~6 GB of GPU memory. Run `python main.py transcribe -h` to see all available models.
```
usage: Wubbl0rz Archiv Transcribe [-h] [-c CONFIG] -e {prod,dev} [-o OUTPUT]
                                  {transcribe,post} ...

positional arguments:
  {transcribe,post}     Available commands
    transcribe          Run whisper to transcribe vods to text
    post                Post available transcriptions

options:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path to config.json
  -e {prod,dev}, --environment {prod,dev}
                        Target environment
  -o OUTPUT, --output OUTPUT
                        Output directory for transcripts
```
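The global options from the help output can be combined with either subcommand. For illustration (the paths and environment below are examples, not required values):

```bash
# Transcribe against the dev environment with an explicit config and output directory.
python main.py -c config.json -e dev -o transcripts transcribe

# Post the available transcriptions to the prod environment.
python main.py -e prod post
```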