
Add Parler (generative speech) based TTS (proof of concept) #274

Draft
wants to merge 1 commit into base: main

Conversation

rkusa
Collaborator

@rkusa rkusa commented Nov 10, 2024

Yet another TTS experiment, similar to #261. This time it uses Parler-TTS via candle, which makes it a pure Rust dependency and keeps the build and distribution simple. It still needs to download the language model, but this happens automatically on first use (at least in this proof of concept).

This is, however, not a general-purpose TTS: the model is huge and very slow on the CPU. It only really makes sense with a dedicated GPU so that it can run via CUDA, i.e. nothing for most servers. The large model needs about 10 GB of VRAM; the smaller model about 4 GB.

Some additional stats: on a 4090, loading the large model onto the GPU takes about 4 s (done only once per mission), and generating the speech for the sample below took about 3 s.

Since this is generative speech synthesis, it doesn't have fixed voices; instead it takes an additional prompt describing the speaker. It does, however, support some named speakers, which should help with consistency across different prompts.

Here is a sample.zip using the following speaker prompt:

Jenna speaks very fast in a monotone tone, and high quality audio.

I am still exploring the different offline TTS options before actually proposing to bring one of them into DCS-gRPC.
