
Add Parler (generative speech) based TTS (proof of concept) #274

Draft
wants to merge 1 commit into base: main

Conversation

rkusa
Collaborator

@rkusa rkusa commented Nov 10, 2024

Yet another TTS experiment, similar to #261. This time it uses Parler-TTS via candle, which makes it a pure Rust dependency and keeps the build and distribution simple. It still needs to download the language model, but this happens automatically on first use (at least in this proof of concept).

This is, however, not a general-purpose TTS: the model is huge and very slow on the CPU. It only really makes sense with a dedicated GPU so that it can run via CUDA, i.e. nothing for most servers. The large model needs about 10 GB of VRAM; the smaller model about 4 GB.

Some additional stats: on a 4090, loading the large model onto the GPU takes about 4 s (done only once per mission), and generating the speech for the sample below took about 3 s.

Since this is generative speech synthesis, it doesn't have fixed voices; instead it takes an additional prompt describing the speaker. It does, however, support some named speakers, which should help with consistency across different prompts.

Here is a sample.zip using the following speaker prompt:

Jenna speaks very fast in a monotone tone, and high quality audio.

I am still exploring the different offline TTS options before actually proposing to bring one of them into DCS-gRPC.
