F5-TTS

F5-TTS is a web application for cloning a voice from a short reference clip and generating text-to-speech audio with the F5-TTS or E2-TTS models.

Features

Upload and process reference audio
Automatic transcription of reference audio
Text-to-speech generation using F5-TTS or E2-TTS models
Custom prompt input for generated speech
Audio playback and download

Interface

The F5-TTS interface provides an intuitive way to upload reference audio, visualize the waveform, and generate new speech based on the cloned voice.

Technologies Used

Backend: Python, Flask
Frontend: HTML, JavaScript, Tailwind CSS
AI Models: F5-TTS, E2-TTS
Audio Processing: librosa, soundfile, pydub
Transcription: faster-whisper

Audio Clip Size and Performance

The application is tuned for reference audio clips between 1 and 25 seconds long, the range in which the F5-TTS and E2-TTS models perform best.

Longer files are accepted and processed, but the quality of the voice cloning and the consistency of the generated speech may degrade once a clip exceeds 25 seconds.
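The recommended range can be checked before a clip is ever uploaded. The following is a minimal sketch, not the application's own code: it assumes the reference clip is a WAV file and uses only Python's standard `wave` module, whereas the application itself may rely on librosa or soundfile. The function names and the returned labels are hypothetical.

```python
import wave

MIN_SECONDS = 1.0   # lower bound of the recommended range
MAX_SECONDS = 25.0  # upper bound of the recommended range


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def classify_clip(duration_s):
    """Classify a clip length against the recommended 1-25 second range.

    The labels are illustrative only; they are not values used by the app.
    """
    if duration_s < MIN_SECONDS:
        return "too_short"
    if duration_s > MAX_SECONDS:
        return "longer_than_recommended"
    return "ok"
```

Calling `classify_clip(wav_duration_seconds("reference.wav"))` on a local file (the file name here is a placeholder) gives a quick indication of whether the clip falls inside the recommended range.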

Setup and Installation

Clone the repository:
    git clone https://github.com/ThisModernDay/f5-tts.git
    cd f5-tts
Create and activate a new Conda environment with Python 3.10:
    conda create -n f5-tts python=3.10
    conda activate f5-tts
Install the required packages:
    pip install -r requirements.txt
Set up the environment variables (if necessary).
Run the Flask application:
    python app.py
Open a web browser and navigate to http://localhost:5000.
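If the application fails to start after installation, it can help to confirm that the main dependencies are importable in the active environment. The sketch below is an assumption-laden sanity check, not part of the repository: the module names are the usual import names for the libraries listed under Technologies Used, and requirements.txt remains the authoritative list.

```python
import importlib.util
import sys

# Usual import names for the libraries named in this README (an assumption;
# requirements.txt is the authoritative list of dependencies).
EXPECTED_MODULES = ["flask", "librosa", "soundfile", "pydub", "faster_whisper"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_modules(EXPECTED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Running this inside the activated f5-tts environment should report Python 3.10.x and, ideally, no missing modules.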

Usage

Upload a reference audio file (WAV or MP3, ideally 1 to 25 seconds long).
The application will automatically transcribe the audio.
Enter your desired prompt text.
Choose between F5-TTS and E2-TTS models.
Click "Generate Audio" to create the cloned voice audio.
Play the generated audio or download it.
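When a reference recording is longer than the recommended 25 seconds, one option is to trim it before uploading. The sketch below shows one way to do that for WAV files with Python's standard `wave` module. It is an illustration under stated assumptions, not the application's own processing: the function name `trim_wav` is hypothetical, and simply keeping the first 25 seconds may not select the best part of the recording.

```python
import wave

MAX_SECONDS = 25  # upper end of the recommended range


def trim_wav(src_path, dst_path, max_seconds=MAX_SECONDS):
    """Copy at most the first max_seconds of src_path into dst_path.

    Returns the duration of the written file in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_to_keep = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(frames_to_keep)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # the frame count in the header is corrected on close
        dst.writeframes(frames)

    return frames_to_keep / params.framerate
```

For example, `trim_wav("long_reference.wav", "reference_25s.wav")` (both file names are placeholders) writes a copy that is at most 25 seconds long and leaves shorter files unchanged in length.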

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
