This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Forked from Artrajz/vits-simple-api

VITS Webui

[English|中文]

A simple Webui that lets you run inference with VITS TTS models.
It also comes with API support for interacting with other processes.

Features

  • VITS Text-to-Speech
  • GPU Acceleration
  • Support for Multiple Models
  • Automatic Language Recognition & Processing
  • Customize Parameters
  • Batch Processing for Long Text

Main Differences of the Fork

  • Paths are no longer hardcoded in the Config. The project is now portable.
  • Model paths are loaded automatically. No more manually editing the entries in the Config.
  • Prioritizes PyTorch with Nvidia GPU support. (Built on CUDA 11.8)

    Edit requirements.txt if using a different CUDA version

  • Installing fasttext should no longer cause issues, at least on Windows.
  • Cleaned up a few entries of the Config.
  • Removed everything related to Docker.
  • By default, only VITS models are supported. You will need to edit config.py and some other scripts to use VITS2, etc.

Some original features might be missing...

Deployment

1. Clone the Project

Open the console at the target location, then run the following:

git clone https://github.com/HaomingXR/vits-webui

2. Prepare Python

Local Installation

  • Create a virtual environment using the Python installed on your system (Tested on 3.10.10)
python -m venv venv
venv\Scripts\activate

Portable Installation

  • Download the self-contained Python runtime, Windows Embeddable Package
  • Open the python3<version>._pth file (with a text editor)
  • Uncomment the import site line
  • Then, download and run get-pip.py to install pip

3. Install Python Dependencies

Edit requirements.txt if using a different CUDA version, or if not using an Nvidia GPU

pip install -r requirements.txt

4. Start

Run the following command to start the service:

python app.py

On Windows, you can also run webui.bat to launch the service directly.

Edit the file to point it at your Python runtime.

Model Loading

1. Download VITS Models

  • You may find various VITS models online, usually on Hugging Face Spaces
  • Download the VITS model files (both the .pth weights and the .json config)

2. Loading Models

  • Place the model and its config into their own folder, then place that folder inside the models directory
  • On launch, the system should automatically detect the models
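The detection step above can be pictured with a short sketch (illustrative only; `discover_models` is a hypothetical name, not the project's actual loader — the real logic lives in the project's source):

```python
from pathlib import Path

def discover_models(models_dir: str) -> list[tuple[Path, Path]]:
    """Find (weights, config) pairs: each model folder holds one .pth and one .json."""
    pairs = []
    for folder in sorted(Path(models_dir).iterdir()):
        if not folder.is_dir():
            continue
        # Pick the first matching file of each kind inside the folder.
        pth = next(folder.glob("*.pth"), None)
        cfg = next(folder.glob("*.json"), None)
        if pth and cfg:
            pairs.append((pth, cfg))
    return pairs
```

Folders missing either file are simply skipped, so an incomplete download will not break startup.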

Configs

The file config.py contains a few default options. After launching the service for the first time, it will generate a config.yaml in the same directory. All future launches load this config instead.
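The generate-on-first-run pattern looks roughly like this (an illustrative sketch with made-up defaults, shown with JSON so it stays self-contained; the project itself uses YAML via config.py):

```python
import json
from pathlib import Path

# Illustrative defaults only; the real ones live in config.py.
DEFAULTS = {"PORT": 8888, "API_KEY_ENABLED": False}

def load_config(path: str = "config.json") -> dict:
    """On first run, write the defaults to disk; afterwards, load the saved file."""
    p = Path(path)
    if not p.exists():
        p.write_text(json.dumps(DEFAULTS, indent=2))
        return dict(DEFAULTS)
    return json.loads(p.read_text())
```

Because later launches read the saved file, edits you make there (port, API key, etc.) survive restarts.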

Admin Backend

The Admin Backend allows loading and unloading models, with login authentication. For added security, you can disable the backend entirely in config.yaml:

'IS_ADMIN_ENABLED': !!bool 'false'

When enabled, it will automatically generate a username and password pair in config.yaml.

API Key

You can enable this setting so that API calls require a key to connect.

'API_KEY_ENABLED': !!bool 'false'

When enabled, it will automatically generate a random key in config.yaml

Server Port

You can edit this setting to set the local server port for the API.

'PORT': !!int '8888'

APIs

  • Return the dictionary mapping of IDs to speakers

GET http://127.0.0.1:8888/voice/speakers

  • Return the audio data for the spoken prompt (default parameters are used for anything not specified)

GET http://127.0.0.1:8888/voice/vits?text=prompt
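As a quick illustration of calling the endpoint above from Python (a minimal sketch, assuming the default port 8888; the helper names are my own, not part of the project):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://127.0.0.1:8888"  # matches the default PORT in config.yaml

def vits_url(text: str, **params) -> str:
    # Build the /voice/vits URL; parameters left out fall back to config.yaml defaults.
    return f"{BASE}/voice/vits?{urlencode({'text': text, **params})}"

def synthesize(text: str, out_path: str = "out.wav", **params) -> None:
    # Fetch the synthesized audio and save it to disk (the service must be running).
    with urlopen(vits_url(text, **params)) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

For example, `vits_url("hello", id=0, format="wav")` produces `http://127.0.0.1:8888/voice/vits?text=hello&id=0&format=wav`.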

Parameter

VITS

| Parameter | Required | Default | Type | Instruction |
| --- | --- | --- | --- | --- |
| text | true | - | str | Text to speak |
| id | false | From config.yaml | int | Speaker ID |
| format | false | From config.yaml | str | wav / ogg / mp3 / flac |
| lang | false | From config.yaml | str | The language of the text to be synthesized |
| length | false | From config.yaml | float | The length of the synthesized speech; the larger the value, the slower the speed |
| noise | false | From config.yaml | float | The randomness of the synthesis |
| noisew | false | From config.yaml | float | The length of phoneme pronunciation |
| segment_size | false | From config.yaml | int | Divide the text into segments based on punctuation marks |
| streaming | false | false | bool | Stream synthesized speech with faster initial response |

Check the original repo for more info
