Text to Speech

📚 Quickstart

Data collection

  • HiFiTTS: a high-resolution, multi-speaker English dataset, used here as the baseline. It can be downloaded here.

Data preprocessing

  1. Generate phonetic alignment using GlowTTS:

    a) Download a GlowTTS model checkpoint.

    b) Update GLOW_TTS_CKPT_PATH in the compute_glowtts_alignments.py script.

    c) Prepare a GlowTTS filelist, or use this example for the HiFiTTS dataset (you need to download the dataset first).

    d) Prepare a GlowTTS config, changing:

    - `"training_files"` to your filelist,
    - `"cmudict_path"` to `<nansypp_path>/static/tts/cmu_dictionary`.
    

    e) Run the alignment script:

    python src/data/preprocessing/compute_glowtts_alignments.py <config_file> <input_dir> <output_dir>
  2. Decode audio using:

python src/data/preprocessing/decode.py -i <input_dir> -o <output_dir> -sr 44100
  3. Compute TTS targets using:
python -m src.data.preprocessing.precompute_tts_targets \
    <decoded_output_dir>/dataset.csv \
    <sample_rate> \
    <tts_targets_dir> \
    <backbone_exp_dir> \
    <backbone_ckpt_name>
  4. Train/validation split:
head -n 1001 <tts_targets_dir>/dataset.csv > <tts_targets_dir>/validation_dataset.csv
head -n 1 <tts_targets_dir>/dataset.csv > <tts_targets_dir>/train_dataset.csv
sed -n '1002,$p' <tts_targets_dir>/dataset.csv >> <tts_targets_dir>/train_dataset.csv
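
The split step can also be reproduced in Python, for instance on a system without `head` and `sed`. The following is a minimal sketch and not part of the repository: the function name `split_dataset` and its parameters are hypothetical. Like the shell commands, it assumes that the first line of `dataset.csv` is a header and that the first 1000 data lines go to the validation set.

```python
from pathlib import Path


def split_dataset(dataset_csv, validation_csv, train_csv, n_validation=1000):
    """Split a CSV line by line, mirroring the head/sed commands.

    The header line is copied to both outputs. The first n_validation data
    lines go to validation_csv and the remaining lines go to train_csv.
    Returns (number of validation lines, number of training lines).
    """
    with open(dataset_csv, encoding="utf-8") as f:
        # Assumption: the first line is the header, the rest are data lines.
        header, *rows = f.readlines()

    validation_rows = rows[:n_validation]
    train_rows = rows[n_validation:]

    Path(validation_csv).write_text(header + "".join(validation_rows), encoding="utf-8")
    Path(train_csv).write_text(header + "".join(train_rows), encoding="utf-8")
    return len(validation_rows), len(train_rows)
```

It would be called with the same three paths as the shell version, replacing `<tts_targets_dir>` with the actual directory.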

Training

  1. Edit TTS training config: specify <tts_targets_dir> and <alignment_dir>.

  2. Run the training script:

python src/train/tts.py --config-name=hifitts +trainer.devices=<list_of_gpu_ids>
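
The `--config-name=` and `+key=value` arguments follow Hydra's command-line override syntax, in which a list is written as `[0,1]`. As a hedged illustration, the sketch below assembles the training command for a given set of GPU ids. The helper `build_train_command` is hypothetical, and it is an assumption that `trainer.devices` accepts a Hydra list of integers in this repository's config.

```python
import shlex


def build_train_command(gpu_ids, config_name="hifitts"):
    """Return the training command as an argument list.

    Assumption: <list_of_gpu_ids> is passed as a Hydra list such as [0,1].
    """
    devices = "[" + ",".join(str(gpu_id) for gpu_id in gpu_ids) + "]"
    return [
        "python",
        "src/train/tts.py",
        f"--config-name={config_name}",
        f"+trainer.devices={devices}",
    ]


# Print a copy-pasteable command; shlex.join adds shell quoting where needed.
print(shlex.join(build_train_command([0, 1])))
```

Quoting the override matters in shells that treat square brackets as glob patterns, which is why the command is built as a list and only joined for display.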

Checkpoint

Run the download script below to fetch a checkpoint that we trained with this repository for 200k training steps. The script places the checkpoint in the expected directory, so that the inference script and the Streamlit app work without further setup.

python src/utilities/download_checkpoints.py

Inference

An inferencer class is provided in the source code and can be called from the command line as follows:

python src/inference/tts.py \
<experiment_directory> \
<checkpoint_filename> \
<audio_path> \
<text> \
<output_path> \
-d <device>

Example:

python src/inference/tts.py \
"static/runs/runs_tts/hifitts/2023-10-03_18-23-00" \
"steps=step=15000.ckpt" \
"static/samples/vctk/p238_001.wav" \
"To be or not to be that is the question" \
"static/tmp/to_be.wav"

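To synthesize several sentences with the same checkpoint, the command above can be wrapped in a short script. This is a sketch only: `build_inference_command` and `synthesize_all` are hypothetical helpers, not part of the repository, and the output naming scheme `sentence_<index>.wav` is an assumption. The positional arguments follow the order documented above.

```python
import subprocess
import sys


def build_inference_command(experiment_dir, checkpoint, audio_path, text,
                            output_path, device=None):
    """Return the argument list for src/inference/tts.py in the documented order."""
    cmd = [sys.executable, "src/inference/tts.py",
           experiment_dir, checkpoint, audio_path, text, output_path]
    if device is not None:
        # The optional -d flag is appended only when a device is given.
        cmd += ["-d", device]
    return cmd


def synthesize_all(experiment_dir, checkpoint, audio_path, sentences,
                   output_dir, device=None):
    """Run the inference script once per sentence (hypothetical batch helper)."""
    for index, text in enumerate(sentences):
        output_path = f"{output_dir}/sentence_{index}.wav"
        cmd = build_inference_command(experiment_dir, checkpoint, audio_path,
                                      text, output_path, device)
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues when the text contains spaces or punctuation. Each call starts a new process and reloads the model, so this is convenient rather than fast.
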
Streamlit app

streamlit run app/text_to_speech.py --server.port <port_number>

Logs

During training, you can visualize the logs with the following command:

tensorboard --logdir=static/runs/runs_tts --bind_all --port <port_number>

🔬 R&D

Observations and key R&D results are detailed here.

🎧 Results

Results from checkpoints trained with this repo are showcased on this Notion page.