wav2train

Automatic pipeline to prepare a directory full of (audio clip : transcript) file pairs for wav2letter training. Currently uses DSAlign for transcript alignment.

This project is part of Talon Research. If you find this useful, please donate.

Installation

This process works best on a Mac or Linux computer.

Debian

sudo apt install build-essential libboost-all-dev cmake zlib1g-dev libbz2-dev liblzma-dev \
                 python3 python3-pip python3-venv ffmpeg wget sox
./setup

macOS

brew install python3 ffmpeg wget cmake boost sox
./setup

Usage

./wav2train input/ output/
# ./wfilter output/clips.lst > output/clips-filt.lst # not yet implemented
./wsplit  output/clips.lst

Description

Consumes a directory with audio and matching transcripts, such as:
```
input/a.wav input/a.txt
input/b.wav input/b.txt
```
Most common audio formats (wav, flac, mp3, ogg, sph, etc.) are detected, and you can mix formats in the input directory. The audio files can be any length. The only requirement is that each text file is a transcription of the audio file with the same name.

Finds voice activity in the audio files and time-aligns these segments to the transcription.

Extracts the voice segments into .flac files and creates a wav2letter-compatible clips.lst file.
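
The input pairing described in the first step can be sketched as follows. This is a minimal illustration, not the tool's actual code: the `find_pairs` helper and the `AUDIO_EXTS` set are hypothetical, and it assumes a transcript shares its audio file's name with a `.txt` extension, as in the `a.wav` / `a.txt` example.

```python
from pathlib import Path

# Assumed extension set for illustration; the real tool may detect formats differently.
AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg", ".sph"}

def find_pairs(input_dir):
    """Return sorted (audio_path, transcript_path) pairs that share a file stem."""
    pairs = []
    for audio in sorted(Path(input_dir).iterdir()):
        if audio.suffix.lower() not in AUDIO_EXTS:
            continue  # not an audio file we recognize
        transcript = audio.with_suffix(".txt")
        if transcript.is_file():
            pairs.append((audio, transcript))
    return pairs
```

Audio files without a matching transcript are skipped silently in this sketch; a real pipeline might prefer to warn about them.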

The output at this point looks like:

output/clips/a.flac
output/clips/b.flac
output/clips.lst
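
For orientation, here is a rough sketch of how one clips.lst line might be built. It assumes the usual wav2letter list layout of sample id, audio path, duration in milliseconds, and transcript, separated by spaces. The `format_entry` helper, the id naming, and the two-decimal duration are hypothetical and may not match what wav2train actually writes.

```python
def format_entry(clip_id, audio_path, duration_ms, transcript):
    """Build one list-file line: id, path, duration, then the transcript words."""
    # Collapse runs of whitespace so the transcript stays on a single line.
    words = " ".join(transcript.split())
    return f"{clip_id} {audio_path} {duration_ms:.2f} {words}"

# Example with made-up values:
print(format_entry("a-0", "/data/output/clips/a.flac", 1520, "hello  world\n"))
```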

[Optional] Use the wfilter tool to filter out "bad inputs" using a pretrained model and an error threshold.
```
./wfilter --help
```
[Optional] Use the wsplit tool to auto-split a clips.lst file into dev.lst, test.lst, and train.lst.
```
./wsplit output/clips.lst
# or, if you filtered:
./wsplit output/filter.lst
```
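
The kind of split wsplit performs could look roughly like the sketch below. The `split_list` helper, the split fractions, and the fixed seed are all assumptions for illustration; the README does not state which proportions or shuffling strategy wsplit uses.

```python
import random

def split_list(lines, dev_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle lines deterministically and cut them into dev, test, and train."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # seeded so reruns give the same split
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "dev": shuffled[:n_dev],
        "test": shuffled[n_dev:n_dev + n_test],
        "train": shuffled[n_dev + n_test:],
    }
```

Seeding the shuffle keeps the three lists stable across runs, which matters if you want results on dev.lst and test.lst to stay comparable.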
[Optional] Use the wpiece tool to generate word piece tokens + lexicon. (The wlexicon tool can do the same thing for character lexicons.)
```
# generates example.lexicon, example.tokens
./wpiece example --list output/clips.lst
```

Extras

# Print the transcript for each clip and play it, for debugging
./wplay output/clips.lst

# Update the paths in output/*.lst to match its current directory
# As *.lst uses absolute paths, this is useful to run after moving
#    datasets around on your disk or to a new machine.
# Only works if clips are in the dirname(.lst)/clips/* directory
./wrebase output/
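
As a rough illustration of the path update described above, the sketch below rewrites the path field of one list line so it points into the clips/ directory next to the .lst file. The `rebase_line` helper is hypothetical, and it assumes each line starts with an id and a path separated by single spaces.

```python
from pathlib import Path

def rebase_line(line, lst_dir):
    """Point the path field of one list line at <lst_dir>/clips/<file name>."""
    # Split off only the first two fields; the rest (duration, transcript) is kept verbatim.
    clip_id, old_path, rest = line.rstrip("\n").split(" ", 2)
    new_path = Path(lst_dir).absolute() / "clips" / Path(old_path).name
    return f"{clip_id} {new_path} {rest}"
```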

# Print some basic stats about a dataset, such as number of clips and total hours.
./wstat output/clips.lst
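
The two stats mentioned above can be computed along these lines. This is only a sketch: `list_stats` is a hypothetical helper, and it assumes the third field of each line is the clip duration in milliseconds, as in the usual wav2letter list layout.

```python
def list_stats(lines):
    """Count clips and sum durations, assuming field 3 is milliseconds."""
    count = 0
    total_ms = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        count += 1
        total_ms += float(fields[2])
    return count, total_ms / 3_600_000  # 3,600,000 ms per hour
```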

# Generate word piece vocab and lexicon from one or more lst files.
./wpiece name --list output/clips.lst

# Generate character lexicon from one or more lst files.
./wlexicon name output/clips.lst
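
A character lexicon maps each word to its spelling as separate letters. The sketch below shows the idea under two assumptions that are not confirmed by this README: that transcript words start at the fourth field of each list line, and that entries end with a `|` word-boundary token, as in common wav2letter recipes. The `char_lexicon` helper is hypothetical.

```python
def char_lexicon(lines):
    """Map each distinct transcript word to its space-separated characters."""
    words = set()
    for line in lines:
        words.update(line.split()[3:])  # fields after id, path, duration
    # One entry per word: the word, a tab, its letters, then the assumed "|" marker.
    return [f"{word}\t{' '.join(word)} |" for word in sorted(words)]
```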

# Filter a list dataset by many criteria
./wfilter --help
