YTTTS

The YouTube Text-To-Speech (YTTTS) dataset consists of waveform audio extracted from YouTube videos, paired with the videos' English transcriptions.

videos.txt is a text file containing YouTube video IDs concatenated with no separators between them. YouTube video URLs take the form https://www.youtube.com/watch?v=<video-id>, for example:

https://www.youtube.com/watch?v=BRRolKTlF6Q
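The mapping between an ID and its watch URL can be sketched with the standard library's urllib.parse. This is a minimal illustration, not code from the repository: the helper names id_to_url and url_to_id are hypothetical, and url_to_id only handles the watch?v= form shown above (not short youtu.be links or other URL variants).

```python
from urllib.parse import urlparse, parse_qs

# Prefix of the watch-URL format described above.
WATCH_PREFIX = "https://www.youtube.com/watch?v="

def id_to_url(video_id: str) -> str:
    """Build a watch URL from an 11-character video ID (hypothetical helper)."""
    return WATCH_PREFIX + video_id

def url_to_id(url: str) -> str:
    """Return the `v` query parameter of a watch URL (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)  # e.g. {"v": ["BRRolKTlF6Q"]}
    return query["v"][0]

print(id_to_url("BRRolKTlF6Q"))
# https://www.youtube.com/watch?v=BRRolKTlF6Q
print(url_to_id("https://www.youtube.com/watch?v=BRRolKTlF6Q"))
# BRRolKTlF6Q
```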

A YouTube video ID is always 11 characters long, so to read the video IDs from the provided example file, simply read its contents in 11-byte chunks:

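with open('videos.txt', 'r') as f:
  while True:
    ID = f.read(11)  # each video ID is exactly 11 characters
    if len(ID) < 11:  # end of file (or a trailing newline): stop reading
      break
    print(ID)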
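A slightly more defensive version of the same chunked read is sketched below. It assumes that IDs use only the URL-safe base64 alphabet (A-Z, a-z, 0-9, '-' and '_'); that matches observed YouTube IDs but is an assumption here, not something the README states. The function names split_ids and read_ids are hypothetical.

```python
import re

ID_LENGTH = 11
# Assumption: IDs use the URL-safe base64 alphabet; not an official guarantee.
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")

def split_ids(data: str) -> list[str]:
    """Split a string of concatenated video IDs into 11-character chunks."""
    data = "".join(data.split())  # drop newlines or stray whitespace
    if len(data) % ID_LENGTH != 0:
        raise ValueError(f"length {len(data)} is not a multiple of {ID_LENGTH}")
    ids = [data[i:i + ID_LENGTH] for i in range(0, len(data), ID_LENGTH)]
    bad = [vid for vid in ids if not ID_PATTERN.fullmatch(vid)]
    if bad:
        raise ValueError(f"invalid video IDs: {bad}")
    return ids

def read_ids(path: str) -> list[str]:
    """Read and validate every ID in a file such as videos.txt."""
    with open(path, "r", encoding="ascii") as f:
        return split_ids(f.read())
```

Calling read_ids('videos.txt') returns the IDs as a list. Because the file has no separators, the number of IDs is simply the file length divided by 11.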

scrape.py scrapes YouTube video IDs and continuously appends them to the file videos.txt.
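The README does not describe how scrape.py finds IDs, so the sketch below only covers the append step: it writes new IDs to the end of videos.txt in the same separator-free format. Skipping IDs already in the file is an added assumption (the real script may not deduplicate), and the helpers ids_to_append and append_new_ids are hypothetical.

```python
import os

ID_LENGTH = 11

def ids_to_append(existing: str, candidates: list[str]) -> str:
    """Return the concatenated candidate IDs not already present in `existing`."""
    seen = {existing[i:i + ID_LENGTH] for i in range(0, len(existing), ID_LENGTH)}
    fresh = []
    for vid in candidates:
        if len(vid) != ID_LENGTH:
            raise ValueError(f"not an 11-character ID: {vid!r}")
        if vid not in seen:  # also drops duplicates within `candidates`
            seen.add(vid)
            fresh.append(vid)
    return "".join(fresh)

def append_new_ids(path: str, candidates: list[str]) -> int:
    """Append unseen IDs to `path` with no separators; return how many were written."""
    existing = ""
    if os.path.exists(path):
        with open(path, "r", encoding="ascii") as f:
            existing = f.read()
    chunk = ids_to_append(existing, candidates)
    with open(path, "a", encoding="ascii") as f:
        f.write(chunk)
    return len(chunk) // ID_LENGTH
```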
Once you are satisfied with the number of IDs scraped (or if you simply use the provided list of video IDs), run main.py to iterate through the videos and download both the audio and the captions of each one. It then parses each video's subtitles from a .srt file, extracts the audio clip corresponding to each subtitle, and organizes the results into a tree of subdirectories within each video's data folder. Each subdirectory contains a text file with the phrase uttered in the short audio clip (subtitles.txt) and the corresponding audio as a waveform file (audio.wav).

You can also try it out with the included file LastWeekTonight.txt, which contains the concatenated IDs of every video posted on the YouTube channel of John Oliver's Last Week Tonight as of March 22, 2021.

Some Demos via Google Drive

Uses

Voice Cloning
TTS Engines
Speaker Embedding
Speaker Recognition
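For uses such as TTS training or voice cloning, the clip folders can be walked to collect (transcript, audio path) pairs. The sketch below assumes only what the README states, namely that each clip folder holds subtitles.txt next to audio.wav; the root directory you pass in and the function name iter_pairs are hypothetical, since the README does not specify where each video's data folder is stored.

```python
import os

def iter_pairs(root: str):
    """Yield (transcript, wav_path) for every folder under `root` holding both files."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a deterministic order
        if "subtitles.txt" in filenames and "audio.wav" in filenames:
            with open(os.path.join(dirpath, "subtitles.txt"), encoding="utf-8") as f:
                text = f.read().strip()
            yield text, os.path.join(dirpath, "audio.wav")
```

Yielding pairs lazily avoids loading a large dataset into memory at once; folders missing either file are simply skipped.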

ryanrudes/YTTTS
