This repository releases the speech embeddings learned by Speech2Vec, proposed by Chung and Glass (2018). Feel free to contact me with any questions.
Speech2Vec is a recently proposed deep neural network architecture that represents variable-length speech segments as real-valued, fixed-dimensional speech embeddings capturing the semantics of the segments. It can be viewed as a speech version of Word2Vec! Training Speech2Vec borrows the skip-gram and CBOW methodology from Word2Vec and is thus unsupervised, i.e., we do not need to know the word identity of a speech segment. Please refer to the original paper for more details.
In this repository, we release speech embeddings of different dimensionalities learned by Speech2Vec using skip-gram as the training methodology. The model is trained on a corpus of about 500 hours of speech from LibriSpeech (the clean-360 + clean-100 subsets). We also include the word embeddings learned by skip-gram Word2Vec trained on the transcripts of the same speech corpus.
Dim | Speech2Vec | Word2Vec |
---|---|---|
50 | file | file |
100 | file | file |
200 | file | file |
300 | file | file |
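
The files listed above can be read with a few lines of Python. The sketch below assumes they are plain text in the word2vec text format (an optional `count dim` header line, then one token followed by its vector components on each line). That format, the function name `load_text_embeddings`, and the sample data are assumptions for illustration and are not stated in this README.

```python
import io


def load_text_embeddings(lines):
    """Parse word2vec-style text into a {token: [float, ...]} dict.

    Assumed layout (not confirmed by this README): an optional header
    line "count dim", then "token v1 v2 ... vN" on every following line.
    """
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.split()
        # Skip the optional "count dim" header on the first line.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        # Skip blank or malformed lines.
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory sample standing in for a downloaded file.
sample = "2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n"
embeddings = load_text_embeddings(io.StringIO(sample))
print(len(embeddings), len(embeddings["cat"]))
```

To read a downloaded file instead, pass an open file handle, for example `open(path, encoding="utf-8")`, where `path` is wherever you saved the file.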
The following figure shows the relationship between the dimensionality of the speech/word embeddings and their performance (higher is better) on a word similarity benchmark (MTurk-771), computed using this toolkit. Again, please refer to the original paper for task descriptions.
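
Word similarity benchmarks of this kind are commonly scored by taking the cosine similarity of each word pair's embeddings and computing the Spearman rank correlation against the human ratings. The sketch below illustrates that common procedure on toy data. Whether the toolkit mentioned above computes exactly this is an assumption, and the function names, vectors, and ratings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embeddings, pairs):
    """Score embeddings on (word1, word2, human_rating) triples.

    Pairs with an out-of-vocabulary word are skipped (an assumed policy).
    """
    model, human = [], []
    for w1, w2, rating in pairs:
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            human.append(rating)
    return spearman(model, human)


# Toy vectors and ratings, made up for illustration only.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.2],
    "car": [0.0, 1.0],
    "truck": [0.1, 0.9],
}
toy_pairs = [
    ("cat", "dog", 4.0),
    ("car", "truck", 4.5),
    ("cat", "car", 1.0),
    ("dog", "truck", 2.0),
    ("cat", "boat", 1.5),  # "boat" is out of vocabulary and is skipped
]
rho = evaluate(toy_embeddings, toy_pairs)
print(round(rho, 3))
```

Rank correlation is used rather than a direct comparison of values because human ratings and cosine similarities live on different scales; only the ordering of the pairs is compared.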
If you use the embeddings in your work, please consider citing:
@inproceedings{chung2018speech2vec,
title = {Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech},
author = {Chung, Yu-An and Glass, James},
booktitle = {INTERSPEECH},
year = {2018}
}