
v0.4.1

@benlower released this 12 Nov 22:48
812f58c

We're releasing Ultravox 0.4.1 today. The weights have been pushed to Hugging Face (along with updated datasets for training). If you're using the Ultravox Realtime APIs, v0.4.1 is the new default.

We'd love to hear feedback on your experience with Ultravox, along with feature suggestions.

What's New

v0.4.1 improves upon 0.4 in the following ways:

  • We've upgraded the Whisper encoder from Whisper-medium to Whisper-large-v3-turbo. This has led to quality improvements (see the table below).
  • We're adding six new languages: Chinese, Dutch, Hindi, Swedish, Turkish, and Ukrainian. That brings the total supported languages to 15 (see table below).
  • We've increased the amount of English training data.

15 Languages Supported

Language      ISO Code
Arabic        ar
Chinese       zh
Dutch         nl
English       en
French        fr
German        de
Hindi         hi
Italian       it
Japanese      ja
Portuguese    pt
Russian       ru
Spanish       es
Swedish       sv
Turkish       tr
Ukrainian     uk
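
For scripts that need to check a language code against this list, the table can be mirrored as a small lookup. This is only a sketch: the names SUPPORTED_LANGUAGES and is_supported are hypothetical and are not part of any Ultravox API.

```python
# Language codes copied from the "15 Languages Supported" table.
# The dictionary and helper names are illustrative, not an Ultravox API.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "zh": "Chinese",
    "nl": "Dutch",
    "en": "English",
    "fr": "French",
    "de": "German",
    "hi": "Hindi",
    "it": "Italian",
    "ja": "Japanese",
    "pt": "Portuguese",
    "ru": "Russian",
    "es": "Spanish",
    "sv": "Swedish",
    "tr": "Turkish",
    "uk": "Ukrainian",
}


def is_supported(code):
    """Return True if the ISO code appears in the supported-language table."""
    return code.strip().lower() in SUPPORTED_LANGUAGES


if __name__ == "__main__":
    for code in ("zh", "ca"):
        status = "supported" if is_supported(code) else "not in the table"
        print(f"{code}: {status}")
```

A code missing from the table does not mean the model produces nothing for that language; it only means the language was not part of the listed training set.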

Evals

Our primary method of evaluation is speech translation, measured by BLEU (higher is better), which we use as a proxy for general instruction-following capability. The en_ca rows show model performance for a language (Catalan, ca) that was not included in training.
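
To make the metric concrete, here is a simplified corpus-level BLEU in pure Python. It is a sketch under stated assumptions, not the evaluation code behind the numbers in this release: it uses whitespace tokenization, a single reference per hypothesis, and no smoothing, whereas standard tooling typically applies its own tokenization rules. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale.

    Assumes one reference per hypothesis and whitespace tokenization.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection keeps the minimum count per n-gram,
            # which is the "clipped" match count BLEU requires.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than their references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Because the score is the geometric mean of 1- to 4-gram precisions, short or loosely matching translations drop off quickly, which is why scores in the tens and thirties are common for speech translation.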

Ultravox 70B

Language pair    Ultravox 0.4 70B    Ultravox 0.4.1 70B
en_ar            14.97               19.64
en_de            30.30               32.47
es_en            39.55               40.76
ru_en            44.16               45.07
en_ca            35.02               37.58
zh_en            12.16               17.98

Ultravox 8B

Language pair    Ultravox 0.4 8B    Ultravox 0.4.1 8B
en_ar            11.17              12.28
en_de            25.47              27.13
es_en            37.11              39.16
ru_en            38.96              39.65
en_ca            27.46              29.94
zh_en            10.08              14.55
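
The per-pair improvements are easier to compare as deltas. The short script below computes absolute and relative BLEU gains from the two tables; the scores are copied from this release, while the RESULTS layout and the bleu_gains helper are illustrative names.

```python
# BLEU scores copied from the eval tables: (Ultravox 0.4, Ultravox 0.4.1).
RESULTS = {
    "70B": {
        "en_ar": (14.97, 19.64),
        "en_de": (30.30, 32.47),
        "es_en": (39.55, 40.76),
        "ru_en": (44.16, 45.07),
        "en_ca": (35.02, 37.58),
        "zh_en": (12.16, 17.98),
    },
    "8B": {
        "en_ar": (11.17, 12.28),
        "en_de": (25.47, 27.13),
        "es_en": (37.11, 39.16),
        "ru_en": (38.96, 39.65),
        "en_ca": (27.46, 29.94),
        "zh_en": (10.08, 14.55),
    },
}


def bleu_gains(results):
    """Map (model size, pair) to (absolute gain, relative gain in percent)."""
    gains = {}
    for size, pairs in results.items():
        for pair, (old, new) in pairs.items():
            delta = new - old
            gains[(size, pair)] = (round(delta, 2), round(100 * delta / old, 1))
    return gains


if __name__ == "__main__":
    for (size, pair), (delta, pct) in bleu_gains(RESULTS).items():
        print(f"{size} {pair}: {delta:+.2f} BLEU ({pct:+.1f}%)")
```

Every pair improves for both model sizes, and zh_en shows the largest absolute gain in each table.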

Training

This version of Ultravox continues to use a frozen, pre-trained Llama 3.1 core (for both 8B and 70B), but we've significantly increased the amount of training data and the overall training time. The speech adapter was trained on more than 10,000 hours of multilingual speech data. On 8xH100s, training takes about 24 hours for the 8B model and 3 days for the 70B model.
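
For budgeting purposes, the wall-clock figures above can be converted into approximate GPU-hours. This is back-of-the-envelope arithmetic that assumes all eight GPUs are busy for the full duration; the variable and function names are hypothetical.

```python
GPUS = 8  # "8xH100s" from the release notes

# Approximate wall-clock training time in hours, per model size.
WALL_CLOCK_HOURS = {"8B": 24, "70B": 3 * 24}


def gpu_hours(wall_clock_hours, gpus=GPUS):
    """Rough GPU-hours, assuming every GPU is used for the whole run."""
    return wall_clock_hours * gpus


budget = {size: gpu_hours(hours) for size, hours in WALL_CLOCK_HOURS.items()}

if __name__ == "__main__":
    for size, hours in budget.items():
        print(f"{size}: about {hours} H100 GPU-hours")
```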

Full Changelog: v0.4...v0.4.1