GGML is an open-source C library that enables large-model inference on a variety of hardware platforms. We implemented unity.cpp in ggml. It currently supports the SeamlessM4T model for X2T tasks: speech-to-text translation (S2TT), automatic speech recognition (ASR), and text-to-text translation (T2TT).
The project is under active development. Contributions are welcome!
To build the interactive console for S2TT, ASR, and T2TT:
cd seamless_communication/ggml
mkdir build; cd build
cmake -DGGML_OPENBLAS=ON \
-DBUILD_SHARED_LIBS=On \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_FLAGS="-g2 -fno-omit-frame-pointer" \
..
make -j4 unity # Interactive Console
For more build commands, see the Makefile.
To launch an interactive console for S2TT and ASR, run the command below. The model already includes the vocabulary needed to detokenize.
OPENBLAS_NUM_THREADS=8 ./bin/unity --model seamlessM4T_medium.ggml
In the console, enter "wav_file tgt_lang": the path to a local waveform file and the target language, separated by a space. The first run includes some warm-up time, so it may be slow.
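To make the expected entry format concrete, here is a minimal Python sketch that splits a console entry into its two fields. The helper `parse_console_line` is hypothetical and exists only for illustration; it is not part of unity.cpp and does not claim to mirror how the console itself parses input.

```python
def parse_console_line(line: str) -> tuple[str, str]:
    """Split an entry of the form "wav_file tgt_lang" into its two fields.

    Hypothetical helper for illustration only. It treats the last
    whitespace-separated token as the target language and everything
    before it as the waveform path.
    """
    parts = line.strip().rsplit(maxsplit=1)
    if len(parts) != 2:
        raise ValueError('expected "wav_file tgt_lang", got: %r' % line)
    wav_file, tgt_lang = parts
    return wav_file, tgt_lang


if __name__ == "__main__":
    # Example entry: a local waveform path followed by a target language.
    print(parse_console_line("samples/hello.wav fra"))
```

Validating entries this way before feeding them to the console can catch a missing target language early.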
Launching command for T2TT:
OPENBLAS_NUM_THREADS=8 ./bin/unity --model nllb-200_dense_1b.ggml --text
In the console, enter "input_text tgt_lang": the input text and the target language, separated by a space. Note that the language code must be an NLLB BCP-47 code, NOT the 3-letter language code used for the S2TT task with Seamless. Unifying the two is on the to-do list.
Converted ggml models can be downloaded from:
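Because the two consoles expect different code styles, a quick check before submitting an entry can help. The sketch below assumes that NLLB codes take the form of a 3-letter language code plus a 4-letter script tag (for example `eng_Latn`), as listed in the NLLB documentation, and that the S2TT console uses bare 3-letter codes (for example `eng`). The function names are hypothetical and the patterns check shape only, not whether a code is actually supported by a model.

```python
import re

# Assumed shapes, for illustration only:
#   3-letter code used with Seamless S2TT/ASR, e.g. "eng"
#   NLLB-style code used with --text,          e.g. "eng_Latn"
THREE_LETTER = re.compile(r"^[a-z]{3}$")
NLLB_STYLE = re.compile(r"^[a-z]{3}_[A-Z][a-z]{3}$")


def code_style(code: str) -> str:
    """Classify a language code by its shape."""
    if THREE_LETTER.match(code):
        return "three_letter"
    if NLLB_STYLE.match(code):
        return "nllb"
    return "unknown"


def text_entry(input_text: str, tgt_lang: str) -> str:
    """Build an "input_text tgt_lang" entry, rejecting non-NLLB-style codes."""
    if code_style(tgt_lang) != "nllb":
        raise ValueError("expected an NLLB-style code, got: %r" % tgt_lang)
    return "%s %s" % (input_text, tgt_lang)


if __name__ == "__main__":
    print(text_entry("Hello, how are you?", "fra_Latn"))
```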
SeamlessM4T_large | SeamlessM4T_medium | NLLB_dense_1b | NLLB_distill_600m |
---|---|---|---|
model | model | model | model |
For more details on the NLLB models, see https://github.com/facebookresearch/fairseq/tree/nllb.
Models from fairseq2 checkpoints can be converted to ggml automatically with ggml_convert.py.
python ggml_convert.py -m MODEL_NAME
where MODEL_NAME corresponds to an asset card in fairseq2 / seamless_communication, e.g. seamlessM4T_medium or seamlessM4T_large.
We also use the ggml Python bindings for a better development experience. For examples of running unity.cpp from Python, refer to the tests in test_unity_cpp.py.
We strongly suggest building with OpenBLAS, as we have seen an 8x speedup on our test machine.
This is needed only for the console to load waveform files, not for the library itself.
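When several models need converting, the command above can be scripted. The following is a minimal sketch that only relies on the `-m MODEL_NAME` option shown above; the helper names `convert_command` and `convert_all` are hypothetical, and the script path is assumed to be relative to the current working directory.

```python
import subprocess
import sys


def convert_command(model_name: str, script: str = "ggml_convert.py") -> list[str]:
    """Build the argument list for converting one model.

    Uses the current Python interpreter and the `-m MODEL_NAME` option
    documented above; nothing is executed here.
    """
    return [sys.executable, script, "-m", model_name]


def convert_all(model_names: list[str]) -> None:
    """Run the conversion for each model in turn, stopping on the first failure."""
    for name in model_names:
        subprocess.run(convert_command(name), check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be reviewed first.
    for name in ["seamlessM4T_medium", "seamlessM4T_large"]:
        print(" ".join(convert_command(name)))
```

Calling `convert_all(["seamlessM4T_medium", "seamlessM4T_large"])` from the directory containing ggml_convert.py would run both conversions in sequence.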