
Gradio demo for real-time conversations with WebRTC #150

Merged · 9 commits · Dec 12, 2024

Conversation

freddyaboulton
Contributor

This PR adds a Gradio demo for real-time conversations with the latest Ultravox model. The demo leverages the WebRTC custom component for low-latency audio streaming, both locally and on remote servers such as EC2 and Hugging Face Spaces.

You can see the demo running here

ultravox-demo.mp4

Key Features:

  • WebRTC for low-latency streaming no matter where the demo runs
  • Automatic voice detection: once a pause is detected, the audio is passed to a custom Python function to generate a response
  • Intuitive UI: the entire conversation is displayed in a chatbot UI
  • Inference example: shows how to run inference for the model (Add basic Jupyter notebook for inference #11)
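The "reply on pause" behavior above can be sketched in a few lines. This is a hypothetical illustration of the idea, not the demo's actual code: buffer audio frames, track RMS energy, and declare a pause once enough consecutive quiet frames arrive. The frame size, threshold, and function names are all illustrative assumptions.

```python
# Illustrative pause detection via RMS energy; thresholds are made up.
import math

def rms(frame):
    """Root-mean-square energy of one audio frame (a list of float samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_pause(frames, energy_threshold=0.01, min_silent_frames=3):
    """Return the index of the frame where a pause (a run of quiet frames)
    completes, or None if the user is still speaking."""
    silent = 0
    for i, frame in enumerate(frames):
        silent = silent + 1 if rms(frame) < energy_threshold else 0
        if silent >= min_silent_frames:
            return i
    return None
```

In a real streaming demo the callback that generates a response would fire as soon as `detect_pause` returns an index; production VADs (e.g. model-based ones) are more robust than a plain energy threshold.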

Improvements (need help from community/model authors!):

  • It's unclear (to me at least) how to stream outputs from the transformers pipeline. Streaming the output makes the demo feel faster by reducing the time until text first appears. Changing the Gradio demo to stream the text is trivial once I know how to stream it from the pipeline.
  • It's unclear how to properly handle multi-turn audio conversations with this model, so I'm using Whisper to transcribe the user input and store it in the chat history for the next turn. This adds latency, so I'd like to get rid of it if possible.
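To make the streaming point concrete, here is a minimal stand-in sketch of the pattern a Gradio generator function would use: yield the accumulated partial text after each token instead of waiting for the full generation. The token source here is a fake generator, not the actual transformers pipeline API.

```python
# Stand-in for a model's token stream; a real demo would consume tokens
# as the model generates them.
def fake_token_stream(text):
    for token in text.split():
        yield token

def stream_response(token_iter):
    """Accumulate tokens and yield the partial text after each one,
    which is what a Gradio generator function would return per UI update."""
    partial = ""
    for token in token_iter:
        partial = (partial + " " + token).strip()
        yield partial

updates = list(stream_response(fake_token_stream("hello there world")))
# each element is one UI refresh, so text appears as soon as tokens arrive
```

The total generation time is unchanged; what improves is time-to-first-text, which is what makes a voice demo feel responsive.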

@eschmidbauer

@freddyaboulton just curious: why are you using Whisper to transcribe?
Ultravox is capable of taking audio as input; see ultravox/tools/gradio_demo.py for an example.

@freddyaboulton
Contributor Author

Hi @eschmidbauer! The audio is passed directly to Ultravox in my demo, but I used Whisper to pass the previous audio prompts as text in the turns parameter. I was going by this code, and it seems the turns can only be text. In the demo you linked, it seems that only the current audio message is taken into account and previous audio messages are not used to generate a response. Is that correct? Please correct me if I'm wrong; I'll be happy to modify the demo to follow best practices!
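The workaround described here can be sketched as follows. This is a hedged illustration of the pattern, assuming a text-only turns format; `transcribe` is a placeholder for an ASR call (e.g. Whisper), not a real API.

```python
# Sketch of keeping chat history as text when only the turns parameter
# accepts text. `transcribe` is a hypothetical stand-in for an ASR model.
def transcribe(audio):
    # Placeholder: a real implementation would run speech-to-text here.
    return audio["text"]

def build_turns(history, new_user_audio, assistant_reply=None):
    """Return a new text-only turns list with the transcribed user turn
    (and optional assistant turn) appended; the input history is not mutated."""
    turns = list(history)
    turns.append({"role": "user", "content": transcribe(new_user_audio)})
    if assistant_reply is not None:
        turns.append({"role": "assistant", "content": assistant_reply})
    return turns
```

The transcription step is exactly the extra latency discussed above; if the model accepted past audio turns directly, `transcribe` could be dropped.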

@freddyaboulton
Contributor Author

BTW the issue with reload mode loading models twice has since been fixed!

@eschmidbauer

eschmidbauer commented Nov 22, 2024

@freddyaboulton
Do you still need to transcribe and add it to the turn when conversation_mode=True?

https://github.com/fixie-ai/ultravox/blob/main/ultravox/tools/gradio_helper.py#L15C13-L15C36

@farzadab
Contributor

farzadab commented Nov 26, 2024

Thanks @freddyaboulton!

Two quick replies to your questions:

It's unclear (to me at least) how to stream outputs from the transformers pipeline

AFAIU the pipeline abstraction is not designed for streaming use cases. I might be wrong, though; I'll take a look. In the meantime, you can look at infer_tool, which supports both batched and streaming modes: https://github.com/fixie-ai/ultravox/blob/main/ultravox/tools/infer_tool.py#L83

We've had other issues with pipeline as well, for example with batched processing, so the current pipeline implementation is lacking in many ways.
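The batched-vs-streaming distinction mentioned here can be sketched as a pair of call styles on one inference object: a blocking call that returns the final string, and a generator that yields tokens as they are produced. The class and method names below are illustrative and do not mirror infer_tool's actual signatures.

```python
# Toy illustration of dual-mode inference; not the repo's real interface.
class TinyInference:
    def __init__(self, reply_tokens):
        self.reply_tokens = reply_tokens

    def infer(self, prompt):
        """Batched style: block until generation finishes, return full text."""
        return " ".join(self.reply_tokens)

    def infer_stream(self, prompt):
        """Streaming style: yield each token as soon as it is 'generated'."""
        for tok in self.reply_tokens:
            yield tok
```

A UI built on the streaming call can render partial text immediately, while a batched caller (e.g. an eval script) just joins the same tokens at the end.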

It's unclear how to properly handle multi-turn audio conversations with this model.

Yes, the current pipeline implementation doesn't support this, and it's long overdue. I'll spend some time on this as soon as I can.

@zqhuang211
Contributor

@freddyaboulton Thanks for submitting this PR—it looks great! Regarding your two questions about streaming output and multi-turn conversation, both are supported in the Gradio demo implemented in ultravox/tools/gradio_demo.py (which requires start/stop recording for each user audio input). I was wondering if it might be possible to adapt some of the ideas from that demo into your Gradio demo?

@freddyaboulton
Contributor Author

Hi @zqhuang211 ! Yes that is a great plan - will update my demo this week :)

@freddyaboulton
Contributor Author

freddyaboulton commented Dec 7, 2024

Hi @zqhuang211 , @farzadab, @eschmidbauer - I have updated the demo to use the infer_tool. You can run it with poetry run python ultravox/tools/gradio_demo.py --voice_mode=True

@freddyaboulton
Contributor Author

2024-12-06.16-15-56.mp4

@zqhuang211
Contributor

@zqhuang211 left a comment:


This looks good. Thank you!

I will make some minor changes from my end.

@zqhuang211
Contributor

@freddyaboulton there are some minor formatting issues. Can you run "just check" and "just format" to make sure it passes the tests? I'll merge it afterwards. Thanks!

@freddyaboulton
Contributor Author

Should be fixed, @zqhuang211. Thanks!

@farzadab farzadab merged commit 87acdb8 into fixie-ai:main Dec 12, 2024
1 check passed