Websocket silero VAD works for (opus, pcm8, pcm16) ($500) #518

josancamon19 · 2024-08-04T06:23:39Z

Is your feature request related to a problem? Please describe.
The VAD needs to decide more reliably when to send bytes and when not to.

File: transcribe.py, /listen endpoint.

while True:
    data = await websocket.receive_bytes()
    # print(len(data))
    # audio_buffer.extend(data)
    # print(len(audio_buffer), window_size_samples * 2) # * 2 because 16bit
    # TODO: VAD is not working properly.
    # - PCM: samples still have to be collected into a window, but they are sent to the socket while being collected, so the VAD has no effect
    # - Opus: the VAD always says there's no speech (collection doesn't matter much here, as it triggers about once per 0.2 seconds)

    # len(data) = 160, 8khz 16bit -> 2 bytes per sample, 80 samples, needs 256 samples, which is 256*2 bytes
    # if len(audio_buffer) >= window_size_samples * 2:
    #     # TODO: vad doesn't work index.html
    #     if is_speech_present(audio_buffer[:window_size_samples * 2], vad_iterator, window_size_samples):
    #         print('*')
    #         # pass
    #     else:
    #         print('-')
    #         audio_buffer = audio_buffer[window_size_samples * 2:]
    #         continue
    #
    #     audio_buffer = audio_buffer[window_size_samples * 2:]

    elapsed_seconds = time.time() - timer_start
    if elapsed_seconds > 20 or not socket2:
        socket1.send(data)
        # print('Sending to socket 1')
        if socket2:
            print('Killing transcript_socket2')
            socket2.finish()
            socket2 = None
    else:
        # print('Sending to socket 2')
        socket2.send(data)
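
A minimal sketch of the windowing described in the commented-out block above, assuming 16-bit PCM input. `VadWindowBuffer` and the `is_speech` callback are hypothetical names, not part of the existing code; the callback stands in for whatever VAD call is used (for example the `is_speech_present` helper). The point is that only windows the VAD accepts are returned, so bytes are not forwarded while a window is still being collected.

```python
from typing import Callable, List

BYTES_PER_SAMPLE = 2  # 16-bit PCM


class VadWindowBuffer:
    """Accumulates raw PCM16 bytes and releases them in fixed-size windows."""

    def __init__(self, window_size_samples: int, is_speech: Callable[[bytes], bool]):
        self.window_bytes = window_size_samples * BYTES_PER_SAMPLE
        self.is_speech = is_speech  # hypothetical stand-in for the real VAD call
        self.buffer = bytearray()

    def push(self, data: bytes) -> List[bytes]:
        """Add one websocket frame; return the complete windows that contain speech."""
        self.buffer.extend(data)
        speech_windows = []
        # Consume every full window currently available; keep the remainder buffered.
        while len(self.buffer) >= self.window_bytes:
            window = bytes(self.buffer[:self.window_bytes])
            del self.buffer[:self.window_bytes]
            if self.is_speech(window):
                speech_windows.append(window)
        return speech_windows
```

With 160-byte frames and a 256-sample window, the first window is released on the fourth frame (640 bytes collected, 512 consumed, 128 carried over). Inside the receive loop this could be used as `for window in gate.push(data): socket1.send(window)`, where `gate` is a `VadWindowBuffer` instance.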

Describe the solution you'd like
Opus: 16 kHz, 16-bit.
pcm8: 8 kHz, for old firmware versions.
pcm16: for recording from the device.

This also needs to work with multiple languages.


0xzre · 2024-08-08T07:50:25Z

Hello, I'll gladly take this issue. My plan is:

Integrate VAD: I will incorporate the 'silero-vad' library, which is well suited to the Friend device, for better voice activity detection.

Adjust Audio Buffer Handling: I'll refine the handling of audio data, managing the buffer size and ensuring that it correctly handles different audio formats, such as Opus and PCM.

Sample Rate and Codec Handling: I'll adjust the VAD parameters and buffer calculations based on the specified sample rate and codec.
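
One way to keep the per-codec parameters in one place is a small lookup table. This is only a sketch: `AudioFormat`, `FORMATS`, and `format_for` are hypothetical names, and the 256-sample window at 8 kHz comes from the issue description. The 16 kHz rate for pcm16, the 512-sample window at 16 kHz, and the idea that Opus frames must be decoded to PCM before they reach the VAD are assumptions, not confirmed by the existing code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    sample_rate: int          # Hz
    window_size_samples: int  # samples per VAD window
    needs_decoding: bool      # True if frames are compressed and must be decoded to PCM first

    @property
    def window_bytes(self) -> int:
        # 16-bit samples: 2 bytes each
        return self.window_size_samples * 2

    @property
    def window_ms(self) -> float:
        return 1000 * self.window_size_samples / self.sample_rate


# Assumed values; adjust to whatever the VAD model actually requires.
FORMATS = {
    "pcm8": AudioFormat(sample_rate=8000, window_size_samples=256, needs_decoding=False),
    "pcm16": AudioFormat(sample_rate=16000, window_size_samples=512, needs_decoding=False),
    "opus": AudioFormat(sample_rate=16000, window_size_samples=512, needs_decoding=True),
}


def format_for(codec: str) -> AudioFormat:
    """Look up the buffering parameters for a codec name, rejecting unknown codecs."""
    try:
        return FORMATS[codec]
    except KeyError:
        raise ValueError(f"unsupported codec: {codec!r}") from None
```

Under these assumed values every codec ends up with the same 32 ms window duration, so only the byte count per window changes between 8 kHz and 16 kHz input.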

Looking forward to your reply, fren :)

josancamon19 · 2024-08-08T08:11:28Z

Awesome! Assigning to @0xzre for the next 2 days.

Some context on what is already in place:
https://github.com/BasedHardware/Friend/blob/main/backend/utils/stt/vad.py
https://github.com/BasedHardware/Friend/blob/272b663b0d86832e56a0ccea3656b7f372e8361a/backend/routers/transcribe.py#L66

josancamon19 · 2024-08-09T22:44:12Z

Hi @0xzre, can you submit a draft PR and show progress?

mdmohsin7 · 2024-08-19T08:46:20Z

Fixed for pcm8 and pcm16. Opus is still pending.

beastoin · 2024-09-24T22:17:08Z

$500 🤑 should i...

josancamon19 · 2024-09-27T03:12:51Z

Ended up implementing a shitty** VAD
https://github.com/wiseman/py-webrtcvad/blob/master/example.py

Still does the 80/20.
Tried implementing the VAD on the front end:
https://github.com/BasedHardware/Friend/blob/6e2d9903b493681673a93aa39f392228ababb660/app/lib/providers/vad.dart#L13
It consumes 10% more battery on an iPhone 11 than just sending the bytes to the websocket.
It is still maintainable IMO, and if it lowers cellular data usage, that's great.

Will keep this in the backlog, but if Silero becomes a viable solution, I will merge it and take it to prod. The baseline is the current implementation: a replacement has to be at least as good at discarding audio, and no worse at delaying the transcript.

josancamon19 added Paid Bounty 💰 backend Backend Task (python) labels Aug 4, 2024

kodjima33 added this to omi TODO Aug 4, 2024

kodjima33 moved this to Backlog in omi TODO Aug 4, 2024

josancamon19 changed the title ~~Websocket silero VAD works for (opus, pcm8, pcm16) ($250)~~ Websocket silero VAD works for (opus, pcm8, pcm16) ($500) Aug 6, 2024

josancamon19 assigned 0xzre Aug 8, 2024

0xzre mentioned this issue Aug 10, 2024

fix ws VAD for codec Opus, pcm8, pcm16 #565

Merged

mdmohsin7 closed this as completed in #565 Aug 19, 2024

mdmohsin7 closed this as completed in bdd6c71 Aug 19, 2024

github-project-automation bot moved this from Backlog to Done in omi TODO Aug 19, 2024

mdmohsin7 reopened this Aug 19, 2024

0xzre mentioned this issue Aug 19, 2024

reintroduce opus on VAD, change frame size according to firmware v1.0, change realtime resolution for transcribe #624

Closed

josancamon19 moved this from Done to In progress in omi TODO Aug 24, 2024

beastoin unassigned 0xzre Oct 18, 2024

beastoin removed the status in omi TODO Oct 18, 2024
