Hi, I’m using the Online_ASR_Microphone_Demo_Cache_Aware_Streaming.ipynb notebook and referring to the following code snippet:
In this demo, `chunk_size` is defined as

```
chunk_size = lookahead_size + ENCODER_STEP_LENGTH
```

but the default setting produces chunks that are too small for my use case. I'd like to process the audio input in roughly 1-second increments. However, simply changing `ENCODER_STEP_LENGTH` to 1000 didn't produce the expected results. Could anyone explain how to properly adjust `chunk_size`, or any other relevant parameters, to achieve this? A detailed explanation or example would be greatly appreciated!
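For reference, here is a minimal sketch of how I currently understand the chunk sizing, so it's clear what I'm changing. The concrete values (16 kHz sample rate, 80 ms encoder step, 80 ms lookahead) are my assumptions about the notebook's defaults, not something I've verified for every model config:

```python
# Sketch of the chunk-size arithmetic as I understand it from the demo.
# All three values below are ASSUMED defaults (please correct me if wrong):
SAMPLE_RATE = 16000          # Hz, assumed microphone sample rate
ENCODER_STEP_LENGTH = 80     # ms per encoder step (assumed default)
lookahead_size = 80          # ms, assumed from the att_context_size setting

# Total chunk duration in milliseconds, as defined in the demo:
chunk_size = lookahead_size + ENCODER_STEP_LENGTH

# Number of raw audio samples read from the microphone per chunk:
chunk_size_samples = int(SAMPLE_RATE * chunk_size / 1000)

print(chunk_size, chunk_size_samples)  # 160 ms -> 2560 samples
```

With these assumed defaults each chunk is only 160 ms, which is why I was hoping a single parameter change would get me to ~1000 ms; apparently the encoder step is tied to the model architecture, so it isn't that simple.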