prediction like 0.00390625 #35
Comments
The predict_clip function uses the TF libraries to generate the spectrogram features. The problem is that this library doesn't handle a streaming setup; it resets its state with each call. Since the spectrogram features have some built-in noise suppression and AGC, resetting the state completely disrupts them, so it isn't a surprise that the model doesn't detect anything. https://github.com/OHF-Voice/pymicro-wakeword has a way to perform streaming inference without using the TensorFlow libraries to generate the features. You should be able to adapt the code there to fit it into the code you posted.
It's important to note that the model processes 3 windows of features, then moves 3 windows forward instead of just 1.
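For concreteness, here is a rough sketch of the streaming loop the two comments above describe: keep feature state across audio chunks and advance 3 windows at a time. Everything in it is illustrative, not the repo's actual API: `generate_features` is a hypothetical stand-in for a stateful feature frontend (such as the one in pymicro-wakeword), and `model.predict` stands in for however your wrapper invokes the TFLite interpreter on a [1, 3, 40] input.

```python
import numpy as np

STRIDE = 3         # the model consumes 3 feature windows, then advances by 3
NUM_FEATURES = 40  # features per window, giving the [1, 3, 40] input shape

feature_buffer = []  # persists across chunks, unlike predict_clip's internal state

def process_chunk(audio_chunk, model, generate_features):
    """Accumulate feature windows and run the model once 3 new ones arrive.

    generate_features (hypothetical) must keep its own internal state
    (noise suppression, AGC) across calls -- this is exactly the state
    that predict_clip resets on every invocation.
    """
    feature_buffer.extend(generate_features(audio_chunk))
    probabilities = []
    while len(feature_buffer) >= STRIDE:
        windows = np.array(feature_buffer[:STRIDE], dtype=np.int8)
        del feature_buffer[:STRIDE]  # advance by 3 windows, not 1
        probabilities.append(model.predict(windows[np.newaxis, ...]))
    return probabilities
```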
Thank you! It works! However, I have some sad news.
Just a guess, but I remember having issues with the JavaScript version of onnxruntime too, because it defaulted to the wrong datatype for tensors (float64?). Maybe TFLite is similar, and you have to be more explicit with the datatypes?
Thank you. Since tfjs doesn't support the int8 type, I adjusted the range to -128 to 127 and used int32 instead. The main difference is that in Python I used a [1, 3, 40] int8 tensor with all elements set to -128 as the input, whereas in JavaScript I used a [1, 3, 40] int32 tensor with all elements set to -128 (I did see a console warning saying 'int32 converting int8'). The input should essentially be the same, so it was surprising to see different results. Now that I think about it, it's possible that the JavaScript TFLite runtime internally modified the input when it showed the 'int32 converting int8' message. Anyway, thank you for helping make it work well in Python.
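As a sanity check on the Python side, a probe like the following makes the expected datatype explicit rather than relying on defaults. This is a minimal sketch assuming a standard quantized TFLite model with the [1, 3, 40] int8 input described above; the model path is taken from the code later in this thread, and all calls are the stock `tf.lite.Interpreter` API.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch: inspect the input dtype and feed an all -128 int8 probe,
# assuming a quantized model with the [1, 3, 40] input discussed above.
interpreter = tf.lite.Interpreter(
    model_path="microwakeword/models/okay_nabu.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
print(input_details["dtype"], input_details["shape"])  # expect int8, [1 3 40]

probe = np.full((1, 3, 40), -128, dtype=np.int8)
interpreter.set_tensor(input_details["index"], probe)
interpreter.invoke()

output_details = interpreter.get_output_details()[0]
print(interpreter.get_tensor(output_details["index"]))
```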
When I run inference as shown below, no matter what I say, whether it’s ‘Alexa,’ ‘Hey Jarvis,’ or ‘Okay Nabu,’ the prediction always comes out like this: [0.00390625, 0.00390625, 0.0078125].
```python
from microwakeword.inference import Model
import pyaudio
import queue
import numpy as np
import time

# Create the queue before opening the stream, since the callback can fire
# as soon as the stream starts.
listen_queue = queue.Queue()

def callback(in_data, frame_count, time_info, status):
    listen_queue.put(in_data)
    return None, pyaudio.paContinue

listen_p = pyaudio.PyAudio()
listen_stream = listen_p.open(format=pyaudio.paInt16,
                              channels=1,
                              rate=16000,
                              input=True,
                              frames_per_buffer=1280,
                              stream_callback=callback)

model = Model("microwakeword/models/okay_nabu.tflite")

while True:
    chunk = listen_queue.get()  # blocks until a 1280-sample chunk arrives
    data = np.frombuffer(chunk, dtype=np.int16)
    start_time = time.time()
    output = model.predict_clip(data)
    print("Time taken: ", time.time() - start_time)
    print(output)
```
What am I doing wrong?