
prediction like 0.00390625 #35

Open
pilkyu95 opened this issue Oct 28, 2024 · 5 comments

Comments


pilkyu95 commented Oct 28, 2024

When I run inference as shown below, no matter what I say, whether it’s ‘Alexa,’ ‘Hey Jarvis,’ or ‘Okay Nabu,’ the prediction always comes out like this: [0.00390625, 0.00390625, 0.0078125].

```python
from microwakeword.inference import Model
import pyaudio
import queue
import numpy as np
import time

# Define the queue before opening the stream: the callback runs on a
# separate audio thread and can fire as soon as the stream is open.
listen_queue = queue.Queue()

def callback(in_data, frame_count, time_info, status):
    listen_queue.put(in_data)
    return None, pyaudio.paContinue

listen_p = pyaudio.PyAudio()
listen_stream = listen_p.open(format=pyaudio.paInt16,
                              channels=1,
                              rate=16000,
                              input=True,
                              frames_per_buffer=1280,
                              stream_callback=callback)

model = Model("microwakeword/models/okay_nabu.tflite")

while True:
    chunk = listen_queue.get()  # blocks until the callback delivers a chunk
    data = np.frombuffer(chunk, dtype=np.int16)
    start_time = time.time()
    output = model.predict_clip(data)
    print("Time taken: ", time.time() - start_time)
    print(output)
```

What am I doing wrong?

@kahrendt (Owner)

The predict_clip function uses the TF libraries to generate the spectrogram features. The problem is that this library doesn't handle a streaming setup: it resets its state with each call. Since the spectrogram features include built-in noise suppression and AGC, resetting the state completely breaks both, so it isn't surprising that the model detects nothing.
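To put a number on it (derived only from the `rate` and `frames_per_buffer` values in the posted code): each callback chunk is far shorter than a spoken wake word, so even setting the state reset aside, no single chunk could contain "okay nabu" on its own.

```python
RATE = 16000   # sample rate from the posted code
CHUNK = 1280   # frames_per_buffer from the posted code

chunk_seconds = CHUNK / RATE
print(chunk_seconds)  # 0.08 -> each predict_clip call sees just 80 ms of audio
```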

https://github.com/OHF-Voice/pymicro-wakeword has a way to perform streaming inference by not using the Tensorflow libraries to generate the features. You should be able to adapt the code there to fit it into the code you posted.
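A rough sketch of what that adaptation could look like. Hedge: the pymicro-wakeword names used here (`MicroWakeWord.from_builtin`, `Model.OKAY_NABU`, `process_streaming`) are assumptions based on a reading of that repo's README and may differ in your installed version, so treat this as a starting point rather than working code.

```python
CHUNK_SAMPLES = 1280                 # 80 ms of 16 kHz mono audio per read
BYTES_PER_CHUNK = CHUNK_SAMPLES * 2  # 16-bit samples -> 2 bytes each

def run() -> None:
    # Imports are local so the constants above stay importable without the libs.
    import pyaudio
    from pymicro_wakeword import MicroWakeWord, Model  # assumed API, see above

    mww = MicroWakeWord.from_builtin(Model.OKAY_NABU)

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                     input=True, frames_per_buffer=CHUNK_SAMPLES)
    try:
        while True:
            chunk = stream.read(CHUNK_SAMPLES)  # raw 16-bit mono PCM bytes
            # Unlike predict_clip(), the streaming call keeps feature and
            # model state across chunks instead of resetting every time.
            if mww.process_streaming(chunk):
                print("Detected!")
    finally:
        stream.close()
        pa.terminate()
```

Call `run()` to start listening; blocking reads replace the callback/queue pair from the original snippet, which keeps the loop simpler.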

@synesthesiam (Collaborator)

It's important to note that the model processes 3 windows of features, then moves 3 windows forward instead of just 1.
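Concretely, the stride-3 behavior means slicing the feature matrix in non-overlapping groups of 3 windows. A small numpy sketch (the 12-window feature matrix and the [1, 3, 40] input shape from later in this thread are used for illustration):

```python
import numpy as np

# Made-up feature matrix: 12 spectrogram windows of 40 features each.
features = np.zeros((12, 40), dtype=np.int8)

STEP = 3  # the model advances 3 feature windows per inference, not 1

inputs = [
    features[i : i + STEP][np.newaxis, ...]  # shape (1, 3, 40)
    for i in range(0, features.shape[0] - STEP + 1, STEP)
]
print(len(inputs), inputs[0].shape)  # 4 (1, 3, 40)
```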

@pilkyu95 (Author)

pilkyu95 commented Oct 29, 2024

Thank you! It works!

However, I have some bad news.
I attempted to run microWakeWord on the web using JavaScript.
Referring to pymicro-wakeword, I successfully bound micro_frontend as a JavaScript module. Given the same audio buffer as input, it produces the same output as the Python module.
However, when running the microWakeWord tflite model itself on the web, the output differed from Python. I generated temporary test data with shape [1, 3, 40] and values ranging from -128 to 127, and even with identical inputs, the Python and JavaScript inferences produced different outputs.
So wake word detection works in Python but not in JavaScript.
Do you happen to know anything about this issue?

@synesthesiam (Collaborator)

Just a guess, but I remember having issues with the JavaScript version of onnxruntime too, because it defaulted to the wrong datatype for tensors (float64?). Maybe TFLite is similar, and you have to be more explicit with the datatypes?

@pilkyu95 (Author)

Thank you. Since tfjs doesn't support the int8 type, I kept the value range at -128 to 127 but used int32 instead. The main difference is that in Python I used a [1, 3, 40] int8 tensor with all elements set to -128 as the input, whereas in JavaScript I used a [1, 3, 40] int32 tensor with all elements set to -128 (I did see a console warning saying 'int32 converting int8'). The inputs should be essentially the same, so it was surprising to see different results.

Now that I think about it, it’s possible that the JavaScript TFLite might have internally modified the input when showing the 'int32 converting int8' message.
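For what it's worth, on the numpy side that cast is lossless, so the constant -128 input really is the same values in both representations. A quick pure-numpy check (just illustrating the value range, not the tflite runtimes):

```python
import numpy as np

x_int32 = np.full((1, 3, 40), -128, dtype=np.int32)
x_int8 = x_int32.astype(np.int8)  # lossless: -128..127 fits in int8

# Element-wise identical values, so a correct int32 -> int8 cast alone
# cannot explain the differing outputs; how the JS runtime performs the
# warned-about conversion is the likelier suspect.
print(np.array_equal(x_int32, x_int8))  # True
```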

Anyway, thank you for helping make it work well in Python.
