prediction like 0.00390625 #35
Comments
The predict_clip function uses the TF libraries to generate the spectrogram features. The problem is that this library doesn't handle a streaming setup; it resets its state with each call. Since the spectrogram features have some built-in noise suppression and AGC, resetting the state completely disrupts them, so it isn't a surprise that the model doesn't detect anything. https://github.com/OHF-Voice/pymicro-wakeword has a way to perform streaming inference without using the TensorFlow libraries to generate the features. You should be able to adapt the code there to fit it into the code you posted.
It's important to note that the model processes 3 windows of features, then moves 3 windows forward instead of just 1.
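For concreteness, here is a rough sketch of the streaming loop the two comments above describe: keep feature state across audio chunks and advance 3 windows at a time. Everything in it is illustrative, not the repo's actual API: `generate_features` is a hypothetical stand-in for a stateful feature frontend (such as the one in pymicro-wakeword), and `model.predict` stands in for however your wrapper invokes the TFLite interpreter on a [1, 3, 40] input.

```python
import numpy as np

STRIDE = 3         # the model consumes 3 feature windows, then advances by 3
NUM_FEATURES = 40  # features per window, giving the [1, 3, 40] input shape

feature_buffer = []  # persists across chunks, unlike predict_clip's internal state

def process_chunk(audio_chunk, model, generate_features):
    """Accumulate feature windows and run the model once 3 new ones arrive.

    generate_features (hypothetical) must keep its own internal state
    (noise suppression, AGC) across calls -- this is exactly the state
    that predict_clip resets on every invocation.
    """
    feature_buffer.extend(generate_features(audio_chunk))
    probabilities = []
    while len(feature_buffer) >= STRIDE:
        windows = np.array(feature_buffer[:STRIDE], dtype=np.int8)
        del feature_buffer[:STRIDE]  # advance by 3 windows, not 1
        probabilities.append(model.predict(windows[np.newaxis, ...]))
    return probabilities
```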
Thank you! It works! However, I have some sad news.
Just a guess, but I remember having issues with the JavaScript version of onnxruntime too, because it defaulted to the wrong datatype for tensors (float64?). Maybe TFLite is similar, and you have to be more explicit with the datatypes?
Thank you. Since tfjs doesn't support the int8 type, I adjusted the range to -128 to 127 and used int32 instead. The main difference is that in Python I used a [1, 3, 40] int8 tensor with all elements set to -128 as the input, whereas in JavaScript I used a [1, 3, 40] int32 tensor with all elements set to -128 (I did see a console warning saying 'int32 converting int8'). The input should essentially be the same, so it was surprising to see different results. Now that I think about it, it's possible that the JavaScript TFLite runtime internally modified the input when it showed the 'int32 converting int8' message. Anyway, thank you for helping make it work well in Python.
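As a sanity check on the Python side, a probe like the following makes the expected datatype explicit rather than relying on defaults. This is a minimal sketch assuming a standard quantized TFLite model with the [1, 3, 40] int8 input described above; the model path is taken from the code later in this thread, and all calls are the stock `tf.lite.Interpreter` API.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch: inspect the input dtype and feed an all -128 int8 probe,
# assuming a quantized model with the [1, 3, 40] input discussed above.
interpreter = tf.lite.Interpreter(
    model_path="microwakeword/models/okay_nabu.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
print(input_details["dtype"], input_details["shape"])  # expect int8, [1 3 40]

probe = np.full((1, 3, 40), -128, dtype=np.int8)
interpreter.set_tensor(input_details["index"], probe)
interpreter.invoke()

output_details = interpreter.get_output_details()[0]
print(interpreter.get_tensor(output_details["index"]))
```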
When I run inference as shown below, no matter what I say, whether it’s ‘Alexa,’ ‘Hey Jarvis,’ or ‘Okay Nabu,’ the prediction always comes out like this: [0.00390625, 0.00390625, 0.0078125].
```python
from microwakeword.inference import Model
import pyaudio
import queue
import numpy as np
import time

# Create the queue before opening the stream, since the callback can fire
# as soon as the stream starts.
listen_queue = queue.Queue()

def callback(in_data, frame_count, time_info, status):
    listen_queue.put(in_data)
    return None, pyaudio.paContinue

listen_p = pyaudio.PyAudio()
listen_stream = listen_p.open(format=pyaudio.paInt16,
                              channels=1,
                              rate=16000,
                              input=True,
                              frames_per_buffer=1280,
                              stream_callback=callback)

model = Model("microwakeword/models/okay_nabu.tflite")

while True:
    chunk = listen_queue.get()  # blocks until a 1280-sample chunk arrives
    data = np.frombuffer(chunk, dtype=np.int16)
    start_time = time.time()
    output = model.predict_clip(data)
    print("Time taken: ", time.time() - start_time)
    print(output)
```
What am I doing wrong?