
ok nabu v2 model #31

Open
tringuyen-olli opened this issue Oct 14, 2024 · 3 comments

Comments
@tringuyen-olli

Hello,
I trained the Ok Nabu model from your main branch, but the result is not the same as the Ok Nabu v2 model you released.
Is it the inception model (it outputs a 115 KB model)?
Can you point me to the model (or the branch) you used to train Ok Nabu v2 (the 60 KB model)?
Thank you

@kahrendt
Owner

The newer V2 models use a MixedNet architecture built on MixConvs, where the channels are split into separate depthwise convolutions with different kernel sizes. The best branch to work off of is 2024-06-14-improvements. The V2 models also use a 10 ms step size instead of the original 20 ms, but the initial convolution layer has a stride of 3, so inference still only runs every 30 ms. The new architecture, together with the smaller step size, does make smaller, faster, and more accurate models.
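
In plain, non-streaming Keras terms, a MixConv block looks roughly like the sketch below. This is only an illustration (the kernel sizes are placeholders, and the real layers in this repo are wrapped for streaming), not the exact implementation:

    import tensorflow as tf

    def mixconv(inputs, kernel_sizes=(5, 9, 13, 21)):
        # Split the channel dimension into one group per kernel size.
        channels = inputs.shape[-1]
        group_size = channels // len(kernel_sizes)
        sizes = [group_size] * len(kernel_sizes)
        sizes[-1] += channels - sum(sizes)  # last group absorbs any remainder
        groups = tf.split(inputs, sizes, axis=-1)

        # Each group gets its own depthwise convolution along the time axis.
        outputs = [
            tf.keras.layers.DepthwiseConv2D((k, 1), padding="same", use_bias=False)(g)
            for k, g in zip(kernel_sizes, groups)
        ]
        return tf.keras.layers.Concatenate(axis=-1)(outputs)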

Just a note for everyone, I'm sorry about not giving microWakeWord a lot of love recently! Since joining Nabu Casa in June, I've mainly been working on our upcoming voice hardware. As I make progress merging the various new components into ESPHome itself, I will hopefully have more time to work on improving mWW and making it easier for everyone to use.

@tringuyen-olli
Author

tringuyen-olli commented Oct 16, 2024

Hello,
I tried to build the model from the 2024-06-14-improvements branch with the following modification

    if flags.first_conv_filters > 0:
        net = stream.Stream(
            cell=tf.keras.layers.Conv2D(
                flags.first_conv_filters,
                (3, 1),
                strides=(3, 1),
                padding="valid",
                use_bias=False,
            ),
            use_one_step=False,
            pad_time_dim=None,
            pad_freq_dim="valid",
        )(net)

        net = tf.keras.layers.Activation("relu")(net)

in the mixednet.py file. The training command I used:

python -m microwakeword.model_train_eval \
--training_config='notebooks/training_parameters.yaml' \
--train 1 \
--restore_checkpoint 1 \
--test_tf_nonstreaming 0 \
--test_tflite_nonstreaming 0 \
--test_tflite_streaming 1 \
--test_tflite_streaming_quantized 1 \
--use_weights "best_weights" \
mixednet \
--pointwise_filters "64,64,64,64" \
--repeat_in_block  "1, 1, 1, 1" \
--mixconv_kernel_sizes '[5], [9], [13], [21]' \
--residual_connection "0,0,0,0" \
--first_conv_filters 32

But the output model is not the same as the Ok Nabu model.
Here is the Ok Nabu model:
[image]
Here is my model:
[image]
Is my modification incorrect?

@kahrendt
Owner

Ah I see, that code is a bit outdated and doesn't match the model itself. Sorry about that! The initial convolution had width 5 for the v2 models.

Try this if you want it to be generic (this is how I have it set up in my current local branch):

    if flags.first_conv_filters > 0:
        net = stream.Stream(
            cell=tf.keras.layers.Conv2D(
                flags.first_conv_filters,
                (flags.first_conv_kernel_size, 1),
                strides=(flags.stride, 1),
                padding="valid",
                use_bias=False,
            ),
            use_one_step=False,
            pad_time_dim=None,
            pad_freq_dim="valid",
        )(net)

Just pass --first_conv_kernel_size 5 and --stride 3 when calling model_train_eval in the command line.
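
For example, the training command from the earlier comment would become something like this (only the two flags at the end are new, everything else unchanged):

python -m microwakeword.model_train_eval \
--training_config='notebooks/training_parameters.yaml' \
--train 1 \
--restore_checkpoint 1 \
--test_tf_nonstreaming 0 \
--test_tflite_nonstreaming 0 \
--test_tflite_streaming 1 \
--test_tflite_streaming_quantized 1 \
--use_weights "best_weights" \
mixednet \
--pointwise_filters "64,64,64,64" \
--repeat_in_block "1,1,1,1" \
--mixconv_kernel_sizes '[5], [9], [13], [21]' \
--residual_connection "0,0,0,0" \
--first_conv_filters 32 \
--first_conv_kernel_size 5 \
--stride 3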
