
ok nabu v2 model #31

Open
tringuyen-olli opened this issue Oct 14, 2024 · 3 comments

Comments
@tringuyen-olli

Hello,
I trained the Ok Nabu model from your main branch, but the result is not the same as the Ok Nabu v2 model you released.
Is it the inception model (it outputs a 115 KB model)?
Can you point me to the model (or the branch) you used to train Ok Nabu v2 (the 60 KB model)?
Thank you

@kahrendt
Owner

The newer V2 models use a MixedNet architecture built on MixConvs, where the channels are split into separate depthwise convolutions with different kernel sizes. The best branch to work off of is 2024-06-14-improvements. The V2 models also use a 10 ms step size instead of the original 20 ms, but the initial convolution layer has a stride of 3, so inference still only runs every 30 ms. The new architecture, together with the smaller step size, does make smaller, faster, and more accurate models.
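
In plain, non-streaming Keras terms, a MixConv block looks roughly like the sketch below. This is only an illustration (the kernel sizes are placeholders, and the real layers in this repo are wrapped for streaming), not the exact implementation:

    import tensorflow as tf

    def mixconv(inputs, kernel_sizes=(5, 9, 13, 21)):
        # Split the channel dimension into one group per kernel size.
        channels = inputs.shape[-1]
        group_size = channels // len(kernel_sizes)
        sizes = [group_size] * len(kernel_sizes)
        sizes[-1] += channels - sum(sizes)  # last group absorbs any remainder
        groups = tf.split(inputs, sizes, axis=-1)

        # Each group gets its own depthwise convolution along the time axis.
        outputs = [
            tf.keras.layers.DepthwiseConv2D((k, 1), padding="same", use_bias=False)(g)
            for k, g in zip(kernel_sizes, groups)
        ]
        return tf.keras.layers.Concatenate(axis=-1)(outputs)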

Just a note for everyone, I'm sorry about not giving microWakeWord a lot of love recently! Since joining Nabu Casa in June, I've mainly been working on our upcoming voice hardware. As I make progress merging the various new components into ESPHome itself, I will hopefully have more time to work on improving mWW and making it easier for everyone to use.

@tringuyen-olli
Author

tringuyen-olli commented Oct 16, 2024

Hello,
I tried to build the model from the 2024-06-14-improvements branch with the following modification

    if flags.first_conv_filters > 0:
        net = stream.Stream(
            cell=tf.keras.layers.Conv2D(
                flags.first_conv_filters,
                (3, 1),
                strides=(3, 1),
                padding="valid",
                use_bias=False,
            ),
            use_one_step=False,
            pad_time_dim=None,
            pad_freq_dim="valid",
        )(net)

        net = tf.keras.layers.Activation("relu")(net)

in the mixednet.py file. The training command I used:

python -m microwakeword.model_train_eval \
--training_config='notebooks/training_parameters.yaml' \
--train 1 \
--restore_checkpoint 1 \
--test_tf_nonstreaming 0 \
--test_tflite_nonstreaming 0 \
--test_tflite_streaming 1 \
--test_tflite_streaming_quantized 1 \
--use_weights "best_weights" \
mixednet \
--pointwise_filters "64,64,64,64" \
--repeat_in_block  "1, 1, 1, 1" \
--mixconv_kernel_sizes '[5], [9], [13], [21]' \
--residual_connection "0,0,0,0" \
--first_conv_filters 32

But the output model is not the same as the Ok Nabu model.
Here is the Ok Nabu model:
[image]
Here is my model:
[image]
Is my modification incorrect?

@kahrendt
Owner

Ah I see, that code is a bit outdated and doesn't match the model itself. Sorry about that! The initial convolution had width 5 for the v2 models.

Try this if you want it to be generic (this is how I have it set up in my current local branch):

    if flags.first_conv_filters > 0:
        net = stream.Stream(
            cell=tf.keras.layers.Conv2D(
                flags.first_conv_filters,
                (flags.first_conv_kernel_size, 1),
                strides=(flags.stride, 1),
                padding="valid",
                use_bias=False,
            ),
            use_one_step=False,
            pad_time_dim=None,
            pad_freq_dim="valid",
        )(net)

Just pass --first_conv_kernel_size 5 and --stride 3 when calling model_train_eval in the command line.
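
For example, the training command from the earlier comment would become something like this (only the two flags at the end are new, everything else unchanged):

python -m microwakeword.model_train_eval \
--training_config='notebooks/training_parameters.yaml' \
--train 1 \
--restore_checkpoint 1 \
--test_tf_nonstreaming 0 \
--test_tflite_nonstreaming 0 \
--test_tflite_streaming 1 \
--test_tflite_streaming_quantized 1 \
--use_weights "best_weights" \
mixednet \
--pointwise_filters "64,64,64,64" \
--repeat_in_block "1,1,1,1" \
--mixconv_kernel_sizes '[5], [9], [13], [21]' \
--residual_connection "0,0,0,0" \
--first_conv_filters 32 \
--first_conv_kernel_size 5 \
--stride 3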
