
Sound detection - Is this possible? (TFMIC-16) #74

Open · gamename opened this issue Feb 26, 2024 · 14 comments

@gamename

Hi,

I'm using an esp32-s3-eye v2.2, which has 8 MB each of flash and PSRAM. Is it possible to use yamnet.tflite on it for sound identification? The yamnet.tflite file is about 3.9 MB.

The board has an SD card slot, so I can use that to load the model file (i.e., no need to convert it to a .cc file with xxd).

Thoughts?

@github-actions bot changed the title from "Sound detection - Is this possible?" to "Sound detection - Is this possible? (TFMIC-16)" Feb 26, 2024
@vikramdattu
Collaborator

@gamename going by the size of the model, I believe it is already quantised. If not, I would suggest quantising it to int8 weights; that will reduce the model size to about a quarter. Have you tried it with .cc first? If size is your concern, converting to .cc doesn't actually increase the size of the model once the file gets embedded into the application: the .cc file looks larger, but the array it compiles into is still the same size as the original model.

About the SD card: unfortunately, I haven't tried this approach. It's definitely worth a try IMO. When loading from the SD card, however, it makes sense not to convert to .cc, as you suggest.

Let me know how it goes. If you need further help or want me to try, do let me know.
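
For reference, post-training int8 quantisation with the TFLite converter looks roughly like the sketch below. The SavedModel path and the representative-dataset helper are assumptions; substitute your own export of YAMNet and a generator that yields real audio samples.

```python
import tensorflow as tf

def representative_dataset():
    # Yield a few hundred real input examples so the converter can
    # calibrate int8 quantisation ranges. load_sample_waveforms() is
    # a hypothetical helper; replace it with your own data loader.
    for waveform in load_sample_waveforms():
        yield [waveform]

converter = tf.lite.TFLiteConverter.from_saved_model("yamnet_saved_model")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("yamnet_int8.tflite", "wb") as f:
    f.write(converter.convert())
```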

@gamename
Author

gamename commented Feb 29, 2024

@vikramdattu

What process did you use to build the yes_micro_features_data.cc file? I'm not referring to the xxd conversion. I'm referring to everything up to that. :)

The reason I ask is that the C array in yes_micro_features_data.cc is tiny. I would like to replicate that size for my cat-meow identification too.

Thanks
-T

@vikramdattu
Collaborator

Hi @gamename, this is test data that I took long ago from Google's tflite-micro.
Currently, the feature generation happens via a different model in this file, and the features are then fed to the detection model.

The tools here can help you train your own model, evaluate it, and convert it.
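
As a concrete sketch of that flow: from Python, you can run the feature-generator .tflite model with tf.lite.Interpreter and feed its output to the detection model. The file name and shapes below are assumptions; read them off your actual model.

```python
import numpy as np
import tensorflow as tf

# Assumed file name; use the feature-generator model from your checkout.
interpreter = tf.lite.Interpreter(model_path="audio_preprocessor_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# One analysis window of raw audio, shaped and typed to match the model input.
window = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], window)
interpreter.invoke()

# One slice of the spectrogram; concatenate slices over time and feed
# the result to the detection model.
features = interpreter.get_tensor(out["index"])
print(features.shape, features.dtype)
```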

@gamename
Author

gamename commented Mar 1, 2024

> Hi @gamename, this is test data that I took long ago from Google's tflite-micro. Currently, the feature generation happens via a different model in this file, and the features are then fed to the detection model.
>
> The tools here can help you train your own model, evaluate it, and convert it.

Thank you, sir.

@gamename
Author

gamename commented Mar 8, 2024

@vikramdattu

Is your pre-processor model taken from here?

The reason I ask is that it seems the pre-processor should work for "meow" as well as for human speech. It just generates spectrograms, which is payload-agnostic (i.e., it makes a spectrogram of a sound and doesn't care what the sound is). Correct?

Thanks,
-T
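
To illustrate why the feature step is payload-agnostic, here is a sketch using tf.signal as a stand-in for the actual preprocessor: the transform is identical regardless of what produced the waveform. The frame parameters are illustrative.

```python
import tensorflow as tf

def spectrogram(waveform):
    # Short-time Fourier transform; the code has no notion of
    # "speech" vs "meow", only of a waveform.
    stft = tf.signal.stft(waveform, frame_length=640, frame_step=320)
    return tf.abs(stft)

speech = tf.random.normal([16000])  # stand-in for 1 s of 16 kHz speech
meow = tf.random.normal([16000])    # stand-in for 1 s of a cat meow
print(spectrogram(speech).shape)    # same shape and code path...
print(spectrogram(meow).shape)      # ...for both inputs
```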

@vikramdattu
Collaborator

@gamename that's right, the model is taken from that particular location.

@gamename
Author

gamename commented Mar 9, 2024

> @gamename that's right, the model is taken from that particular location.

Perfect. Thanks.

@gamename
Author

@vikramdattu

For the micro_speech example, what is the purpose of having yes_micro_features_data.cc/h and no_micro_features_data.cc/h in the directory? Are they there for reference? They don't seem to be used - or am I missing something?

@vikramdattu
Collaborator

@gamename you are correct. Those are left over from the old days, added for testing, and are not currently used. You may ignore them.

@gamename
Author

> @gamename you are correct. Those are left over from the old days, added for testing, and are not currently used. You may ignore them.

Thanks!

@gamename
Author

@vikramdattu

This concerns building the actual model. I am using a script here that is just a compilation of the steps outlined here.

Here is what my input dir with samples looks like:

```
tree ./samples
./samples
├── _background_noise_
│   ├── README.md
│   ├── doing_the_dishes.wav
│   ├── dude_miaowing.wav
│   ├── exercise_bike.wav
│   ├── pink_noise.wav
│   ├── running_tap.wav
│   └── white_noise.wav
└── meow
    ├── cat0001.wav
    ├── cat0002.wav
    ...
```

(there are 77 total cat .wav files)

I'm confused about what needs to be in there. Do I need to add silence and unknown subdirectories (with contents) as well?

Thanks
-T
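
For what it's worth, if this is the speech_commands-style training pipeline that the micro_speech docs point to, my understanding (an assumption worth verifying against the script) is that no explicit silence or unknown folders are needed: the silence class is synthesized from _background_noise_, and the unknown class is drawn from labelled folders that are not in your wanted-words list. A sketch of how the tree above would be partitioned:

```python
import os

DATASET_DIR = "./samples"  # the tree shown above
WANTED_WORDS = ["meow"]

folders = sorted(
    d for d in os.listdir(DATASET_DIR)
    if os.path.isdir(os.path.join(DATASET_DIR, d))
)

# '_background_noise_' is special-cased: it is mixed in to synthesize the
# '_silence_' class rather than being a label of its own.
background = [d for d in folders if d == "_background_noise_"]
unknown_pool = [d for d in folders if d not in background and d not in WANTED_WORDS]

print("silence sources:", background)  # ['_background_noise_']
print("unknown pool:", unknown_pool)   # empty here; add folders of other
                                       # sounds if you want an unknown class
```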

@gamename
Author

@vikramdattu

Another question. :)

Looking at this construct:

```cpp
constexpr int kCategoryCount = 4;
constexpr const char* kCategoryLabels[kCategoryCount] = {
    "silence",
    "unknown",
    "yes",
    "no",
};
```

...how do you know what the order of the labels ("silence", "unknown", etc.) should be? How is that set?




@vikramdattu
Collaborator

vikramdattu commented Mar 14, 2024 via email

@gamename
Author

> That completely depends on the model trained. What the categories are cannot be inferred from the model; only the number of categories can be known, from the output tensor size.

That's not quite what I am asking. :)

My question is this: how do I know the order of the labels as they are used in Python after the model has been created?
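
Assuming the speech_commands training pipeline referenced earlier in the thread, the label order is fixed by the training script rather than recoverable from the model. The sketch below mirrors the prepare_words_list() helper in that pipeline's input_data.py, as I recall it; verify against your checkout.

```python
# The synthetic classes are prepended to the wanted words, in the order
# the wanted words were passed to the training script.
def prepare_words_list(wanted_words):
    return ["_silence_", "_unknown_"] + wanted_words

print(prepare_words_list(["yes", "no"]))
# ['_silence_', '_unknown_', 'yes', 'no']  -> matches kCategoryLabels above
```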
