
Sound detection - Is this possible? (TFMIC-16) #74

Open · gamename opened this issue Feb 26, 2024 · 14 comments

@gamename

Hi,

I'm using an esp32-s3-eye v2.2, which has 8 MB each of flash and PSRAM. Is it possible to use yamnet.tflite on it for sound identification? The yamnet.tflite file is about 3.9 MB.

The board has an SD card slot, so I can use that to load the model file (i.e., no need to convert it to a .cc file with xxd).

Thoughts?

@github-actions bot changed the title from "Sound detection - Is this possible?" to "Sound detection - Is this possible? (TFMIC-16)" Feb 26, 2024
@vikramdattu
Collaborator

@gamename going by the size of the model, I believe it is already quantised. If not, I would suggest quantising it to int8 weights; that will reduce the model size to about a quarter. Have you tried it with .cc first? If size is your concern, converting to .cc doesn't actually increase the size of the model once the file gets embedded into the application: the .cc file looks larger, but the array it compiles into is still the same size as the original model.

About the SD card: unfortunately, I haven't tried this approach. It's definitely worth a try IMO. When loading from the SD card, however, it makes sense not to convert to .cc, as you suggest.

Let me know how it goes. If you need further help or want me to try, do let me know.
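
For reference, post-training int8 quantisation with the TFLite converter looks roughly like the sketch below. The SavedModel path and the representative-dataset helper are assumptions; substitute your own export of YAMNet and a generator that yields real audio samples.

```python
import tensorflow as tf

def representative_dataset():
    # Yield a few hundred real input examples so the converter can
    # calibrate int8 quantisation ranges. load_sample_waveforms() is
    # a hypothetical helper; replace it with your own data loader.
    for waveform in load_sample_waveforms():
        yield [waveform]

converter = tf.lite.TFLiteConverter.from_saved_model("yamnet_saved_model")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("yamnet_int8.tflite", "wb") as f:
    f.write(converter.convert())
```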

@gamename
Author

gamename commented Feb 29, 2024

@vikramdattu

What process did you use to build the yes_micro_features_data.cc file? I'm not referring to the xxd conversion. I'm referring to everything up to that. :)

The reason I ask is that the C array in yes_micro_features_data.cc is tiny. I would like to replicate that size for my cat-meow identification too.

Thanks
-T

@vikramdattu
Collaborator

Hi @gamename, this is test data that I took long ago from Google's tflite-micro.
Currently, the feature generation happens via a different model in this file, and the features are then fed to the detection model.

The tools here can help you train your own model, evaluate it, and convert it.
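
As a concrete sketch of that flow: from Python, you can run the feature-generator .tflite model with tf.lite.Interpreter and feed its output to the detection model. The file name and shapes below are assumptions; read them off your actual model.

```python
import numpy as np
import tensorflow as tf

# Assumed file name; use the feature-generator model from your checkout.
interpreter = tf.lite.Interpreter(model_path="audio_preprocessor_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# One analysis window of raw audio, shaped and typed to match the model input.
window = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], window)
interpreter.invoke()

# One slice of the spectrogram; concatenate slices over time and feed
# the result to the detection model.
features = interpreter.get_tensor(out["index"])
print(features.shape, features.dtype)
```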

@gamename
Author

gamename commented Mar 1, 2024

> Hi @gamename, this is test data that I took long ago from Google's tflite-micro. Currently, the feature generation happens via a different model in this file, and the features are then fed to the detection model.
>
> The tools here can help you train your own model, evaluate it, and convert it.

Thank you, sir.

@gamename
Author

gamename commented Mar 8, 2024

@vikramdattu

Is your pre-processor model taken from here?

The reason I ask is that it seems the pre-processor should work for "meow" as well as for human speech. It just generates spectrograms, which is payload-agnostic (i.e., it makes a spectrogram of a sound and doesn't care what the sound is). Correct?

Thanks,
-T
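
To illustrate why the feature step is payload-agnostic, here is a sketch using tf.signal as a stand-in for the actual preprocessor: the transform is identical regardless of what produced the waveform. The frame parameters are illustrative.

```python
import tensorflow as tf

def spectrogram(waveform):
    # Short-time Fourier transform; the code has no notion of
    # "speech" vs "meow", only of a waveform.
    stft = tf.signal.stft(waveform, frame_length=640, frame_step=320)
    return tf.abs(stft)

speech = tf.random.normal([16000])  # stand-in for 1 s of 16 kHz speech
meow = tf.random.normal([16000])    # stand-in for 1 s of a cat meow
print(spectrogram(speech).shape)    # same shape and code path...
print(spectrogram(meow).shape)      # ...for both inputs
```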

@vikramdattu
Collaborator

@gamename that's right, the model is taken from that particular location.

@gamename
Author

gamename commented Mar 9, 2024

> @gamename that's right, the model is taken from that particular location.

Perfect. Thanks.

@gamename
Author

@vikramdattu

For the micro_speech example, what is the purpose of having yes_micro_features_data.cc/h and no_micro_features_data.cc/h in the directory? Are they there for reference? They don't seem to be used - or am I missing something?

@vikramdattu
Collaborator

@gamename you are correct. Those are left over from the old days, added for testing, and are not currently used. You may ignore them.

@gamename
Author

> @gamename you are correct. Those are left over from the old days, added for testing, and are not currently used. You may ignore them.

Thanks!

@gamename
Author

@vikramdattu

This concerns building the actual model. I am using a script here that is just a compilation of the steps outlined here.

Here is what my input dir with samples looks like:

```
tree ./samples
./samples
├── _background_noise_
│   ├── README.md
│   ├── doing_the_dishes.wav
│   ├── dude_miaowing.wav
│   ├── exercise_bike.wav
│   ├── pink_noise.wav
│   ├── running_tap.wav
│   └── white_noise.wav
└── meow
    ├── cat0001.wav
    ├── cat0002.wav
    ...
```

(there are 77 total cat .wav files)

I'm confused about what needs to be in there. Do I need to add silence and unknown subdirectories (with contents) as well?

Thanks
-T
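
For what it's worth, if this is the speech_commands-style training pipeline that the micro_speech docs point to, my understanding (an assumption worth verifying against the script) is that no explicit silence or unknown folders are needed: the silence class is synthesized from _background_noise_, and the unknown class is drawn from labelled folders that are not in your wanted-words list. A sketch of how the tree above would be partitioned:

```python
import os

DATASET_DIR = "./samples"  # the tree shown above
WANTED_WORDS = ["meow"]

folders = sorted(
    d for d in os.listdir(DATASET_DIR)
    if os.path.isdir(os.path.join(DATASET_DIR, d))
)

# '_background_noise_' is special-cased: it is mixed in to synthesize the
# '_silence_' class rather than being a label of its own.
background = [d for d in folders if d == "_background_noise_"]
unknown_pool = [d for d in folders if d not in background and d not in WANTED_WORDS]

print("silence sources:", background)  # ['_background_noise_']
print("unknown pool:", unknown_pool)   # empty here; add folders of other
                                       # sounds if you want an unknown class
```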

@gamename
Author

@vikramdattu

Another question. :)

Looking at this construct:

```cpp
constexpr int kCategoryCount = 4;
constexpr const char* kCategoryLabels[kCategoryCount] = {
    "silence",
    "unknown",
    "yes",
    "no",
};
```

...how do you know what the order of the labels ("silence", "unknown", etc.) should be? How is that set?




@vikramdattu
Collaborator

vikramdattu commented Mar 14, 2024 via email

@gamename
Author

> That completely depends on the model trained. What the categories are cannot be inferred from the model; only the number of categories can be known, from the output tensor size.

That's not quite what I am asking. :)

My question is this: how do I know the order of the labels as they are used in Python after the model has been created?
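
Assuming the speech_commands training pipeline referenced earlier in the thread, the label order is fixed by the training script rather than recoverable from the model. The sketch below mirrors the prepare_words_list() helper in that pipeline's input_data.py, as I recall it; verify against your checkout.

```python
# The synthetic classes are prepended to the wanted words, in the order
# the wanted words were passed to the training script.
def prepare_words_list(wanted_words):
    return ["_silence_", "_unknown_"] + wanted_words

print(prepare_words_list(["yes", "no"]))
# ['_silence_', '_unknown_', 'yes', 'no']  -> matches kCategoryLabels above
```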
