Full-integer quantization and kapre layers #138

Open · eppane opened this issue Feb 15, 2022 · 3 comments
@eppane commented Feb 15, 2022

I am training a model which includes the mel-spectrogram block from get_melspectrogram_layer() right after the input layer. Training goes well, and I am able to change the specific mel-spec layers to their TFLite counterparts (STFTTflite, MagnitudeTflite) afterwards. I have also checked that the model performs as well as before.

The model also performs as expected after converting it to .tflite with dynamic range quantization. However, with full-integer quantization, the model loses its accuracy (see https://www.tensorflow.org/lite/performance/post_training_quantization#integer_only).

I suppose the mel-spectrogram starts to differ significantly because, in full-integer quantization, the input values are projected to a new range (int8). Is there any way to make it work with full-integer quantization?

I guess I need to separate the mel-spec layer from the model as a preprocessing step in order to succeed with full-integer quantization, i.e., apply the input quantization to the output values of the mel-spec layer. But then I would have to deploy two models to the edge device, where the input goes first through the mel-spec block and then into the rest of the model (?). A sketch of what I mean is below.
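Roughly what I have in mind on the host side (a sketch; the file names and the manual requantization step in the middle are my assumptions, not tested code):

import numpy as np
import tensorflow as tf

# Stage 1: mel-spec block, kept in float / dynamic-range quantization
mel_interp = tf.lite.Interpreter(model_path="melspec_block.tflite")
mel_interp.allocate_tensors()
mel_in = mel_interp.get_input_details()[0]
mel_out = mel_interp.get_output_details()[0]

# Stage 2: fully int8-quantized classifier
clf_interp = tf.lite.Interpreter(model_path="classifier_int8.tflite")
clf_interp.allocate_tensors()
clf_in = clf_interp.get_input_details()[0]
clf_out = clf_interp.get_output_details()[0]

def predict(audio):  # audio: float32, shape (8000,)
    mel_interp.set_tensor(mel_in["index"], audio.reshape(1, 8000, 1).astype(np.float32))
    mel_interp.invoke()
    mel = mel_interp.get_tensor(mel_out["index"])

    # Quantize the mel-spectrogram into the int8 space stage 2 expects
    scale, zero_point = clf_in["quantization"]
    mel_q = np.round(mel / scale + zero_point).astype(clf_in["dtype"])

    clf_interp.set_tensor(clf_in["index"], mel_q)
    clf_interp.invoke()
    return clf_interp.get_tensor(clf_out["index"])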

I am using TensorFlow 2.7.0 and kapre 0.3.7.

Here is my code for testing the TFLite model:

import numpy as np

# interpreter, X_test_full_scaled, and the detail dicts are set up earlier:
# input_details = interpreter.get_input_details()[0], etc.

preds = []
# Test and evaluate the TFLite-converted model on unseen test data
for sample in X_test_full_scaled:
    X = sample

    # For full-integer models, quantize the float input into int8 space
    if input_details['dtype'] == np.int8:
        input_scale, input_zero_point = input_details['quantization']
        X = np.round(sample / input_scale + input_zero_point)

    X = X.reshape((1, 8000, 1)).astype(input_details['dtype'])

    interpreter.set_tensor(input_index, X)
    interpreter.invoke()
    pred = interpreter.get_tensor(output_index)

    # Dequantize int8 outputs back to float before taking the argmax
    output_scale, output_zero_point = output_details['quantization']
    if output_details['dtype'] == np.int8:
        pred = (pred.astype(np.float32) - output_zero_point) * output_scale

    pred = np.argmax(pred, axis=1)[0]
    preds.append(pred)

preds = np.array(preds)
@keunwoochoi (Owner) commented
Hi, first of all, I don’t know. I’ll guess a bit.

A direct/automatic application of full-integer quantization can be dangerous, since the dynamic range of the melspectrogram magnitude (before decibel scaling) is extremely skewed. In other words, the distribution is exponential, while full-integer quantization would be (I think) rather linear.
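A rough numpy sketch of what I mean (toy numbers, not from any real model):

import numpy as np

# Magnitudes before decibel scaling easily span several orders of magnitude:
mag = np.array([1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0], dtype=np.float32)

# Affine int8 quantization (what full-integer conversion applies) is linear:
scale = (mag.max() - mag.min()) / 255.0
zero_point = -128  # maps mag.min() to the bottom of the int8 range
q = (np.round((mag - mag.min()) / scale) + zero_point).astype(np.int8)

print(q)  # ~[-128 -128 -128 -125 -103 127]: the three quietest bins collapse
          # into one level, so everything below ~0.02 becomes indistinguishable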

@eppane (Author) commented Mar 18, 2022

> Hi, first of all, I don’t know. I’ll guess a bit.
>
> A direct/automatic application of full-integer quantization can be dangerous, since the dynamic range of the melspectrogram magnitude (before decibel scaling) is extremely skewed. In other words, the distribution is exponential, while full-integer quantization would be (I think) rather linear.

Thank you for the tip!

I decided to proceed with separating the TFLite-compatible mel-spec block from the rest of the model. So when applying full-integer quantization, I use the mel-spec block inside the representative_dataset() function as a data preprocessing step. This seems to work well.
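For reference, the conversion looks roughly like this (a sketch; melspec_model and X_rep stand for the separated mel-spec block and a slice of training audio from my setup):

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Preprocess raw audio with the separated mel-spec block, so the
    # calibration samples match the classifier's actual input distribution.
    for sample in X_rep[:100]:
        mel = melspec_model(sample.reshape(1, 8000, 1).astype(np.float32))
        yield [np.asarray(mel, dtype=np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(classifier_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()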

I noticed that the size of the tflite-compatible mel-spec-block actually increases quite a bit when converting from .hdf5 to .tflite. When I save the mel-spec-block as .hdf5, it is about 12 kilobytes, but when converting to .tflite, the size is about 84 kilobytes. Is this behaviour expected?

Is it possible with kapre to calculate spectrograms iteratively, row by row, collecting FFT results from slices of audio at a time, instead of needing the whole audio to calculate the STFT? I think this could be an interesting feature, as on TinyML devices the buffers can't hold much data at once, especially when sampling rates get higher.
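Something like this plain-numpy loop, but as a kapre/TFLite-friendly layer (frame and hop sizes made up):

import numpy as np

n_fft, hop = 256, 128          # made-up frame/hop sizes
window = np.hanning(n_fft)
buf = np.zeros(n_fft, dtype=np.float32)
rows = []

audio = np.random.randn(8000).astype(np.float32)  # stand-in for mic samples

# Feed hop-sized chunks as they arrive; only n_fft samples stay buffered,
# and one spectrogram row is emitted per chunk instead of storing the clip.
for i in range(0, len(audio) - hop + 1, hop):
    buf = np.concatenate([buf[hop:], audio[i:i + hop]])
    rows.append(np.abs(np.fft.rfft(buf * window)))

spectrogram = np.stack(rows)   # shape: (n_frames, n_fft // 2 + 1)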

@Path-A (Contributor) commented Jun 6, 2022

@eppane Do you have a toy example of this? Are you performing dynamic range quantization on the melspec block and int8 quantization on the rest of the model?
