
feat(compression): implement tensor decompression in op fully_connected #3006

Merged
merged 5 commits into tensorflow:main from feat-compression on Dec 13, 2024

Conversation

rkuester
Contributor

@rkuester rkuester commented Dec 11, 2024

Implement tensor decompression in op fully_connected. Extend
tests to validate operation on compressed tensors.

Include Xtensa extensions for decompression, as they are required
by the tests.

Fix the effect on memory_arena_threshold_test by setting a
different expected value for the persistent buffer allocation
when compression is configured in. The allocation was allowed to
vary by 3%; however, compression adds ~10%. Set the expected
value to the measured value when compression is configured in.

BUG=part of #2636

@rkuester rkuester requested a review from a team as a code owner December 11, 2024 00:36
@rkuester rkuester requested a review from suleshahid December 11, 2024 00:36
@suleshahid
Collaborator

Hifi compression tests seem to fail for this

@ddavis-2015
Member

> Hifi compression tests seem to fail for this

@suleshahid @rkuester These tests pass in my local environment, but my environment is not identical to the compress-testing branch or this PR. I found two files with a merge error in the compress-testing branch: fully_connected_test.cc and transpose_conv_test.cc. Each had a templated method that should not have been templated. However, I don't see how that would cause the test to fail.

I have updated the compress-testing branch with the correctly merged code. I diffed compress-testing against my own branch and found no other relevant differences.

@rkuester
Contributor Author

> I have updated the compress-testing branch with the correctly merged code. I diffed compress-testing against my own branch and found no other relevant differences.

That still failed. I'm in the process of testing my ultimate destination in CI to see if it's due to something I've left out of this PR's commit.

@ddavis-2015
Member

> I have updated the compress-testing branch with the correctly merged code. I diffed compress-testing against my own branch and found no other relevant differences.
>
> That still failed. I'm in the process of testing my ultimate destination in CI to see if it's due to something I've left out of this PR's commit.

@rkuester The test script requires the optimized kernels, but the PR does not have compression support for the optimized kernels.

@rkuester
Contributor Author

> The test script requires the optimized kernels, but the PR does not have compression support for the optimized kernels.

I see. That's a surprising way to fail: build and runtime success, but wrong results. Is there a way we can transform this into a failure to build (an extension that should support compression having a new signature or something), or at least a better failure at runtime (an extension noticing the length of the tensor doesn't make sense, etc.)?

@@ -115,9 +143,18 @@ TfLiteStatus FullyConnectedEval(TfLiteContext* context, TfLiteNode* node) {
tflite::micro::GetTensorShape(input),
tflite::micro::GetTensorData<float>(input),
tflite::micro::GetTensorShape(filter),
#ifdef USE_TFLM_COMPRESSION
tflite::micro::GetTensorData<float>(micro_context, filter,
Collaborator


Catching this a bit late, but in the new GetTensorData function, we check and return nullptr if the tensor is nullptr. However, in the existing code, it's a TFLITE_DCHECK, which means we will fail here, since we expected data. I think it's better to do it that way, because otherwise we could end up trying to use the nullptr somewhere.

Member


This is valid for cases where the bias tensor is optional as input to the operator.

Collaborator


In that case I would prefer another overload for the optional case. Since this would otherwise change the code even for non-compression execution (when compression is globally enabled), it is safer to keep the nullptr DCHECK.

If you think that will take too long, for now we could just add a single DCHECK for the weights data in the above ifdef compression code.

Collaborator


Seeing how many files use it, probably easiest to just add the override GetOptionalTensorData.

Member


in progress.


# override KERNEL_OPTIMIZATION_LEVEL to enable higher performance
# Xtensa intrinsics.
$(KERNEL_OBJDIR)$(TENSORFLOW_ROOT)tensorflow/lite/micro/kernels/xtensa/decompress.o: $(TENSORFLOW_ROOT)tensorflow/lite/micro/kernels/xtensa/decompress.cc
Collaborator


Curious if it's not sufficient to put this in MICROLITE_CC_KERNEL_SRCS? I understand that a dedicated rule was required when the decompression routine was in micro_context.cc, but I'm not sure if it's needed now.

Member

@ddavis-2015 ddavis-2015 Dec 12, 2024


Cadence requires us to use special compile options just for this one file. Using -O3 on all kernels does NOT work.

@rkuester
Contributor Author

@ddavis-2015, I'll let you comment on these reviews. (Note, I'm mostly passing through the C++ code on David's behalf at this point.)

Fixes for FULLY_CONNECTED optional bias tensor.
@mergify mergify bot merged commit a9c6e6a into tensorflow:main Dec 13, 2024
54 of 55 checks passed
@rkuester rkuester deleted the feat-compression branch December 13, 2024 22:27