feat(compression): implement tensor decompression in op fully_connected #3006
Conversation
Implement tensor decompression in op fully_connected. Extend tests to validate operation on compressed tensors. Fix the effect on memory_arena_threshold_test by setting a different expected value for the persistent buffer allocation when compression is configured in. The allocation was allowed to vary by 3%; however, compression adds ~10%. Set the expected value to the measured value when compression is configured in. BUG=part of tensorflow#2636
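For illustration, a minimal sketch of how that expected-value switch could look in the threshold test (the constant name and byte counts below are placeholders, not the figures from memory_arena_threshold_test.cc):

#ifdef USE_TFLM_COMPRESSION
// Placeholder value: the allocation measured with compression compiled in.
// The ~10% overhead from compression metadata exceeds the test's 3%
// tolerance, so the expected value is pinned to the measurement.
constexpr size_t kExpectedPersistentBufferSize = 11400;
#else
// Placeholder value: the baseline allocation without compression.
constexpr size_t kExpectedPersistentBufferSize = 10400;
#endif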
The HiFi compression tests seem to fail for this.
@suleshahid @rkuester These tests pass in my local environment, but my environment is not identical to the compress-testing branch or this PR. I found two files with a merge error in the compress-testing branch: fully_connected_test.cc and transpose_conv_test.cc. They had a templated method that should not have been templated. However, I don't see how that would cause the test to fail. I have updated the compress-testing branch with the correctly merged code. I diffed compress-testing against my own branch and found no other relevant differences.
That still failed. I'm in the process of testing my ultimate destination branch in CI to see whether the failure is due to something I've left out of this PR's commit.
@rkuester The test script requires the optimized kernels, but the PR does not have compression support for the optimized kernels.
I see. That's a surprising way to fail: build and runtime success, but wrong results. Is there a way we can transform this into a failure to build (an extension that should support compression having a new signature or something), or at least a better failure at runtime (an extension noticing that the length of the tensor doesn't make sense, etc.)?
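To make the runtime variant of that suggestion concrete, here is a minimal sketch (the function and its parameters are hypothetical, not part of this PR): an extension could verify that the byte length of the weights buffer matches the tensor's shape before using it, so a still-compressed buffer fails fast instead of silently producing wrong results.

#include <cstddef>

// A decompressed buffer must hold exactly num_elements * element_size
// bytes; a still-compressed buffer is smaller, so this check converts
// silent wrong results into an early, visible failure.
inline bool TensorLengthIsPlausible(std::size_t buffer_bytes,
                                    std::size_t num_elements,
                                    std::size_t element_size) {
  return buffer_bytes == num_elements * element_size;
}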
@@ -115,9 +143,18 @@ TfLiteStatus FullyConnectedEval(TfLiteContext* context, TfLiteNode* node) {
          tflite::micro::GetTensorShape(input),
          tflite::micro::GetTensorData<float>(input),
          tflite::micro::GetTensorShape(filter),
#ifdef USE_TFLM_COMPRESSION
          tflite::micro::GetTensorData<float>(micro_context, filter,
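(For context, the diff view truncates this call; it continues with the compression metadata and scratch-buffer arguments, roughly as below. The argument names are assumptions based on this branch's compression API, not verbatim from the diff.)

          tflite::micro::GetTensorData<float>(micro_context, filter,
                                              weights_comp_td,
                                              data.weights_scratch_index),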
Catching this a bit late, but in the new GetTensorData function, we check and return nullptr if the tensor is nullptr. However, in the existing code, it's a TFLITE_DCHECK, which means we will fail here, since we expected data. I think it's better to do it that way, because otherwise we could end up trying to use the nullptr somewhere.
This is valid for cases where the bias tensor is an optional input to the operator.
In that case I would prefer that we have another overload for the optional case. Since this basically changes the code even for non-compression execution (in the case where compression is globally enabled), it will be safer to keep the nullptr DCHECK.
If you think that will take too long, for now we could just add a single DCHECK for the weights data in the #ifdef compression code above.
Seeing how many files use it, it's probably easiest to just add the GetOptionalTensorData overload.
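For concreteness, a minimal sketch of that overload, assuming the compression-aware GetTensorData(micro_context, tensor, compression_data, scratch_buffer_handle) in this branch DCHECKs on a null tensor (the signature is assumed, not copied from the final code):

template <typename T>
const T* GetOptionalTensorData(MicroContext* micro_context,
                               const TfLiteEvalTensor* tensor,
                               const CompressionTensorData* compression_data,
                               int scratch_buffer_handle) {
  // Optional inputs (e.g. the FULLY_CONNECTED bias) may legitimately be
  // absent: return nullptr rather than tripping the required-tensor DCHECK.
  return tensor == nullptr
             ? nullptr
             : GetTensorData<T>(micro_context, tensor, compression_data,
                                scratch_buffer_handle);
}

Required tensors would keep going through GetTensorData and its DCHECK, so non-compression execution paths stay unchanged.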
in progress.
# override KERNEL_OPTIMIZATION_LEVEL to enable higher performance
# Xtensa intrinsics.
$(KERNEL_OBJDIR)$(TENSORFLOW_ROOT)tensorflow/lite/micro/kernels/xtensa/decompress.o: $(TENSORFLOW_ROOT)tensorflow/lite/micro/kernels/xtensa/decompress.cc
Curious if it's not sufficient to put this in MICROLITE_CC_KERNEL_SRCS? I understand that a dedicated rule was required when the decompression routine was in micro_context.cc, but I'm not sure if it's needed now.
Cadence requires us to use special compile options just for this one file. Using -O3 on all kernels does NOT work.
@ddavis-2015, I'll let you comment on these reviews. (Note: I'm mostly passing through the C++ code on David's behalf at this point.)
Fixes for FULLY_CONNECTED optional bias tensor.
Implement tensor decompression in op fully_connected. Extend
tests to validate operation on compressed tensors.
Include Xtensa extensions for decompression, as they are required
by the tests.
Fix the effect on memory_arena_threshold_test by setting a
different expected value for the persistent buffer allocation
when compression is configured in. The allocation was allowed to
vary by 3%; however, compression adds ~10%. Set the expected
value to the measured value when compression is configured in.
BUG=part of #2636