
Feature/quantizer factory refactoring #1963

Draft: wants to merge 62 commits into master
Conversation

@BloodAxe (Contributor) commented on Apr 15, 2024:

List of breaking changes:

  1. QATRecipeModificationCallback is not required anymore; the modify_params_for_qat method is now part of the TensorRT Quantizer class.
     Fallback mechanism: the class can be left as-is, with a warning message indicating it is deprecated. The implementation becomes a no-op (it makes no changes to the config file); see the sketch after this list.

  2. The quantization params config section now has a different structure.
     Fallback mechanism: we can probably detect whether a new or old config is given and fall back to the TRT quantizer in the latter case.

  3. Trainer.ptq / Trainer.qat methods: we can make them work as before, but this would limit us to using only a TRT quantizer.
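A minimal sketch of the fallback described in item 1, assuming the shim keeps the old callback name and call shape (only QATRecipeModificationCallback and modify_params_for_qat come from this PR; everything else is an assumption for illustration):

import warnings

class QATRecipeModificationCallback:
    """Deprecated shim: old recipes still load, but the callback no longer changes anything."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "QATRecipeModificationCallback is deprecated; "
            "modify_params_for_qat now lives on the TensorRT quantizer class.",
            DeprecationWarning,
        )

    def __call__(self, cfg):
        # No-op: the config passes through unchanged.
        return cfg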


# TODO: Should this really be here?
Contributor:

Why not?
We can do PTQ/QAT on multiple GPUs (I specifically remember writing support for that in the past...)

@BloodAxe (Author):

Okay, removing the TODO then.

@@ -0,0 +1,34 @@
quantizer:
TRTPTQQuantizer:
Contributor:

I don't understand the logic of decoupling the exporter from the quantizer...
I feel this puts more risk of errors (like pairing the TRT quantizer with int8 and the ONNX exporter, which might mislead users into trying to run that ONNX file on onnxruntime).
There is some benefit for debugging, though, I guess...

Collaborator:

I agree. I think the exporter and quantizer should be part of the same object.

output_path: ${architecture}_trt_ptq_int8.onnx

# Model-specific parameters
export_params:
Contributor:

We need to properly document all the export params together with their corresponding exporters.
I would feel a bit lost if I wanted to, let's say, try to build a TRT engine for a segmentation model, for instance.

detection_max_predictions_per_image: 128
detection_num_pre_nms_predictions: 1000
detection_predictions_format: flat
detection_postprocessing_use_tensorrt_nms: False
Contributor:

I find it pretty confusing that we have an Exporter object, but model-specific export_params...
Especially given that there is a detection_postprocessing_use_tensorrt_nms argument here.

@BloodAxe (Author) commented on Apr 18, 2024:

This is nothing new, by the way: ExportParams is not introduced in this PR. I have just explicitly added all the fields that exist in the ExportParams structure, to help users start editing these parameters.
I can duplicate the docstrings on each setting, but that would make for quite a long YAML (see ExportParams).
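Piecing together the fragments quoted in this review, the new config file appears to look roughly like this (a partial reconstruction from the visible lines only; nesting is inferred, and the remaining fields of the 34-line file are not shown here):

quantizer:
  TRTPTQQuantizer:
    # quantizer-specific parameters (not visible in the quoted fragments)

output_path: ${architecture}_trt_ptq_int8.onnx

# Model-specific parameters
export_params:
  detection_max_predictions_per_image: 128
  detection_num_pre_nms_predictions: 1000
  detection_predictions_format: flat
  detection_postprocessing_use_tensorrt_nms: False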

@ofrimasad (Collaborator) left a comment:

Very impressive.
I'm not sure I understand the logic, though. Why do we convert to ONNX first and then apply the specific quantizer and exporter?

"""

@abc.abstractmethod
def export_from_onnx(self, source_onnx: str, output_file: str) -> str:
Collaborator:

I don't think I understand the logic of this.
We said that SG will not export the TRT engine or the OpenVINO engine. We export an ONNX file (even if that file is framework-specific).
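For context, the abstract method quoted above implies an exporter contract along these lines (a sketch only: export_from_onnx and its signature appear in the diff, but the class names are hypothetical):

import abc
import shutil

class AbstractExporter(abc.ABC):
    # Hypothetical base class; only the abstract method below is visible in the diff.
    @abc.abstractmethod
    def export_from_onnx(self, source_onnx: str, output_file: str) -> str:
        """Turn an intermediate ONNX file into the final artifact and return its path."""
        ...

class PlainONNXExporter(AbstractExporter):
    # Hypothetical subclass: for a plain ONNX target, "export" just materializes
    # the intermediate file at the requested location.
    def export_from_onnx(self, source_onnx: str, output_file: str) -> str:
        shutil.copy(source_onnx, output_file)
        return output_file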


- model = ptq(
+ quantized_model = tensorrt_ptq(
Collaborator:

I don't understand. This is only for TRT; what if I want to convert to something else?
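This concern is presumably what the quantizer factory in the PR title is meant to address: a backend-keyed registry instead of a hard-coded TensorRT call. A minimal sketch of that idea (all names here are illustrative assumptions, not the PR's actual API):

from typing import Any, Callable, Dict

QUANTIZER_REGISTRY: Dict[str, Callable[..., Any]] = {}

def register_quantizer(name: str):
    # Decorator that adds a backend-specific PTQ function to the registry.
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        QUANTIZER_REGISTRY[name] = fn
        return fn
    return decorator

@register_quantizer("tensorrt")
def tensorrt_ptq(model, calibration_loader=None):
    # Placeholder body: a real implementation would insert TRT-compatible
    # fake-quantization nodes and calibrate ranges on calibration_loader.
    return model

def ptq(model, backend: str = "tensorrt", **kwargs):
    # Backend-agnostic entry point: dispatch to whichever quantizer is registered.
    return QUANTIZER_REGISTRY[backend](model, **kwargs)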

