
Enable QNN HTP spill fill buffer setting to save RAM usage. #22853

Open
wants to merge 3 commits into main

Conversation

HectorSVC
Contributor

Description

Enable QNN HTP spill fill buffer setting to save RAM usage.
This feature is available starting with QNN 2.28. The QNN context binary needs to be re-generated.
https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_backend.html#qnn-htp-backend-api

Contributor

@github-actions github-actions bot left a comment


You can commit the suggested changes from lintrunner.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@HectorSVC HectorSVC closed this Nov 15, 2024
@HectorSVC HectorSVC reopened this Nov 15, 2024
@HectorSVC HectorSVC added the ep:QNN issues related to QNN execution provider label Nov 15, 2024
@HectorSVC
Contributor Author

@chiwwang, could you help take a look?

@chiwwang

Hi Hector,
This looks good to me, but let me ping others and see if they can also take a look.

@HectorSVC
Contributor Author

HectorSVC commented Nov 22, 2024

Comments from QC: The approach has the limitation that it always gets the max spill fill buffer size from the first QNN context. The max spill fill buffer size should be computed across all QNN contexts. To fill the gap, we need to go through all QNN contexts to:

  1. Load each QNN context binary buffer and extract its max spill fill buffer size.
  2. Compare the max spill fill buffer sizes across all QNN contexts and track the index of the context with the largest size.
  3. Load and deserialize the QNN context that has the max spill fill buffer size first (to get the graph info for future execution), set the max spill fill buffer size, and set its group handle to 0.
  4. Load and deserialize the other QNN contexts, set the max spill fill buffer size, and set their group handles to the context from step 3.

Since this feature mostly targets large models, which have large context binaries, steps 1 and 2 add significant overhead. An alternative approach is to record the max spill fill buffer size for each QNN context in the EPContext node when we generate the model, so the information is available ahead of time instead of being computed during session creation. We can then read the sizes from all EPContext nodes, find the max, and load that context first.
