Enable QNN HTP spill fill buffer setting to save RAM usage. #22853
base: main
Conversation
You can commit the suggested changes from lintrunner.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@chiwwang, could you help take a look?
Hi Hector,
Comments from QC: This approach has the limitation that it always takes the max spill fill buffer size from the first QNN context. The max spill fill buffer size should be the maximum across all QNN contexts. To fill the gap, we need to go through all QNN contexts to:
Considering this feature mostly targets large models with large context binary sizes, steps 1 & 2 would add significant overhead. An alternative approach is to dump the max spill fill buffer size for each QNN context into its EPContext node when we generate the model, so the information is ready ahead of time instead of being computed during normal session creation. We can then read it from all EPContext nodes, take the max, and load that context first.
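The alternative approach above can be sketched as a small selection step: scan the recorded per-context sizes, take the maximum, and load that context first so its spill fill buffer can be shared. This is an illustrative sketch only; the node representation and the `max_size` attribute name are assumptions, not onnxruntime's actual API.

```python
# Hypothetical sketch of the "read size from EPContext nodes" approach.
# Each EPContext node is modeled as a plain dict carrying a "max_size"
# value written at context-binary generation time (names are illustrative).

def max_spill_fill_size(ep_context_nodes):
    """Return the largest recorded spill fill size and the index of the
    context that should be loaded first so its buffer can be shared."""
    sizes = [node.get("max_size", 0) for node in ep_context_nodes]
    if not sizes:
        return 0, None
    largest = max(sizes)
    return largest, sizes.index(largest)

# Two contexts with different recorded spill fill sizes:
nodes = [
    {"name": "ctx_0", "max_size": 4 * 1024 * 1024},
    {"name": "ctx_1", "max_size": 16 * 1024 * 1024},
]
size, load_first = max_spill_fill_size(nodes)
# ctx_1 holds the max size, so it would be loaded first.
```

Because the sizes are read from node attributes rather than from the deserialized binaries, the costly per-context load/parse steps are avoided at session creation time.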
Description
Enable QNN HTP spill fill buffer setting to save RAM usage.
This feature is available starting with QNN 2.28 and requires re-generating the QNN context binary.
https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_backend.html#qnn-htp-backend-api
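For reference, the setting would typically be passed as a QNN execution provider option at session creation. This is a hedged sketch: the option key `enable_htp_spill_fill_buffer` follows this PR's naming, but verify it against your onnxruntime build; the model path is a placeholder.

```python
# Sketch: enabling the HTP spill fill buffer via QNN EP provider options.
# The key "enable_htp_spill_fill_buffer" is assumed from this PR; check
# the exact name in your onnxruntime release.

qnn_provider_options = {
    "backend_path": "QnnHtp.dll",          # HTP backend library
    "enable_htp_spill_fill_buffer": "1",   # share one spill fill buffer across contexts
}

# Creating the session with a pre-generated QNN context binary model
# (the binary must be regenerated with QNN >= 2.28 for this to apply):
#
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model_ctx.onnx",  # placeholder path to an EPContext model
#     providers=[("QNNExecutionProvider", qnn_provider_options)],
# )
```

The session-creation lines are commented out because they require a QNN-enabled onnxruntime build and a real context-binary model.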