Avoid block stride loop on AMD GPUs to increase performance for FEM kernels #495

Open
artv3 opened this issue Nov 25, 2024 · 2 comments
@artv3
Member

artv3 commented Nov 25, 2024

Block stride loops have been observed to reduce performance on AMD GPUs. To improve performance, use a direct mapping instead. Please see the FEM kernels under apps.
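For readers unfamiliar with the two patterns, here is a minimal host-side sketch of the index mappings involved. It is not taken from the FEM kernels: the function names are hypothetical, `tid` stands in for `threadIdx.x`, `block_dim` stands in for `blockDim.x`, and the direct mapping assumes the block is launched with at least as many threads as items.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation only: tid stands in for threadIdx.x and
// block_dim for blockDim.x. All names here are hypothetical.

// Block stride loop: each thread starts at its own index and advances
// by block_dim until the n items owned by the block are covered.
std::vector<int> block_stride_owner(int n, int block_dim) {
  std::vector<int> owner(n, -1);
  for (int tid = 0; tid < block_dim; ++tid) {
    for (int i = tid; i < n; i += block_dim) {
      owner[i] = tid;  // item i is processed by thread tid
    }
  }
  return owner;
}

// Direct mapping: thread tid processes item tid and nothing else.
// This assumes the block is launched with block_dim >= n.
std::vector<int> direct_owner(int n, int block_dim) {
  assert(block_dim >= n);
  std::vector<int> owner(n, -1);
  for (int tid = 0; tid < block_dim; ++tid) {
    if (tid < n) {
      owner[tid] = tid;  // no loop, only a bounds check
    }
  }
  return owner;
}
```

When `block_dim >= n`, both functions assign the same thread to each item, so the direct mapping only drops the loop bookkeeping. It does not change which thread does the work.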

@artv3 artv3 self-assigned this Nov 25, 2024
@MrBurmark
Member

This is specifically for block stride loops and not grid stride loops?

@artv3
Member Author

artv3 commented Nov 25, 2024

The kernel that prompted this had block stride loops, but yes, I think we have also seen lower performance with grid stride loops in other contexts.
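To make the distinction concrete, here is a small host-side sketch of a grid stride loop. The function name is hypothetical, and `bid`, `tid`, `grid_dim`, and `block_dim` stand in for `blockIdx.x`, `threadIdx.x`, `gridDim.x`, and `blockDim.x`. Unlike a block stride loop, the starting index is the global thread id and the stride is the total number of threads in the grid.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation only; all names here are hypothetical.
// Grid stride loop: start at the global thread id and advance by the
// total thread count (grid_dim * block_dim) until n items are covered.
std::vector<int> grid_stride_visits(int n, int grid_dim, int block_dim) {
  std::vector<int> visits(n, 0);
  const int stride = grid_dim * block_dim;
  for (int bid = 0; bid < grid_dim; ++bid) {
    for (int tid = 0; tid < block_dim; ++tid) {
      for (int i = bid * block_dim + tid; i < n; i += stride) {
        visits[i] += 1;  // count how many times item i is processed
      }
    }
  }
  return visits;
}
```

Every item is visited exactly once for any positive `grid_dim` and `block_dim`, which is why this pattern is often used when the launch size is decoupled from the problem size.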
