hipblasdgemm not getting close to peak #1705
Comments
I've tried larger sizes, and at some point the code just fails without ever exceeding 40 TFLOP/s.
Hi @JorgeG94, thanks for opening this issue. hipBLAS is just a wrapper library for the rocBLAS/cuBLAS backends. rocBLAS then uses the Tensile library for calls to gemm. Since you're looking for better performance in dgemm, I think it will be best if I transfer this issue to the Tensile library, where they can hopefully help you out. Performance tuning done there will be realized in rocBLAS and in hipBLAS with the AMD backend. Thanks,
I will check this on my side.
@JorgeG94 Can you please test with the latest ROCm 6.1.2? If your issue is resolved, please close the ticket. Thanks!
What is the expected behavior
What actually happens
How to reproduce
hipcc -L/opt/rocm-5.4.3/lib -lhipblas --offload-arch=gfx90a performance.cpp
./a.out 36000 14400 36000 10 T T
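For reference, here is a minimal sketch of how a throughput figure such as "40 TFLOP/s" is usually derived from a timed dgemm run. It uses the conventional 2·m·n·k operation count for a matrix multiply. The helper names `gemm_flops` and `gemm_tflops` are hypothetical, and reading the command-line arguments above as m, n, k followed by an iteration count is an assumption, since `performance.cpp` itself is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Floating-point operations in one GEMM call C = alpha*op(A)*op(B) + beta*C,
// using the conventional 2*m*n*k count (one multiply and one add per term).
// Hypothetical helper, not part of hipBLAS or rocBLAS.
inline double gemm_flops(std::int64_t m, std::int64_t n, std::int64_t k) {
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) *
           static_cast<double>(k);
}

// Achieved throughput in TFLOP/s for `iterations` back-to-back calls that
// took `seconds` of wall-clock time in total (measured after the device
// has been synchronized). Hypothetical helper.
inline double gemm_tflops(std::int64_t m, std::int64_t n, std::int64_t k,
                          int iterations, double seconds) {
    assert(iterations > 0 && seconds > 0.0);
    return gemm_flops(m, n, k) * iterations / seconds / 1e12;
}
```

Under that reading of the arguments, one call with m = 36000, n = 14400, k = 36000 performs about 3.73e13 operations, so exceeding 40 TFLOP/s would require each call to finish in under roughly 0.93 s.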
Environment