Include SageAttention Kernel #147

Open
jason-huang03 opened this issue Nov 24, 2024 · 3 comments
@jason-huang03

Hi, thanks for your great work. I am the author of SageAttention. I wonder whether it would be suitable to include SageAttention in the repo. SageAttention is a quantized attention kernel currently optimized for the Ada architecture. It supports QK int8 and PV fp8, and it would be easy to support QK fp8 too.
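
For readers unfamiliar with the scheme: the core idea is to quantize Q and K to int8 (keeping floating-point scales) before the QK^T GEMM, and to run the PV GEMM in fp8. Below is a minimal, hypothetical CUDA sketch of just the per-row symmetric int8 quantization step, not SageAttention's actual kernel (which uses per-block smoothing and quantization); kernel and variable names are illustrative.

```cuda
// Hypothetical sketch (not SageAttention's actual kernel): per-row symmetric
// int8 quantization of an fp32 matrix such as Q or K. The fp32 scales are kept
// so the int8 QK^T result can be dequantized back to floating point.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

__global__ void quant_rows_int8(const float* x, int8_t* q, float* scale,
                                int rows, int cols) {
  int row = blockIdx.x;  // one thread block per row
  __shared__ float smax[256];

  // 1) per-row absolute maximum via a shared-memory block reduction
  float m = 0.f;
  for (int c = threadIdx.x; c < cols; c += blockDim.x)
    m = fmaxf(m, fabsf(x[row * cols + c]));
  smax[threadIdx.x] = m;
  __syncthreads();
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s)
      smax[threadIdx.x] = fmaxf(smax[threadIdx.x], smax[threadIdx.x + s]);
    __syncthreads();
  }
  float rs = smax[0] / 127.f;  // symmetric per-row scale
  if (threadIdx.x == 0) scale[row] = rs;

  // 2) quantize: q = round(x / scale), clamped to the int8 range
  for (int c = threadIdx.x; c < cols; c += blockDim.x) {
    float v = (rs > 0.f) ? x[row * cols + c] / rs : 0.f;
    q[row * cols + c] = (int8_t)fmaxf(-127.f, fminf(127.f, rintf(v)));
  }
}

int main() {
  const int rows = 4, cols = 64;
  float hx[rows * cols];
  for (int i = 0; i < rows * cols; ++i) hx[i] = 0.01f * (i % 100) - 0.5f;

  float *dx, *dscale; int8_t *dq;
  cudaMalloc((void**)&dx, sizeof(hx));
  cudaMalloc((void**)&dq, rows * cols * sizeof(int8_t));
  cudaMalloc((void**)&dscale, rows * sizeof(float));
  cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);

  quant_rows_int8<<<rows, 256>>>(dx, dq, dscale, rows, cols);
  cudaDeviceSynchronize();

  float hs[rows];
  cudaMemcpy(hs, dscale, sizeof(hs), cudaMemcpyDeviceToHost);
  for (int r = 0; r < rows; ++r) printf("row %d scale = %f\n", r, hs[r]);

  cudaFree(dx); cudaFree(dq); cudaFree(dscale);
  return 0;
}
```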

@jason-huang03 (Author)

Also, I myself have a well-optimized W8A8 GEMM kernel that reaches over 500 TOPS on the RTX 4090. I wonder whether it would be suitable to add this to the repo too.
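
As a point of reference for what a W8A8 GEMM computes, here is a hypothetical naive CUDA sketch: int8 activations times int8 weights accumulated in int32, then dequantized with per-tensor scales. It is a correctness reference only; a kernel at the throughput described above would rely on Tensor Core mma instructions, tiling, and software pipelining. All names and scale values here are illustrative.

```cuda
// Hypothetical naive W8A8 GEMM reference (not the optimized kernel mentioned
// above): int8 A[M,K] x int8 B[K,N], int32 accumulation, fp32 dequantization
// with per-tensor scales.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

__global__ void w8a8_gemm_naive(const int8_t* A, const int8_t* B, float* C,
                                int M, int N, int K,
                                float scale_a, float scale_b) {
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= M || col >= N) return;

  int acc = 0;  // int32 accumulator avoids int8 overflow
  for (int k = 0; k < K; ++k)
    acc += (int)A[row * K + k] * (int)B[k * N + col];

  // dequantize once per output element
  C[row * N + col] = acc * scale_a * scale_b;
}

int main() {
  const int M = 8, N = 8, K = 32;
  int8_t hA[M * K], hB[K * N];
  for (int i = 0; i < M * K; ++i) hA[i] = (int8_t)(i % 7 - 3);
  for (int i = 0; i < K * N; ++i) hB[i] = (int8_t)(i % 5 - 2);

  int8_t *dA, *dB; float *dC;
  cudaMalloc((void**)&dA, sizeof(hA));
  cudaMalloc((void**)&dB, sizeof(hB));
  cudaMalloc((void**)&dC, M * N * sizeof(float));
  cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
  cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

  dim3 block(16, 16), grid((N + 15) / 16, (M + 15) / 16);
  w8a8_gemm_naive<<<grid, block>>>(dA, dB, dC, M, N, K, 0.05f, 0.02f);
  cudaDeviceSynchronize();

  float hC[M * N];
  cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
  printf("C[0][0] = %f\n", hC[0]);

  cudaFree(dA); cudaFree(dB); cudaFree(dC);
  return 0;
}
```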

@DefTruth (Owner)

@jason-huang03
🎉 Hi jason-huang03~ Thank you very much for your attention to CUDA-Learn-Notes. SageAttention is excellent work and also a great learning resource. It would be fantastic if it could be integrated into CUDA-Learn-Notes. Please feel free to submit a PR. You can add it to the kernels/sage-attention directory, referring to hgemm as an example. If you can integrate sage-attention into a library, like the toy-hgemm library, that would be even better. After the PR is merged, I will pin your work on the README homepage. Thank you very much~

@DefTruth assigned DefTruth and jason-huang03 and unassigned DefTruth on Nov 24, 2024
@jason-huang03 (Author)

Sounds great! I will try to figure out how to do that.
