Hi, thanks for your great work. I am the author of SageAttention, and I wonder whether it would be suitable to include SageAttention in this repo. SageAttention is a quantized attention implementation currently optimized for the Ada architecture. It supports INT8 QK and FP8 PV, and it would be easy to add FP8 QK support as well.
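To make the "INT8 QK" idea concrete, here is a minimal PyTorch sketch of quantized attention emulated in floating point: Q and K are quantized to INT8 with per-tensor symmetric scales, QK^T is computed in the integer domain and dequantized before softmax, while PV stays in higher precision. This is only an illustrative sketch of the general technique; it is not SageAttention's actual kernel or API, and real kernels use finer-grained (e.g. per-block) scales and FP8 for PV.

```python
import torch

def int8_qk_attention_reference(q, k, v):
    """Toy reference for QK-INT8 quantized attention, emulated in PyTorch.

    Illustrative only: per-tensor symmetric scales, PV kept in the input
    precision. Not SageAttention's real implementation.
    """
    # Per-tensor symmetric INT8 scales (real kernels use per-block scales).
    q_scale = q.abs().amax() / 127.0
    k_scale = k.abs().amax() / 127.0

    # "INT8" values emulated as rounded floats in [-127, 127].
    q_int8 = torch.clamp((q / q_scale).round(), -127, 127)
    k_int8 = torch.clamp((k / k_scale).round(), -127, 127)

    # Integer-domain QK^T, dequantized with the combined scale.
    scores = (q_int8 @ k_int8.transpose(-2, -1)) * (q_scale * k_scale)
    scores = scores / (q.shape[-1] ** 0.5)

    attn = torch.softmax(scores, dim=-1)
    # PV stays in the original precision here (SageAttention uses FP8 for PV).
    return attn @ v
```

A quick sanity check is to compare the output of this reference against `torch.nn.functional.scaled_dot_product_attention` on random Q/K/V and confirm the error stays small.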
@jason-huang03
🎉 Hi jason-huang03~ Thank you very much for your interest in CUDA-Learn-Notes. SageAttention is excellent work and also a great learning resource, and it would be fantastic to have it integrated into CUDA-Learn-Notes. Please feel free to submit a PR: you can add it under the kernels/sage-attention directory, using hgemm as a reference. If you can package sage-attention as a standalone library, like the toy-hgemm library, that would be even better. After the PR is merged, I will pin your work on the README homepage. Thank you very much~