This repository has been archived by the owner on Jun 28, 2024. It is now read-only.
Yes, that branch is very old. I made assorted fixes while debugging and only managed to get it to 25 it/s. According to reports, this commit of ROCm LLVM can reach 30 it/s.
The submodule in this branch points to a specific branch of Composable Kernel, which has a Fused Attention implementation for Navi 3x.
I previously spent a lot of time trying to integrate this Fused Attention implementation into PyTorch. You can find my efforts here:
I only ran AITemplate on navi3_rel_ver_1.0, and it is very old.