forked from pytorch/FBGEMM
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add tma_persistent impl for FP8 rowwise gemm (pytorch#2742)
Summary: Add tma persistent kernel impl for FP8 rowwise gemm with `fp8_fast_accum=True` based on the Triton upstream implementation triton-lang/triton#4099.

Pull Request resolved: pytorch#2742

Reviewed By: chenyang78, htyu

Differential Revision: D58656793

Pulled By: sijiac

fbshipit-source-id: 692091eb367cc2fd1ef821384bb5e49347f08929
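For readers unfamiliar with the term, the sketch below illustrates what "rowwise" scaling means for this kind of GEMM. It is not the kernel added by this commit: it is plain Python with no Triton, no TMA, and no real FP8 rounding. The function names `quantize_rowwise` and `rowwise_scaled_gemm` are hypothetical, and the `[N][K]` layout of the second operand is an assumption made for the illustration.

```python
FP8_E4M3_MAX = 448.0  # largest finite value of the float8 e4m3fn format


def quantize_rowwise(mat):
    """Scale each row so its largest magnitude maps to FP8_E4M3_MAX.

    Returns (scaled_rows, scales), where scales[i] undoes the scaling of row i.
    Values stay Python floats; real FP8 rounding is NOT simulated here.
    """
    scaled, scales = [], []
    for row in mat:
        amax = max(abs(v) for v in row) or 1.0  # all-zero row: avoid dividing by zero
        scale = amax / FP8_E4M3_MAX
        scaled.append([v / scale for v in row])
        scales.append(scale)
    return scaled, scales


def rowwise_scaled_gemm(a, b):
    """Compute a @ b^T for a: [M][K] and b: [N][K] (layout assumed).

    Each operand is scaled per row, the dot products are accumulated on the
    scaled values, and each output element is rescaled by the matching pair
    of row scales.
    """
    aq, a_scale = quantize_rowwise(a)
    bq, b_scale = quantize_rowwise(b)
    out = []
    for i, a_row in enumerate(aq):
        out_row = []
        for j, b_row in enumerate(bq):
            acc = sum(x * y for x, y in zip(a_row, b_row))
            out_row.append(acc * a_scale[i] * b_scale[j])
        out.append(out_row)
    return out
```

Because output element `(i, j)` only needs the scale of row `i` of the first operand and row `j` of the second, the rescaling can be applied once per output tile after accumulation, which is why per-row scales pair naturally with a tiled GEMM kernel.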
1 parent 9caef86, commit cdad003
Showing 2 changed files with 261 additions and 1 deletion.