Add support for smem_epilogue when mma output is not cast to half #3620

Open
protonu wants to merge 15 commits into main

protonu
Collaborator

@protonu protonu commented Dec 19, 2024

Support non-stmatrix stores from registers to shared memory, followed by a TMA store, when the output of the mma op is not cast back to half precision. This is needed because stmatrix works with half precision only.
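
To make the choice described above concrete, here is a minimal sketch of the store selection. The enums and the function `pickSmemStore` are hypothetical names invented for illustration and are not nvFuser types or APIs; the only rule encoded is the one stated in the description, that stmatrix applies to half-precision output only.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not nvFuser types.
enum class OutDType { Half, Float };
enum class SmemStore { StMatrix, PlainStore };

// Per the PR description, stmatrix handles half precision only, so any
// other output type falls back to ordinary register-to-shared-memory stores.
constexpr SmemStore pickSmemStore(OutDType dtype) {
  return dtype == OutDType::Half ? SmemStore::StMatrix
                                 : SmemStore::PlainStore;
}

// In both cases the tile staged in shared memory would then be written
// out with a TMA store; only the register-to-smem step differs.
inline void checkStorePaths() {
  assert(pickSmemStore(OutDType::Half) == SmemStore::StMatrix);
  assert(pickSmemStore(OutDType::Float) == SmemStore::PlainStore);
}
```

The sketch models only the dtype-based decision; how the actual scheduler detects the output type is not shown in this conversation.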

@protonu
Collaborator Author

protonu commented Dec 19, 2024

!test

@protonu protonu marked this pull request as ready for review December 19, 2024 15:32
csrc/scheduler/hopper_multi_matmul.cpp (outdated, resolved)
csrc/scheduler/hopper_multi_matmul.cpp (outdated, resolved)
csrc/scheduler/hopper_multi_matmul.cpp (outdated, resolved)
tests/cpp/test_matmul_scheduler.cpp (outdated, resolved)
@protonu protonu requested a review from jacobhinkle December 19, 2024 15:58
@protonu
Collaborator Author

protonu commented Dec 19, 2024

!test

@protonu
Collaborator Author

protonu commented Dec 19, 2024

!test

@protonu protonu requested a review from jacobhinkle December 19, 2024 17:40
Collaborator

@rdspring1 rdspring1 left a comment

Refactor Follow-Up Proposal:
It looks like scheduleEpilogue could easily be split into two functions for readability, rather than remaining a single monolithic function.

void HopperMultipleMatmulScheduler::scheduleEpilogueWithVectorization() {}
void HopperMultipleMatmulScheduler::scheduleSmemEpilogue() {}

void HopperMultipleMatmulScheduler::scheduleEpilogue() {
  if (!params_->use_smem_epilogue) {
    scheduleEpilogueWithVectorization();
  } else {
    // Use stmatrix (optional) and tma store
    scheduleSmemEpilogue();
  }
}
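
The proposal above leaves both helpers empty. As a rough, compilable illustration of the dispatch it describes, the sketch below uses a mock class that records which path ran. `MockParams`, `MockScheduler`, and `trace()` are hypothetical and exist only for this example; only the flag name `use_smem_epilogue` and the three function names come from the comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the scheduler parameters; only the flag
// named in the review comment is modeled.
struct MockParams {
  bool use_smem_epilogue;
};

// Hypothetical mock, not the real HopperMultipleMatmulScheduler.
class MockScheduler {
 public:
  explicit MockScheduler(MockParams params) : params_(params) {}

  // Thin dispatcher: each epilogue strategy lives in its own function.
  void scheduleEpilogue() {
    if (!params_.use_smem_epilogue) {
      scheduleEpilogueWithVectorization();
    } else {
      scheduleSmemEpilogue();
    }
  }

  const std::vector<std::string>& trace() const { return trace_; }

 private:
  // The real scheduling work is omitted; the mock only records the path.
  void scheduleEpilogueWithVectorization() { trace_.push_back("vectorized"); }
  void scheduleSmemEpilogue() { trace_.push_back("smem"); }

  MockParams params_;
  std::vector<std::string> trace_;
};

inline void checkDispatch() {
  MockScheduler direct(MockParams{false});
  direct.scheduleEpilogue();
  assert(direct.trace().back() == "vectorized");

  MockScheduler staged(MockParams{true});
  staged.scheduleEpilogue();
  assert(staged.trace().back() == "smem");
}
```

Keeping the dispatcher this thin means each path can be read and tested on its own, which appears to be the readability gain the comment is after.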

Collaborator

@jacobhinkle jacobhinkle left a comment

LGTM

@protonu
Collaborator Author

protonu commented Dec 20, 2024

!test

@protonu
Collaborator Author

protonu commented Dec 20, 2024

!test

@protonu
Collaborator Author

protonu commented Dec 21, 2024

!test

3 participants