update frontend and mk_ck_lib #777
Conversation
@chenyang78 @ipiszy review this one too.

- def forward(self, *args):
+ def forward(self, *args, seqlens=None):
What does seqlens represent? Where is it used?
It's used in the ROCm attention backend.
I feel the API after this change is not very readable. This API is supposed to be backend-agnostic and is shared between CUDA and ROCm. Can we avoid this change?
The ROCm attention backend requires seqlens. Do you have any plan to add an API that supports attention masks?
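The optional-keyword pattern being discussed can be sketched in plain Python (the class and return values below are hypothetical stand-ins, not AITemplate's actual implementation): defaulting `seqlens` to `None` keeps existing CUDA callers working unchanged, while the ROCm path can opt in by passing per-sample lengths.

```python
class Attention:
    """Hypothetical sketch of a backend-agnostic attention frontend."""

    def forward(self, *args, seqlens=None):
        # Existing callers pass positional args only; `seqlens` defaults
        # to None, so their behavior is unchanged.
        if seqlens is None:
            return ("default", args)
        # A backend that needs sequence lengths (e.g. ROCm) can opt in.
        return ("rocm", args, tuple(seqlens))

attn = Attention()
print(attn.forward("q", "k", "v"))                  # ('default', ('q', 'k', 'v'))
print(attn.forward("q", "k", "v", seqlens=[3, 5]))  # ('rocm', ('q', 'k', 'v'), (3, 5))
```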
@@ -395,7 +419,8 @@ def forward(self, *args):
         else:
             x = self.proj(attn_output)
         x = self.proj_drop(x)
-        x = ops.reshape()(x, [batch, -1, self.dim])
+        if not isinstance(batch, IntVar):
+            x = ops.reshape()(x, [batch, -1, self.dim])
This looks weird. Why this change?
The ROCm attention backend takes a [batch * seqlen, nheads] tensor as input and produces a [batch * seqlen, nheads] tensor as output. This branch keeps the original behavior when batch is a static integer.
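The conditional reshape can be illustrated with a minimal, self-contained sketch (this `IntVar` is a simplified stand-in for AITemplate's dynamic-dimension class, and the function operates on shape lists rather than real tensors): the reshape back to three dimensions is applied only when the batch dimension is a static integer, while the flattened ROCm layout passes through untouched when the batch is dynamic.

```python
class IntVar:
    """Stand-in for a symbolic (dynamic) dimension."""
    def __init__(self, name):
        self.name = name

def maybe_reshape(shape, batch, dim):
    """Recover [batch, seqlen, dim] from the flattened layout
    [batch * seqlen, dim] only when `batch` is a static integer;
    if `batch` is dynamic (an IntVar), keep the 2-D shape as-is."""
    if not isinstance(batch, IntVar):
        total, d = shape
        return [batch, total // batch, d]  # static batch: recover 3-D shape
    return shape  # dynamic batch: keep [batch * seqlen, dim]

print(maybe_reshape([8, 16], 2, 16))                # [2, 4, 16]
print(maybe_reshape([8, 16], IntVar("batch"), 16))  # [8, 16]
```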
I have reverted the attention change.
@ipiszy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.