
incorrect causal mask in global attention #1135

Open · davidqqq opened this issue Oct 22, 2024 · 1 comment · May be fixed by #1149

davidqqq commented Oct 22, 2024

I am using GlobalAttention and got an abnormally low loss even when I set causal to True. Upon inspection, I found that the causal mask is not applied at all.

Here, shape[1] will always be 1 because of the assert that the query mask is N x 1:

```python
self.attention_mask &= causal_1d_pattern(attention_query_mask.shape[1])
```

After changing it to shape[0], the loss looks more reasonable.
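A minimal standalone sketch of the issue (not the actual GlobalAttention code; it assumes causal_1d_pattern(n) returns an n x n lower-triangular boolean mask, and the all-True starting mask and N = 4 are made up for the demo):

```python
import torch


def causal_1d_pattern(seq_len: int) -> torch.Tensor:
    # Assumed behaviour of causal_1d_pattern: a lower-triangular
    # (seq_len x seq_len) boolean mask, True where attention is allowed.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))


N = 4  # hypothetical sequence length
attention_query_mask = torch.ones(N, 1, dtype=torch.bool)  # the documented N x 1 shape

# Stand-in for the mask GlobalAttention has built before the causal step.
attention_mask = torch.ones(N, N, dtype=torch.bool)

# Current code path: shape[1] == 1, so the "causal" pattern is a single True
# that broadcasts over the whole mask and hides nothing.
buggy = attention_mask & causal_1d_pattern(attention_query_mask.shape[1])
print(buggy.all())  # tensor(True) -> future positions stay visible

# Proposed fix: shape[0] == N gives the expected lower-triangular mask.
fixed = attention_mask & causal_1d_pattern(attention_query_mask.shape[0])
print(fixed)  # lower-triangular -> position i only attends to positions <= i
```

With shape[1], the abnormally low loss makes sense: every query can still see future tokens, so a causal LM objective becomes much easier than it should be.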

This has been around for quite a while and no one seems to have reported it. Can anybody confirm?

sebhtml commented Nov 12, 2024

My understanding is that attention_query_mask has shape [N, 1].

You are right that causal_1d_pattern(attention_query_mask.shape[1]) will always be causal_1d_pattern(1), since the documentation says the shape is [N, 1].

So I think it is indeed a bug.
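
For reference, the change being discussed boils down to a one-line fix in GlobalAttention (sketched from the snippet above, not copied from the repo; the actual patch is in the linked PR #1149):

```diff
- self.attention_mask &= causal_1d_pattern(attention_query_mask.shape[1])
+ self.attention_mask &= causal_1d_pattern(attention_query_mask.shape[0])
```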

@davidqqq davidqqq linked a pull request Nov 14, 2024 that will close this issue