
incorrect causal mask in global attention #1135

Open · davidqqq opened this issue Oct 22, 2024 · 1 comment · May be fixed by #1149

davidqqq commented Oct 22, 2024

I am using GlobalAttention and got an abnormally low loss even when I set causal to True. Upon inspection, I found that the causal mask is not applied at all.

Here, shape[1] will always be 1 because of the assert that the query mask is N x 1:

```python
self.attention_mask &= causal_1d_pattern(attention_query_mask.shape[1])
```

After changing it to shape[0], the loss looks more reasonable.
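A minimal standalone sketch of the issue (not the actual GlobalAttention code; it assumes causal_1d_pattern(n) returns an n x n lower-triangular boolean mask, and the all-True starting mask and N = 4 are made up for the demo):

```python
import torch


def causal_1d_pattern(seq_len: int) -> torch.Tensor:
    # Assumed behaviour of causal_1d_pattern: a lower-triangular
    # (seq_len x seq_len) boolean mask, True where attention is allowed.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))


N = 4  # hypothetical sequence length
attention_query_mask = torch.ones(N, 1, dtype=torch.bool)  # the documented N x 1 shape

# Stand-in for the mask GlobalAttention has built before the causal step.
attention_mask = torch.ones(N, N, dtype=torch.bool)

# Current code path: shape[1] == 1, so the "causal" pattern is a single True
# that broadcasts over the whole mask and hides nothing.
buggy = attention_mask & causal_1d_pattern(attention_query_mask.shape[1])
print(buggy.all())  # tensor(True) -> future positions stay visible

# Proposed fix: shape[0] == N gives the expected lower-triangular mask.
fixed = attention_mask & causal_1d_pattern(attention_query_mask.shape[0])
print(fixed)  # lower-triangular -> position i only attends to positions <= i
```

With shape[1], the abnormally low loss makes sense: every query can still see future tokens, so a causal LM objective becomes much easier than it should be.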

This has been around for quite a while and no one seems to have reported it. Can anybody confirm?

sebhtml commented Nov 12, 2024

My understanding is that attention_query_mask has shape [N, 1].

You are right that causal_1d_pattern(attention_query_mask.shape[1]) will always be causal_1d_pattern(1), since the documentation says the shape is [N, 1].

So I think it is indeed a bug.
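
For reference, the change being discussed boils down to a one-line fix in GlobalAttention (sketched from the snippet above, not copied from the repo; the actual patch is in the linked PR #1149):

```diff
- self.attention_mask &= causal_1d_pattern(attention_query_mask.shape[1])
+ self.attention_mask &= causal_1d_pattern(attention_query_mask.shape[0])
```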

@davidqqq davidqqq linked a pull request Nov 14, 2024 that will close this issue