My understanding is that attention_query_mask has shape [N, 1]. You are right: since the documented shape is [N, 1], causal_1d_pattern(attention_query_mask.shape[1]) will always be causal_1d_pattern(1).
I am using GlobalAttention and got an abnormally low loss even with causal set to True. On inspection, I found that the causal mask is not applied at all.
Here, shape[1] will always be 1 given the assert:
xformers/xformers/components/attention/global_tokens.py, line 73 (commit 68b7fd1)
After changing it to shape[0], the loss is more reasonable.
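For illustration, here is a minimal sketch of why shape[1] collapses the causal pattern when attention_query_mask has the documented shape [N, 1]. It uses plain PyTorch with a stand-in for causal_1d_pattern that is assumed to match the library's behavior of building a lower-triangular boolean mask:

```python
import torch

def causal_1d_pattern(seq_len: int) -> torch.Tensor:
    # Stand-in for xformers' causal_1d_pattern (assumption: the real helper
    # also returns a lower-triangular boolean mask of shape [seq_len, seq_len]).
    return torch.tril(torch.ones(seq_len, seq_len)).to(torch.bool)

N = 4
attention_query_mask = torch.ones(N, 1, dtype=torch.bool)  # documented shape [N, 1]

# Buggy call: shape[1] == 1, so the "causal" pattern is a single True entry,
# i.e. effectively no causal masking at all.
buggy = causal_1d_pattern(attention_query_mask.shape[1])
print(buggy)  # tensor([[True]])

# Fixed call: shape[0] == N gives the intended N x N lower-triangular mask.
fixed = causal_1d_pattern(attention_query_mask.shape[0])
print(fixed)
# tensor([[ True, False, False, False],
#         [ True,  True, False, False],
#         [ True,  True,  True, False],
#         [ True,  True,  True,  True]])
```

With shape[1] the causal constraint silently disappears, which is consistent with the abnormally low loss reported above.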
This code has been around for quite a while and no one seems to have reported an issue. Can anybody confirm?