[Feature Request] Need Matmul Attention layer instead of Einsum to support GPU running #46

MoFHeka · 2024-01-27T15:32:09Z

The Einsum kernel cannot be lowered to a cuDNN GEMM, which seriously hurts compute performance on GPU.
Can you believe it? The JAX (Flax or Praxis) attention layers are even slower than the TensorFlow version (not Keras)!
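To make the request concrete, here is a minimal sketch of the same scaled dot-product attention written once with `jnp.einsum` and once with `jnp.matmul`. The function names, the `[batch, seq, heads, head_dim]` layout, and the absence of masking and dropout are my assumptions for illustration, not the actual Flax or Praxis layer code. The sketch only shows that the two formulations are numerically equivalent; which GPU kernels XLA picks for each depends on the XLA version and is not demonstrated here.

```python
import jax
import jax.numpy as jnp


def einsum_attention(q, k, v):
    """Attention via einsum. q, k, v: [batch, seq, heads, head_dim] (assumed layout)."""
    scale = q.shape[-1] ** -0.5
    # Contract over head_dim to get logits of shape [batch, heads, q_len, k_len].
    logits = jnp.einsum("bqhd,bkhd->bhqk", q, k) * scale
    weights = jax.nn.softmax(logits, axis=-1)
    # Weighted sum over keys, back to [batch, q_len, heads, head_dim].
    return jnp.einsum("bhqk,bkhd->bqhd", weights, v)


def matmul_attention(q, k, v):
    """The same computation expressed with explicit transposes and batched matmul."""
    scale = q.shape[-1] ** -0.5
    # Move heads ahead of seq so matmul batches over [batch, heads].
    q, k, v = (jnp.transpose(x, (0, 2, 1, 3)) for x in (q, k, v))
    logits = jnp.matmul(q, jnp.swapaxes(k, -1, -2)) * scale
    weights = jax.nn.softmax(logits, axis=-1)
    out = jnp.matmul(weights, v)  # [batch, heads, q_len, head_dim]
    return jnp.transpose(out, (0, 2, 1, 3))
```

Comparing `jax.jit(einsum_attention).lower(q, k, v).as_text()` against the matmul variant is one way to check whether the two actually compile to different operations on a given setup.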

MoFHeka mentioned this issue Jan 29, 2024

Add an experimental JAX scaleDotProductAttention API to use XLA cuDNN fusedAttention feature. jax-ml/jax#18814

Merged
