Enable flash attention #20448

Draft
wants to merge 5 commits into master

Conversation

@divyashreepathihalli (Collaborator) commented Nov 4, 2024

This PR

  • refactors the MHA layer so that its _compute_attention method simply calls ops.dot_product_attention
  • adds a global toggle keras.config.enable_flash_attention and a getter keras.config.is_flash_attention_enabled (see the usage sketch after this list)
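
A rough usage sketch, based on the API names above and the test snippet later in this thread (draft API, so exact names and signatures may still change):

```python
import keras

# Draft API from this PR (may change): flip the global flag, then any
# MultiHeadAttention layer routes its attention computation through
# keras.ops.dot_product_attention, which can use a fused flash-attention path.
keras.config.enable_flash_attention(True)
print(keras.config.is_flash_attention_enabled())  # True

layer = keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)
```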

@codecov-commenter commented Nov 4, 2024

Codecov Report

Attention: Patch coverage is 72.72727% with 6 lines in your changes missing coverage. Please review.

Project coverage is 76.84%. Comparing base (c052cea) to head (3a47c53).
Report is 2 commits behind head on master.

Files with missing lines                               Patch %   Lines
keras/src/layers/attention/multi_head_attention.py    63.63%    3 Missing, 1 partial ⚠️
keras/api/_tf_keras/keras/config/__init__.py          0.00%     2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (c052cea) and HEAD (3a47c53). HEAD has 2 fewer uploads than BASE:

Flag        BASE (c052cea)   HEAD (3a47c53)
keras       4                3
keras-jax   1                0
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20448      +/-   ##
==========================================
- Coverage   82.01%   76.84%   -5.18%     
==========================================
  Files         514      514              
  Lines       47194    47261      +67     
  Branches     7408     7417       +9     
==========================================
- Hits        38706    36317    -2389     
- Misses       6698     9220    +2522     
+ Partials     1790     1724      -66     
Flag               Coverage Δ
keras              76.72% <72.72%> (-5.15%) ⬇️
keras-jax          ?
keras-numpy        59.86% <72.72%> (-0.01%) ⬇️
keras-tensorflow   65.90% <72.72%> (-0.02%) ⬇️
keras-torch        64.85% <72.72%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@fchollet (Member) left a comment:
Thanks for the PR!

keras/src/backend/config.py (outdated)
```python
    use flash attention for faster computations.
    """
    global _ENABLE_FLASH_ATTENTION
    _ENABLE_FLASH_ATTENTION = value
```
A Member commented:

This needs to be thread-local. Instead of doing it like this, use set_global_attribute/get_global_attribute from keras.src.backend.common.global_state. See how other global flags are implemented.
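
For illustration, a minimal sketch of that pattern, assuming the helpers behave the way they do for other global flags (the attribute key and default used here are placeholders, not the PR's final code):

```python
from keras.src.backend.common import global_state


def enable_flash_attention(value=True):
    # global_state keeps attributes in thread-local storage, so each thread
    # can carry its own setting instead of sharing one module-level global.
    global_state.set_global_attribute("flash_attention", bool(value))


def is_flash_attention_enabled():
    # Fall back to False if the flag has never been set in this thread.
    return global_state.get_global_attribute("flash_attention", default=False)
```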

@divyashreepathihalli (Collaborator, Author) replied Nov 5, 2024:

Importing global_state in config created a circular import error, so I have moved the configs to attention.py.



```python
class MultiHeadAttentionTest(testing.TestCase):
    def test_basics(self):
        config.enable_flash_attention(True)
```
@fchollet (Member) commented Nov 5, 2024:
Add a numerical correctness test with and without FA.
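
One possible shape for such a test (hypothetical sketch, not code from this PR; whether flash attention actually kicks in depends on the backend and hardware):

```python
import numpy as np
from keras import config, layers, ops


def test_flash_attention_numerical_correctness():
    # The same layer instance (and therefore the same weights) is used for
    # both runs, so any difference comes from the attention kernel used.
    query = np.random.random((2, 4, 8)).astype("float32")
    layer = layers.MultiHeadAttention(num_heads=2, key_dim=4)

    config.enable_flash_attention(False)
    expected = ops.convert_to_numpy(layer(query, query))

    config.enable_flash_attention(True)
    actual = ops.convert_to_numpy(layer(query, query))

    np.testing.assert_allclose(actual, expected, atol=1e-4)
```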

@divyashreepathihalli divyashreepathihalli marked this pull request as draft November 5, 2024 01:31
4 participants