
Support flash-attention in the form of a custom call #17

Merged
ApsarasX merged 4 commits into master from wengang/flash-attention-rebase on Apr 17, 2024

Conversation


@ApsarasX commented on Apr 12, 2024

Usage:

import torch_xla.core.functions as xf

xf.flash_attn(
    query,
    key,
    value,
    *,
    dropout_rate=0.0,
    scale=None,
    is_causal=False,
    alibi_slopes=None,
    deterministic=False,
    return_softmax=False,
)
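
A minimal end-to-end sketch of the fixed-length entry point. The (batch, seqlen, num_heads, head_dim) layout and half-precision dtype follow the upstream flash-attention convention and are assumptions here; the PR itself only documents the signature above.

import torch
import torch_xla.core.xla_model as xm
import torch_xla.core.functions as xf

device = xm.xla_device()

# Assumed layout: (batch, seqlen, num_heads, head_dim), fp16 as the
# flash-attention kernels expect; shapes are illustrative.
query = torch.randn(2, 128, 8, 64, dtype=torch.float16, device=device)
key = torch.randn(2, 128, 8, 64, dtype=torch.float16, device=device)
value = torch.randn(2, 128, 8, 64, dtype=torch.float16, device=device)

out = xf.flash_attn(query, key, value, is_causal=True)
xm.mark_step()  # flush the lazy XLA graph so the custom call executes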

xf.flash_attn_varlen(
    query,
    key,
    value,
    cu_seqlens_query,
    cu_seqlens_key,
    *,
    max_seqlen_q,
    max_seqlen_k,
    dropout_rate=0.0,
    scale=None,
    is_causal=False,
    alibi_slopes=None,
    deterministic=False,
    return_softmax=False,
)
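
A corresponding sketch for the variable-length entry point, assuming the upstream varlen convention: sequences packed into a single (total_tokens, num_heads, head_dim) tensor and delimited by cumulative-length tensors. The packed layout and the int32 dtype for cu_seqlens are assumptions, not stated in the PR.

import torch
import torch_xla.core.xla_model as xm
import torch_xla.core.functions as xf

device = xm.xla_device()

# Two sequences of lengths 32 and 64 packed back to back: 96 tokens total.
total_tokens, num_heads, head_dim = 96, 8, 64
query = torch.randn(total_tokens, num_heads, head_dim, dtype=torch.float16, device=device)
key = torch.randn(total_tokens, num_heads, head_dim, dtype=torch.float16, device=device)
value = torch.randn(total_tokens, num_heads, head_dim, dtype=torch.float16, device=device)

# Cumulative sequence boundaries: sequence i spans [cu_seqlens[i], cu_seqlens[i+1]).
cu_seqlens = torch.tensor([0, 32, 96], dtype=torch.int32, device=device)

out = xf.flash_attn_varlen(
    query, key, value, cu_seqlens, cu_seqlens,
    max_seqlen_q=64, max_seqlen_k=64, is_causal=True,
)
xm.mark_step()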

@ApsarasX force-pushed the wengang/flash-attention-rebase branch from c8f076e to b89507d on April 15, 2024 06:19
Review threads on test/test_flash_attn.py (2) resolved
@ApsarasX force-pushed the wengang/flash-attention-rebase branch from c2e19ce to 52f1d1f on April 17, 2024 02:35
@ApsarasX merged commit 08306b7 into master on Apr 17, 2024 (0 of 2 checks passed)
2 participants