Add more unit tests to FA fwd kernels. #609

xinyazhang · 2024-06-28T22:30:00Z

Note that this does not test the backward kernel itself; it uses the kernels in _ref_bwd_*.py as the reference.

Run the UT with pytest test_backward.py

To run a known set of parameters, edit the main2 function and run python test_backward.py. (pytest -k also works, but it is much slower and requires -s to show standard output.)
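The pytest -k route can be scripted when a single parameter combination needs to be re-run. This is a minimal sketch, assuming test_backward.py is in the current directory; build_pytest_cmd and the example keyword are hypothetical names and not part of this PR.

```python
import subprocess
import sys


def build_pytest_cmd(keyword, test_file="test_backward.py"):
    """Build a pytest command that runs only the tests matching `keyword`.

    -k selects tests by keyword expression; -s disables output capture so
    that print() output from the tests is shown.
    """
    return [sys.executable, "-m", "pytest", test_file, "-k", keyword, "-s"]


if __name__ == "__main__":
    # Hypothetical keyword; replace it with a substring of the real test ID.
    cmd = build_pytest_cmd("test_name_substring")
    print(" ".join(cmd))
    # Uncomment to actually launch pytest:
    # subprocess.run(cmd, check=False)
```

Editing main2 remains the faster path, since it skips pytest's collection of the full parameter matrix.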

The whole test suite takes around 12 hours to complete (after disabling auto-tuning). The main bottleneck is Triton kernel compilation. With more tl.constexpr parameters it may take even longer.
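The compile-time concern can be made concrete: Triton compiles a separate kernel for each distinct combination of tl.constexpr values that is actually launched, so the number of compilations grows multiplicatively with each swept parameter. The sketch below only counts combinations; the parameter names and value lists are hypothetical and are not taken from the test suite.

```python
from itertools import product

# Hypothetical constexpr parameters and the values a test sweep might cover.
constexpr_space = {
    "BLOCK_M": [16, 32, 64],
    "BLOCK_N": [16, 32],
    "CAUSAL": [False, True],
}


def count_variants(space):
    """Count distinct constexpr combinations (one compilation each)."""
    return sum(1 for _ in product(*space.values()))


base = count_variants(constexpr_space)
# Adding one more boolean constexpr doubles the number of variants.
extended = count_variants({**constexpr_space, "ENABLE_DROPOUT": [False, True]})
print(base, extended)  # prints "12 24"
```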

Add Perf Kernels This is a combination of 2 commits. Add Perf Kernels Add Perf Kernels This is a combination of 6 commits. add perf-kernels fix formating issues fix unused variables and other bugs fix other issues remove scripts save check changes format save save try pre-commit check save

Change all block pointers to tensor pointers. Block pointers are for NVIDIA TMAs. They are useful for regular loads as well, but are not well supported. Also cleaned up some code I came across along the way and updated the comment at the top.

Add support for layouts commonly used by users. Add option for varlen / thd layout to specify equal context lengths for all batches. Also often used by users.

Note that this does not test the backward kernel itself; it uses the kernels in _ref_bwd_*.py as the reference.

xinyazhang · 2024-06-28T22:32:27Z

Known problem: with Triton commit 00e09cf3008b86978f25f838659698e4a0bf6f45, running pytest test_backward.py -v -x fails with the following runtime error.

self = <.HIPLauncher object at 0x79492bd4c220>, args = (1, 1, 1, 180074496, 185729744, (8, 1, 32768, 1, 1, 1), ...), kwargs = {}

    def __call__(self, *args, **kwargs):
        print(f'{args=}')
        print(f'{kwargs=}')
>       self.launch(*args, **kwargs)
E       RuntimeError: Triton Error [HIP]:  Code: 1, Messsage: invalid argument

../../../aotriton/third_party/triton/python/triton/backends/amd/driver.py:420: RuntimeError

Removing all autotune configs (except for 'BLOCK_M': 16, 'BLOCK_N': 16,) can mitigate this problem, but this is probably not what we want.
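The mitigation amounts to filtering the autotune config list down to a single entry. In this minimal sketch, plain dicts stand in for the real autotune config objects; the list contents and the keep_only helper are hypothetical, and only the 16x16 entry comes from the comment above.

```python
# Plain dicts stand in for autotune configs; values other than the
# 16x16 entry are illustrative only.
configs = [
    {"BLOCK_M": 16, "BLOCK_N": 16},
    {"BLOCK_M": 64, "BLOCK_N": 32},
    {"BLOCK_M": 128, "BLOCK_N": 64},
]


def keep_only(configs, block_m=16, block_n=16):
    """Return only the configs whose block sizes match the given values."""
    return [
        c for c in configs
        if c["BLOCK_M"] == block_m and c["BLOCK_N"] == block_n
    ]


mitigated = keep_only(configs)
print(mitigated)
```

Restricting the search space this way avoids the failing launches but also removes the tuning benefit, which is why it is only a mitigation.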

xinyazhang · 2024-07-02T18:40:50Z

Known problem: with Triton commit 00e09cf3008b86978f25f838659698e4a0bf6f45, running pytest test_backward.py -v -x fails with the following runtime error.

Confirmed: this is caused by libamdhip64.so being loaded twice, and it can be fixed by a189c11.

As a temporary workaround, setting TRITON_LIBHIP_PATH to PyTorch's .so file fixes this as well.
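The temporary workaround could look like the following. This is a sketch under stated assumptions: it assumes the PyTorch install ships libamdhip64.so under its lib directory, and that the variable is set before Triton loads the HIP runtime. find_libhip and use_torch_libhip are hypothetical helper names.

```python
import importlib.util
import os
from pathlib import Path


def find_libhip(torch_root):
    """Return <torch_root>/lib/libamdhip64.so if it exists, else None.

    The lib/ location is an assumption about the PyTorch install layout.
    """
    candidate = Path(torch_root) / "lib" / "libamdhip64.so"
    return candidate if candidate.is_file() else None


def use_torch_libhip(torch_root, env=os.environ):
    """Point TRITON_LIBHIP_PATH at PyTorch's copy of the HIP runtime."""
    lib = find_libhip(torch_root)
    if lib is not None:
        env["TRITON_LIBHIP_PATH"] = str(lib)
    return lib


if __name__ == "__main__":
    # Locate the torch package without importing it.
    spec = importlib.util.find_spec("torch")
    if spec is not None and spec.origin is not None:
        print(use_torch_libhip(Path(spec.origin).parent))
```

Exporting the variable in the shell before launching pytest has the same effect and avoids any ordering concerns inside the Python process.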

micmelesse and others added 5 commits June 19, 2024 08:21

skip backward (#586)

cc535d3

Change all block pointers to tensor pointers (#585)

cfb231f

Change all block pointers to tensor pointers. Block pointers are for NVIDIA TMAs. They are useful for regular loads as well, but are not well supported. Also cleaned up some code I came across along the way and updated the comment at the top.

Add support for bshd layout (#587)

18930eb

Add support for layouts commonly used by users. Add option for varlen / thd layout to specify equal context lengths for all batches. Also often used by users.

Add more unit tests to FA fwd kernels.

1e96cdd

Note that this does not test the backward kernel itself; it uses the kernels in _ref_bwd_*.py as the reference.

micmelesse force-pushed the main_perf branch from d26ef1d to dbe1173 Compare July 16, 2024 23:38

micmelesse force-pushed the main_perf branch from 16b0bbf to 628e09b Compare October 28, 2024 15:11
