GPU autoscheduling with Mullapudi2016: the reference implementation
Reverse-engineer the GPU scheduling feature described in Section 5.4 of Mullapudi's paper:

> Mullapudi, Adams, Sharlet, Ragan-Kelley, Fatahalian. Automatically scheduling Halide image processing pipelines. ACM Transactions on Graphics, 35(4), Article 83, 1–11. https://doi.org/10.1145/2897824.2925952

When `target=cuda` is detected in the code-generator command-line arguments, intercept all `vectorize` and `parallel` scheduling calls requested by the auto-vectorization and auto-parallelization algorithms, and hand them to the class `GPUTilingDedup` for deferred execution. Implement `GPUTilingDedup` so that all Halide GPU schedule calls are idempotent: no matter how many times the Stage is vectorized, reordered, and then `vectorize`d again, `gpu_threads()` is called exactly once (see the first sketch below).

Also intercept all `split` and `reorder` scheduling calls issued by Mullapudi's auto-splitting algorithm. Implement the class `GPUTileHelper` to enforce an atomic transaction for the GPU schedules:

- If the current Stage is `compute_root`, mark all auto-split inner dimensions as `gpu_threads` and the outer dimensions as `gpu_blocks`.
- If the Stage is `compute_at` another Stage, mark all `vectorize` dimensions as `gpu_threads`.
- If auto-splitting of the current Stage does not produce any tile, fall back to a rudimentary tiling with tile size = vector_length x parallel_factor (see the second sketch below).
- If Mullapudi issues no `split`, `vectorize`, or `parallel` schedule at all, assume a scalar reduction routine and run it on the GPU via `single_thread`.
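For illustration, here is a minimal C++ sketch of the idempotency guard, assuming the Halide C++ API. It is not the actual implementation; the method names (`record_vectorize`, `record_parallel`, `commit`) are hypothetical:

```cpp
// Minimal sketch of the GPUTilingDedup idea, assuming the Halide C++ API.
// Method names (record_vectorize, record_parallel, commit) are hypothetical.
#include "Halide.h"

#include <optional>
#include <utility>

using namespace Halide;

class GPUTilingDedup {
    Stage stage;                          // the Stage being scheduled
    bool committed = false;               // ensures commit() runs only once
    std::optional<VarOrRVar> thread_var;  // deferred vectorize dimension
    std::optional<VarOrRVar> block_var;   // deferred parallel dimension

public:
    explicit GPUTilingDedup(Stage s) : stage(std::move(s)) {}

    // Intercept the auto-vectorization algorithm's vectorize() request and
    // defer it instead of applying it immediately.
    void record_vectorize(const VarOrRVar &v) { thread_var = v; }

    // Intercept the auto-parallelization algorithm's parallel() request.
    void record_parallel(const VarOrRVar &v) { block_var = v; }

    // Flush the deferred schedule. Idempotent: however many times the Stage
    // was vectorized or reordered, gpu_threads() is emitted exactly once.
    void commit() {
        if (committed) return;
        committed = true;
        if (block_var && thread_var) {
            stage.gpu_blocks(*block_var).gpu_threads(*thread_var);
        } else if (thread_var) {
            stage.gpu_threads(*thread_var);
        } else {
            // No split/vectorize/parallel was requested: treat the Stage as
            // a scalar reduction and run it in a single GPU thread.
            stage.gpu_single_thread();
        }
    }
};
```

Repeated `record_vectorize` calls simply overwrite the pending dimension, so `commit()` can be invoked from every code path that might finalize the schedule.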
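And a hedged sketch of the fallback tiling for a `compute_root` Stage, again with a hypothetical helper name (`rudimentary_gpu_tile`); it shows the inner/outer dimension mapping and the vector_length x parallel_factor tile described above:

```cpp
// Hypothetical sketch of GPUTileHelper's fallback path, assuming the Halide
// C++ API: when the auto-splitter produced no tile, split the dimension by
// vector_length * parallel_factor, then map outer -> gpu_blocks and
// inner -> gpu_threads (the compute_root rule above).
#include "Halide.h"

using namespace Halide;

Stage rudimentary_gpu_tile(Stage s, const VarOrRVar &x,
                           int vector_length, int parallel_factor) {
    VarOrRVar outer("outer", x.is_rvar), inner("inner", x.is_rvar);
    return s.split(x, outer, inner, vector_length * parallel_factor)
            .gpu_blocks(outer)    // outer dimension -> GPU blocks
            .gpu_threads(inner);  // inner dimension -> GPU threads
}
```

A `compute_at` Stage would instead mark only its `vectorize` dimension with `gpu_threads`, since its outer loops already nest inside the consumer's GPU blocks.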