A nearly complete collection of prefix sum algorithms implemented in CUDA, D3D12, Unity, and WGPU, theoretically portable to any wave/warp/subgroup size.
unity cuda gpgpu hlsl d3d12 compute-shaders chained-scan-with-decoupled-lookback inclusive-prefix-sum exclusive-prefix-sum
Updated Dec 10, 2024 - C++
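The tags distinguish inclusive and exclusive prefix sums and claim portability across wave sizes. The CPU sketch below is written only to illustrate those two ideas and is not code from the repository. All function names are hypothetical. Inside each simulated "wave" it runs a Hillis-Steele scan, then carries the running total into the next wave serially. It does not model chained-scan-with-decoupled-lookback. On a GPU, that technique (per the tags) would replace the serial carry between tiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference inclusive scan: out[i] = in[0] + ... + in[i].
std::vector<uint32_t> inclusive_scan_ref(const std::vector<uint32_t>& in) {
    std::vector<uint32_t> out(in.size());
    uint32_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out[i] = running;
    }
    return out;
}

// Reference exclusive scan: out[0] = 0, out[i] = in[0] + ... + in[i - 1].
std::vector<uint32_t> exclusive_scan_ref(const std::vector<uint32_t>& in) {
    std::vector<uint32_t> out(in.size());
    uint32_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// Hypothetical serial simulation of a wave-sized inclusive scan.
// Each chunk of `wave_size` elements stands in for one wave (one element per
// lane) and is scanned with Hillis-Steele steps: at offset 1, 2, 4, ...,
// lane i adds the value lane (i - offset) held before that step.
// The running total of earlier chunks is then added as a carry.
std::vector<uint32_t> tiled_inclusive_scan(const std::vector<uint32_t>& in,
                                           std::size_t wave_size) {
    assert(wave_size > 0 && "wave_size must be positive");
    std::vector<uint32_t> out(in);
    uint32_t carry = 0;
    for (std::size_t base = 0; base < out.size(); base += wave_size) {
        const std::size_t n = std::min(wave_size, out.size() - base);
        for (std::size_t offset = 1; offset < n; offset <<= 1) {
            // Snapshot the lanes so every lane reads pre-step values,
            // mimicking a lockstep shuffle-up across the wave.
            const std::vector<uint32_t> prev(out.begin() + base,
                                             out.begin() + base + n);
            for (std::size_t lane = offset; lane < n; ++lane) {
                out[base + lane] = prev[lane] + prev[lane - offset];
            }
        }
        // Add the total of all earlier waves, then update the carry.
        for (std::size_t lane = 0; lane < n; ++lane) {
            out[base + lane] += carry;
        }
        carry = out[base + n - 1];
    }
    return out;
}
```

With input `{3, 1, 4, 1, 5}`, the inclusive scan gives `{3, 4, 8, 9, 14}` and the exclusive scan gives `{0, 3, 4, 8, 9}`. `tiled_inclusive_scan` should match `inclusive_scan_ref` for any positive `wave_size`, whether 4, 32, or 64. That is the sense in which a scan assembled from per-wave steps need not depend on one particular hardware width.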