From the following simple code:
Clspv will produce compact and efficient SPIR-V. There is one Load, one Add, and one Store.
https://godbolt.org/z/WGxaa646x
However, with the following variation:
Clspv will explode the SPIR-V instructions: it loads the first float as 4 one-byte pieces, combines the pieces with bitwise arithmetic, adds 1, then splits the result into 4 one-byte pieces again and stores the pieces with 4 more instructions:
https://godbolt.org/z/j3qoP46j9
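To make the difference concrete, here is a rough host-side C sketch of the two access patterns. It is not Clspv output and not the kernel behind the links above. The function names `add_one_direct` and `add_one_fragmented` are made up for illustration, and the byte order assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compact pattern: one 4-byte load, one add, one 4-byte store. */
static void add_one_direct(unsigned char *buf) {
    assert(buf != NULL);
    float f;
    memcpy(&f, buf, sizeof f);      /* single 4-byte load */
    f += 1.0f;
    memcpy(buf, &f, sizeof f);      /* single 4-byte store */
}

/* Fragmented pattern (hypothetical emulation, little-endian assumed):
 * four 1-byte loads, shifts and ORs to rebuild the float, the add,
 * then four 1-byte stores. */
static void add_one_fragmented(unsigned char *buf) {
    assert(buf != NULL);
    uint32_t bits = (uint32_t)buf[0]
                  | (uint32_t)buf[1] << 8
                  | (uint32_t)buf[2] << 16
                  | (uint32_t)buf[3] << 24;   /* 4 loads + combine */
    float f;
    memcpy(&f, &bits, sizeof f);    /* bitcast u32 -> float */
    f += 1.0f;
    memcpy(&bits, &f, sizeof bits); /* bitcast float -> u32 */
    buf[0] = (unsigned char)(bits & 0xFFu);         /* 4 separate stores */
    buf[1] = (unsigned char)((bits >> 8) & 0xFFu);
    buf[2] = (unsigned char)((bits >> 16) & 0xFFu);
    buf[3] = (unsigned char)((bits >> 24) & 0xFFu);
}
```

Both functions leave the same bytes in the buffer. The second one just takes many more memory operations and ALU instructions to get there, which is the overhead in question.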
Why did Clspv do that? It is because of an old restriction in Vulkan.
4 years ago, before Vulkan 1.2 and Physical Addressing were available, a buffer could only have a single typed pointer. So if a kernel has to access both a byte and a float from the same buffer, Clspv has to choose the smallest type, the byte, and fragment every other access to that minimum size. A bool is stored as one byte, so every other access to global memory (which is what most kernels use) will load and store floats in 4 pieces, an i64 in 8 pieces, and so on.
This is a worse situation than it seems at first, because the recent benchmarks in #1292 have revealed that this access fragmentation can cause up to a 30% performance penalty. We also have no confirmation whether driver compilers reverse this kind of fragmentation and, if they do, whether they reverse it completely. The measured performance penalties suggest otherwise.
Now that we have Physical Addressing, Clspv is free to create multiple pointer types: in this case, one for accessing floats and another for accessing bools. Clspv and SPIR-V can switch pointer types and give them any physical address as needed. The loads and stores can be done sanely, without any fragmentation.
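As a loose analogy in plain C (not SPIR-V, and not how Clspv would necessarily implement it), the idea is that one raw address can be viewed through several typed pointers, so each access uses its natural width. The helper `touch_float_and_bool` and its parameters are hypothetical. It assumes `base` is a suitably aligned address of writable memory and that integer-to-pointer casts round-trip, as they do on mainstream platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical kernel body: derive two differently typed pointers
 * from the same raw (physical-style) base address. */
static float touch_float_and_bool(uintptr_t base, size_t bool_offset) {
    assert(base != 0);
    float *as_float = (float *)base;               /* pointer type 1 */
    bool  *as_bool  = (bool *)(base + bool_offset); /* pointer type 2 */

    *as_float += 1.0f;  /* one 4-byte load and one 4-byte store */
    *as_bool = true;    /* one 1-byte store, no effect on the float path */
    return *as_float;
}
```

With two pointer types available, the bool access no longer forces the float access down to byte granularity.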
However, Clspv has decided not to implement this modern feature so far.