Currently, for each frame, for each chunk, we invoke `vkCmdUpdateBuffer` with the transform from that chunk to the local node. In a valley, where many chunks are visible, this can add up to hundreds of kilobytes per frame. This is a bit of an abuse of `vkCmdUpdateBuffer`, which is intended for small updates, and may explain the large CPU time spent preparing to render chunks. There are a number of improvements to be made:
Use a mapped staging buffer and a transfer command. This should mitigate driver overhead, and may improve performance substantially on its own.
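A minimal CPU-side sketch of the staging approach, with the GPU parts stubbed out: a `Vec<u8>` stands in for the persistently mapped staging memory, and `BufferCopy` only mirrors the fields of `VkBufferCopy`. `StagingBuffer`, `stage_transforms`, and the 64-byte 4x4 `f32` transform layout are assumptions for illustration. The point is that all chunk transforms for a frame land in one contiguous range, so a single copy region replaces one `vkCmdUpdateBuffer` call per chunk.

```rust
/// Assumed transform representation: a 4x4 f32 matrix (64 bytes).
type Transform = [f32; 16];

/// Mirrors the fields of VkBufferCopy (srcOffset, dstOffset, size).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BufferCopy {
    src_offset: u64,
    dst_offset: u64,
    size: u64,
}

/// Hypothetical stand-in for a persistently mapped, host-visible staging
/// buffer. In a real renderer this memory comes from the driver.
struct StagingBuffer {
    mapped: Vec<u8>,
    cursor: usize,
}

impl StagingBuffer {
    fn new(capacity: usize) -> Self {
        Self { mapped: vec![0; capacity], cursor: 0 }
    }

    /// Writes every chunk transform for this frame contiguously into the
    /// mapped memory and returns the single region a transfer command
    /// would copy. Returns None if the staging space is exhausted.
    fn stage_transforms(&mut self, transforms: &[Transform], dst_offset: u64) -> Option<BufferCopy> {
        let size = std::mem::size_of_val(transforms);
        let start = self.cursor;
        let end = start.checked_add(size)?;
        if end > self.mapped.len() {
            return None;
        }
        // Copy each f32 in place, 4 bytes at a time, in native byte order.
        for (dst, x) in self.mapped[start..end]
            .chunks_exact_mut(4)
            .zip(transforms.iter().flatten())
        {
            dst.copy_from_slice(&x.to_ne_bytes());
        }
        self.cursor = end;
        Some(BufferCopy { src_offset: start as u64, dst_offset, size: size as u64 })
    }

    /// Makes the space reusable once the GPU has finished this frame's
    /// copies (e.g. after the frame's fence signals).
    fn reset(&mut self) {
        self.cursor = 0;
    }
}
```

In a real renderer the returned region would be passed to `vkCmdCopyBuffer`, followed by a pipeline barrier that makes the transfer write visible to vertex-shader reads; keeping one staging region per frame in flight avoids overwriting data the GPU is still copying.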
Because the underlying honeycomb is regular, we can drastically reduce the bandwidth used by storing a precomputed table of transforms to the origin node from every chunk surrounding the origin node out to the maximum view distance, and maintaining a buffer of indices that maps the neighborhood of the player to the analogous chunks surrounding the origin. This index buffer is 1/32 the size of the current transform buffer. It would need to be rewritten every time the player moves between nodes, but small incremental writes could be used otherwise. This also saves us from doing a bunch of matrix multiplication as we traverse the graph, which might improve traversal performance significantly (currently 2-4 ms/frame).
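A sketch of the table-plus-indices idea, assuming each nearby node can be identified by a canonical path of side indices from the player's node (the `NodePath` type, `TransformTable`, and `dirty_ranges` are hypothetical). The table is built and uploaded once; per frame only the index buffer changes, and while the player stays in the same node only the changed runs need writing. If transforms are 4x4 `f32` matrices (64 bytes) and indices are `u16` (2 bytes), each chunk's per-frame payload shrinks by a factor of 32, which would match the 1/32 figure above; both layouts are assumptions here.

```rust
use std::collections::HashMap;

/// Hypothetical canonical path from a node to a nearby node, as side indices.
/// Assumes equivalent routes through the honeycomb canonicalize to one key.
type NodePath = Vec<u8>;

/// Assumed transform representation: a 4x4 f32 matrix (64 bytes).
type Transform = [f32; 16];

/// Precomputed once: the transform to the origin node for every node within
/// the maximum view distance.
struct TransformTable {
    transforms: Vec<Transform>,
    index_of: HashMap<NodePath, u16>,
}

impl TransformTable {
    fn new(entries: Vec<(NodePath, Transform)>) -> Self {
        assert!(
            entries.len() <= usize::from(u16::MAX) + 1,
            "u16 indices cannot address this many entries"
        );
        let mut transforms = Vec::with_capacity(entries.len());
        let mut index_of = HashMap::with_capacity(entries.len());
        for (i, (path, transform)) in entries.into_iter().enumerate() {
            transforms.push(transform);
            index_of.insert(path, i as u16);
        }
        Self { transforms, index_of }
    }

    /// Maps each visible chunk (by its path relative to the player's node)
    /// to the analogous precomputed entry around the origin. Returns None if
    /// any path lies outside the precomputed view distance.
    fn build_indices(&self, visible: &[NodePath]) -> Option<Vec<u16>> {
        visible.iter().map(|p| self.index_of.get(p).copied()).collect()
    }
}

/// Contiguous runs (start, len), in elements, where `new` differs from `old`.
/// Only these need writing while the player stays in the same node; entries
/// past the end of `new` are ignored.
fn dirty_ranges(old: &[u16], new: &[u16]) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    let mut run_start: Option<usize> = None;
    for (i, value) in new.iter().enumerate() {
        let changed = old.get(i) != Some(value);
        match (changed, run_start) {
            (true, None) => run_start = Some(i),
            (false, Some(s)) => {
                ranges.push((s, i - s));
                run_start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = run_start {
        ranges.push((s, new.len() - s));
    }
    ranges
}
```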
As of #53 (Smuggle voxel chunk ID through indirect buffer), chunk transform information (of whatever form) can be passed through an instance buffer rather than looked up in a storage buffer, simplifying, and perhaps slightly optimizing, the vertex shader.
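A CPU-side model of how an instance buffer could deliver per-chunk data, under stated assumptions: each chunk's draw sets `firstInstance` to the chunk's slot with `instanceCount = 1` (how #53 actually carries the chunk ID is not spelled out here, so this is an assumption), and an instance-rate vertex binding then hands the vertex shader that chunk's record as an attribute instead of a storage-buffer lookup. `ChunkInstance`, `build_instance_buffer`, and `fetch_instance_attribute` are hypothetical names; the fetch mirrors how an instance-rate binding without a divisor is addressed, reading element `firstInstance + instance`.

```rust
/// Hypothetical per-chunk record stored in an instance-rate vertex buffer.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ChunkInstance {
    /// Index into a transform table; under the current scheme this field
    /// could instead hold the transform itself.
    transform_index: u32,
}

/// One record per chunk, in slot order, so a draw with
/// `first_instance = slot` and `instance_count = 1` sees that chunk's record.
fn build_instance_buffer(transform_indices: &[u32]) -> Vec<ChunkInstance> {
    transform_indices
        .iter()
        .map(|&transform_index| ChunkInstance { transform_index })
        .collect()
}

/// Models the attribute fetch for an instance-rate binding (no divisor):
/// instance `instance` of a draw with `first_instance` reads element
/// `first_instance + instance`.
fn fetch_instance_attribute(buffer: &[ChunkInstance], first_instance: u32, instance: u32) -> ChunkInstance {
    buffer[(first_instance + instance) as usize]
}
```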
The precomputed transform table could also potentially form a foundation for removing per-chunk draw calls, in favor of a multi-draw-indirect with compute frustum culling.
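A CPU model of what such a culling compute shader could do, with assumed names throughout (`ChunkBounds`, `Plane`, `cull`): each chunk's bounding sphere is tested against the frustum planes in a Euclidean view space (a simplification for illustration), and survivors are appended as compacted `DrawIndirectCommand` records whose field layout matches `VkDrawIndirectCommand`. On the GPU the append would go through an atomic counter, and that counter could feed `vkCmdDrawIndirectCount` (core since Vulkan 1.2) so one call issues every surviving draw. Carrying the chunk slot in `first_instance` is likewise an assumption.

```rust
/// Field layout of VkDrawIndirectCommand (four u32s, 16 bytes).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DrawIndirectCommand {
    vertex_count: u32,
    instance_count: u32,
    first_vertex: u32,
    first_instance: u32,
}

/// Frustum plane: a point p is on the inside when dot(normal, p) + d >= 0.
#[derive(Clone, Copy)]
struct Plane {
    normal: [f32; 3],
    d: f32,
}

/// Hypothetical per-chunk input to the culling pass.
#[derive(Clone, Copy)]
struct ChunkBounds {
    center: [f32; 3], // assumed already expressed in the viewer's frame
    radius: f32,
    first_vertex: u32,
    vertex_count: u32,
}

/// Conservative sphere test: rejects only spheres fully outside some plane.
fn sphere_visible(planes: &[Plane], c: [f32; 3], r: f32) -> bool {
    planes.iter().all(|p| {
        let dist = p.normal[0] * c[0] + p.normal[1] * c[1] + p.normal[2] * c[2] + p.d;
        dist >= -r
    })
}

/// One "invocation" per chunk; visible, non-empty chunks append a draw.
fn cull(planes: &[Plane], chunks: &[ChunkBounds]) -> Vec<DrawIndirectCommand> {
    chunks
        .iter()
        .enumerate()
        .filter(|(_, c)| c.vertex_count > 0 && sphere_visible(planes, c.center, c.radius))
        .map(|(i, c)| DrawIndirectCommand {
            vertex_count: c.vertex_count,
            instance_count: 1,
            first_vertex: c.first_vertex,
            first_instance: i as u32, // chunk slot, read back in the vertex stage
        })
        .collect()
}
```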