stack.hip: why is the kernel vector buffer spilling into global memory? #523

fxmarty-amd · 2024-12-19T16:30:52Z

Describe your question

Hi, I am reading https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/tutorial/profiling-by-example.html#spill-scratch-buffer, which uses the following `stack.hip` example:

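```cpp
#include "common.h"  // provides the hipCheck() macro used below

__global__ void knl(int* out, int filter) {
  int x[1024];          // per-thread local array of 1024 ints
  x[filter] = 0;        // store at an index only known at runtime
  if (threadIdx.x < filter) out[threadIdx.x] = x[threadIdx.x];
}

int main() {
  // One block of one thread; with filter = 0 the store through out never executes.
  knl<<<1, 1>>>(nullptr, 0);
  hipCheck(hipDeviceSynchronize());
}
```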

and am wondering why `x` would spill into global memory. The documentation describes it as an array "that cannot reasonably fit into registers" and notes that "the stack is backed by global memory".

Using hipGetDeviceProperties on MI250, we see that regsPerBlock is 65536 registers (32 bits each). Since 1024 < 65536, and the launch uses a single thread block containing a single thread, why does `x` spill? The rocprofiler-compute documentation also suggests that the VGPR file is in the tens or hundreds of KB, so I am surprised.

Is it because the warp (wavefront) size on Instinct GPUs is 64, so a single thread cannot really be scheduled on its own and 64 threads are scheduled behind the scenes, requiring 1024 × 64 = 65536 32-bit registers? I guess this is not the case, since I would expect the other 63 threads to be disabled by branching and simply sit idle.

Thank you!

Additional context

No response


fxmarty-amd added the "question" label (Further information is requested) on Dec 19, 2024.

ppanchad-amd added the "Under Investigation" label on Dec 20, 2024.
