stack.hip: why is the kernel vector buffer spilling into global memory? #523

Open
fxmarty-amd opened this issue Dec 19, 2024 · 0 comments
fxmarty-amd commented Dec 19, 2024

Describe your question

Hi, I am reading https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/tutorial/profiling-by-example.html#spill-scratch-buffer,

#include "common.h"

__global__ void knl(int* out, int filter) {
  int x[1024];
  x[filter] = 0;
  if (threadIdx.x < filter) out[threadIdx.x] = x[threadIdx.x];
}

int main() {
  knl<<<1, 1>>>(nullptr, 0);
  hipCheck(hipDeviceSynchronize());
}

and I am wondering why x would spill into global memory. The documentation says that x "cannot reasonably fit into registers", and notes that:

the stack is backed by global memory

Using hipGetDeviceProperties on an MI250, we see that regsPerBlock is 65536 registers (32 bits each). Since 1024 < 65536, and we launch a single thread block with a single thread, why are we spilling? According to the rocprofiler-compute documentation as well, the VGPR file seems to be in the tens or hundreds of KB, so I am surprised.

Is it because the warp size on Instinct is 64, so we cannot really schedule a single thread and are in reality scheduling 64 threads behind the scenes, requiring 1024 × 64 = 65536 32-bit registers? I guess this is not the case, since I would expect branching for the other 63 threads, which would just sit idle, no?
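
To make the arithmetic behind the two hypotheses above concrete, here is a minimal host-side sketch in plain C++ (no HIP runtime needed). It only reproduces the numbers quoted in this question (1024 elements, a wavefront of 64, regsPerBlock of 65536, 32-bit registers). It does not claim to describe how the compiler actually allocates registers or decides to use scratch. All constant and function names are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Values quoted in the question; the names are illustrative, not a HIP API.
constexpr std::size_t kArrayElems   = 1024;   // int x[1024] in the kernel
constexpr std::size_t kWavefront    = 64;     // warp (wavefront) size on Instinct
constexpr std::size_t kRegsPerBlock = 65536;  // regsPerBlock reported on MI250

// 32-bit registers needed if every lane of a wavefront kept a private copy
// of an array with `elems` 32-bit elements entirely in registers.
constexpr std::size_t regs_per_wavefront(std::size_t elems, std::size_t lanes) {
  return elems * lanes;
}

// Size in bytes of `regs` 32-bit registers.
constexpr std::size_t bytes_for_regs(std::size_t regs) {
  return regs * sizeof(std::uint32_t);
}

// Hypothesis 1: a single thread needs 1024 registers, well under 65536.
static_assert(kArrayElems < kRegsPerBlock, "one thread's array is below regsPerBlock");

// Hypothesis 2: 64 lanes x 1024 elements is exactly the reported 65536.
static_assert(regs_per_wavefront(kArrayElems, kWavefront) == kRegsPerBlock,
              "a full wavefront would need exactly regsPerBlock registers");

// In bytes: 4 KiB for one thread's array, 256 KiB for 65536 registers.
static_assert(bytes_for_regs(kArrayElems) == 4096, "1024 x 4 bytes");
static_assert(bytes_for_regs(kRegsPerBlock) == 262144, "65536 x 4 bytes");
```

The checks run at compile time, so the snippet passing a C++11 (or later) compiler is enough to confirm the numbers. Whether the full-wavefront figure is what actually triggers the spill is exactly the open question here.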

Thank you!

Additional context

No response
