How to output NVPTX assembly/IR/bytecode? #6410
Still, this code is hard to interpret. I'd much rather see something like a statement file for GPU code (as we already have stmt files for CPU code).
Setting the environment variable `HL_DEBUG_CODEGEN=1` makes Halide print the PTX. If you set it to 2 and `ptxas` is on your `PATH`, it also attempts to print the SASS.
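A minimal sketch of the `HL_DEBUG_CODEGEN` route, driving an already-built generator from Python. The binary `./my_pipeline.generator`, the generator name `my_pipeline` and the output directory are hypothetical placeholders. The sketch also assumes that `target=host-cuda` is the target in use and that Halide writes this debug output to stderr.

```python
import os
import subprocess


def codegen_debug_env(level: int) -> dict:
    """Return a copy of the current environment with HL_DEBUG_CODEGEN set.

    Level 1 prints the PTX; level 2 additionally tries to print SASS
    when ptxas is on the PATH.
    """
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    env["HL_DEBUG_CODEGEN"] = str(level)
    return env


def generator_command(binary: str, name: str, out_dir: str) -> list:
    # Hypothetical binary and generator name; target=host-cuda is an
    # assumption about the target being compiled for.
    return [binary, "-g", name, "-o", out_dir, "target=host-cuda"]


def dump_ptx(binary: str, name: str, out_dir: str, level: int = 1) -> str:
    """Run the generator and return its debug log (assumed to be on stderr)."""
    result = subprocess.run(
        generator_command(binary, name, out_dir),
        env=codegen_debug_env(level),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr
```

For example, `dump_ptx("./my_pipeline.generator", "my_pipeline", "out")` would return the captured log. The PTX still has to be picked out of that log by hand, which is why a dedicated output is preferable.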
Being able to emit PTX source through some route other than debug output would be better, and is on the TODO list: #5055.
I'm currently testing an approach that puts the buffer stored in the Module into the Stmt file if its name ends with "_gpu_source_kernels". I'm not sure it's a great idea, but at least the kernel source sits behind a collapsible button.
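The suffix-matching rule in that comment is short to express in code. The sketch below is hypothetical: it models the Module's buffers as a plain name-to-bytes mapping rather than Halide's C++ `Module` API. It selects the buffers whose names end with `_gpu_source_kernels`, decodes them as text, and wraps each one in an HTML `<details>` element as one possible way to make it collapsible. It is not necessarily what the actual change does.

```python
import html

GPU_SOURCE_SUFFIX = "_gpu_source_kernels"


def gpu_kernel_sources(buffers: dict) -> dict:
    """Return {buffer name: decoded text} for buffers holding GPU kernel source.

    `buffers` is a hypothetical stand-in for the buffers stored in a Halide
    Module, keyed by name. Only names ending in the suffix are kept.
    """
    kernels = {}
    for name, data in buffers.items():
        if name.endswith(GPU_SOURCE_SUFFIX):
            # Kernel source (e.g. PTX) is text; drop a trailing NUL if present.
            kernels[name] = data.rstrip(b"\x00").decode("utf-8", errors="replace")
    return kernels


def collapsible_html(name: str, source: str) -> str:
    # One collapsible section per kernel buffer, escaped for safe embedding.
    return (f"<details><summary>{html.escape(name)}</summary>"
            f"<pre>{html.escape(source)}</pre></details>")
```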
Fixed by #6444.
I'm looking for a way to inspect the NVPTX code generated for my pipelines. The statement files only contain calls to `halide_cuda_run()` that launch the kernels; I am looking for the kernel code itself. So far I have found that if you use `c_source` as a generator output, it produces a very long C file in which the kernel code is hidden somewhere. However, this fails when the CPU-scheduled part of my pipeline uses something the C backend does not support (such as predicated loads). I think generators are missing an output type that emits just the CUDA kernel assembly. An example of kernel code hiding in the C source is this:
A way to write all kernels to one file, without the surrounding C code, is really missing right now.
If this is desirable, I could definitely contribute it, so feel free to give me some pointers on how to approach it and I'll make a PR.
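The `c_source` workaround described in the post amounts to asking the generator for that output with `-e`, optionally alongside `stmt`. The sketch below keeps the same hypothetical binary and generator name as any other placeholder here, and it assumes `target=host-cuda`. It does not get around the C-backend limitation mentioned in the post, such as predicated loads.

```python
import subprocess
from pathlib import Path


def emit_command(binary: str, name: str, out_dir: str, outputs: list) -> list:
    """Build a generator command line that emits only the requested outputs."""
    # -e takes a comma-separated list of output kinds, e.g. "c_source,stmt".
    return [binary, "-g", name, "-o", out_dir,
            "-e", ",".join(outputs), "target=host-cuda"]


def emit_outputs(binary: str, name: str, out_dir: str, outputs: list) -> list:
    """Run the generator and list the files present in out_dir afterwards."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(emit_command(binary, name, out_dir, outputs), check=True)
    return sorted(str(p) for p in Path(out_dir).iterdir())
```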