CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS cuLaunchKernel failed #8318
To help you figure out what could be going wrong, there are three options:
Run with both
conceptual.stmt:
I don't see anything obviously wrong. Is it weird that the input device pointer (0x8a73800) is so much smaller than the output device pointer (0x72eec2200400)? By the way, zero-dimensional Funcs/Buffers are a thing in Halide; they would generate simpler code in these cases.
@abadams I'm mostly concerned with the whole "produce input" block. Shouldn't the generator only produce the output and consume the input, rather than producing the input? It sort of looks like it's making a global wrapper for the input. It is funny to see how the autoscheduler tiled with a 32-thread size, given there was only one. Bad scheduling, but not wrong.
I think that's just a bad schedule. Input buffers come with wrapper Funcs (ending in "_im") that are normally just inlined. It's to support ImageParam::in(). Looks like this one has been compute_root'd.
I can try the 0D version; what is the syntax for that in Python? (I think there might be something wrong with the handling of 1-element tensors on CUDA. I'm seeing a similar error for every test using
I think the syntax is a[()] (i.e. index it with an empty tuple of Vars). We have a number of tests that output a scalar from CUDA, so I don't think it's just straight-up broken. Halide's bounds inference logic is very well tested, and this is a very simple case, so it's not going to be that. There could be a bug in the runtime's handling of device allocations, but I don't think you're using that; you're just wrapping existing device pointers. 99% of the time, when there's an illegal address exception, it's because the input or output buffer is malformed. That's why I'm suspicious of your input device pointer. All the cuMemAlloc results I see are 12 hex digits and start with 7.
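As a rough illustration of the empty-tuple indexing idea, here is a toy Python model. The ToyFunc class below is invented purely for demonstration; it is not the real Halide Python binding, which uses actual Vars and compiled pipelines.

```python
# Toy stand-in for a Halide Func: values stored by coordinate tuple.
# A zero-dimensional "Func" is indexed with the empty tuple ().
class ToyFunc:
    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        # Python passes a bare index for f[x]; normalize to a tuple.
        if not isinstance(key, tuple):
            key = (key,)
        self._values[key] = value

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        return self._values[key]

f = ToyFunc()
f[()] = 3.5          # zero-dimensional: no Vars at all
print(f[()])         # -> 3.5
```

The point is just that `f[()]` is ordinary syntax: Python passes the empty tuple as the index, which in Halide corresponds to a Func with no dimensions.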
Where do you see that? I read this in his output:
Ah, so it's that first number, which is printed decimally instead of hexadecimally. Why isn't this printed more clearly? Is there a reason, or is this open to change?
The device pointer comes first; 145176576 is 0x8a73800. It's an opaque 64-bit handle represented as a uint64, so we print it as a uint64. We could print it as if it's a void *, but that's not trivial, because it involves printing a 64-bit "pointer" on 32-bit platforms. The code is here: https://github.com/halide/Halide/blob/main/src/runtime/to_string.cpp#L305 Maybe it should be refactored to have a halide_uint64_to_hex_string, and the pointer-printing method can defer to that.
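The decimal-to-hex correspondence is easy to sanity-check; the values below are the ones quoted in this thread:

```python
# The runtime prints the device handle as a decimal uint64.
dev_ptr = 145176576
assert hex(dev_ptr) == "0x8a73800"    # the suspicious input pointer

# The output pointer, by contrast, looks like a typical cuMemAlloc result
# on this system: 12 hex digits, starting with 7.
out_ptr = 0x72EEC2200400
assert len(f"{out_ptr:x}") == 12
assert f"{out_ptr:x}".startswith("7")
print(f"input: {dev_ptr:#x}, output: {out_ptr:#x}")
```

Converting the decimal print to hex like this is a quick way to spot a pointer that falls outside the usual allocation range.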
I manually allocated a buffer and now the pointers look normal. It still fails; however, the error message changed:
I'm not sure why Halide is trying to call |
The schedule says:
Not sure what tmp3 is, but that schedule says it is to be computed on the CPU. To do that, presumably the input needs to be copied to the CPU, but the host pointer is null.
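A minimal sketch of that failure mode. The check_can_copy_to_host helper below is hypothetical, written for illustration only; it is not part of the Halide runtime API.

```python
def check_can_copy_to_host(host_ptr, device_ptr):
    """Hypothetical pre-flight check: copying device data back for a
    CPU-scheduled stage needs both a device source and a host destination."""
    if device_ptr == 0:
        raise RuntimeError("no device allocation to copy from")
    if host_ptr == 0:
        raise RuntimeError("host pointer is null, but a CPU stage needs it")
    return True

# The situation in this thread: an existing device pointer was wrapped
# directly, with no host allocation behind it.
try:
    check_can_copy_to_host(host_ptr=0, device_ptr=0x72EEC2200400)
except RuntimeError as e:
    print(e)
```

If a stage in the schedule runs on the CPU, the buffer it consumes must have a valid host allocation to copy into; wrapping a bare device pointer leaves that side null.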
I think the previous issue, where the device pointer was 32-bit, might hint that it was in a different memory space than the global memory Halide expected. Halide emitted ld.global.nc.f32 to load it, but maybe it's actually in constant memory, so that's not the right instruction?
This 1-element (scalar) kernel works on CPU, but gives a
Error: CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS cuLaunchKernel failed
on CUDA using both the Li2018 and Anderson2021 autoschedulers.