Our current validator rejects kernel arguments that are not isbitstype, with the exception of arguments whose type passes the Core.Compiler.isconstType test. This makes it possible to, e.g., broadcast types, as these arguments are only used to specialize the kernel and are not actually used by the generated code (even though they are passed, as opposed to ghost/singleton values).
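The distinction between these categories can be checked on the host in plain Julia. The following is a minimal sketch; the `Bar` struct mirrors the example used in this issue, and `Base.issingletontype` is used here only as a stand-in for the validator's ghost-type check, which is an assumption about how the two relate.

```julia
struct Bar{T}
    a::T
end

# Plain data: can be passed to a kernel by value.
@assert isbitstype(Float32)
@assert isbitstype(Bar{Float32})

# A type passed as a value is not isbits...
@assert !isbitstype(Type{Bar{Float32}})
# ...but it is a "const type": the value is fully determined by its type,
# so the compiler can specialize on it without reading the argument.
@assert Core.Compiler.isconstType(Type{Bar{Float32}})

# A singleton value such as a function carries no data at all.
@assert Base.issingletontype(typeof(sin))
# An ordinary data type is not a const type.
@assert !Core.Compiler.isconstType(Float32)
```

All assertions are expected to pass silently; an argument of type `Type{Bar{Float32}}` is therefore exempt from the isbits requirement, while anything that merely contains such a value is not.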
In JuliaGPU/CUDA.jl#2514, it was noted that some code (notably closure-heavy code generated by Zygote) still refuses to compile, even though the generated code doesn't actually use the non-isbits value. For example:
struct Bar{T}
    a::T
end

function main()
    a = cu(zeros(5))
    capture = Bar

    function closure(arg)
        capture(arg)
    end

    function kernel(f, x)
        f(x[])
        return
    end

    @cuda kernel(closure, a)
end
The problem here is that the closure captures the type, making the closure non-isbits too. But because the closure is not a const type, we fail compilation, even though the generated code is perfectly fine:
Note how the closure argument really does contain a managed pointer. In this case, we can work around the issue by reviving the more lenient validation removed in #24, where we not only checked for Core.Compiler.isconstType, but also whether the value is actually used:
diff --git a/src/driver.jl b/src/driver.jl
index 9e05eb6..a4cff8f 100644
--- a/src/driver.jl
+++ b/src/driver.jl
@@ -88,8 +88,7 @@ function codegen(output::Symbol, @nospecialize(job::CompilerJob); toplevel::Bool
     end
 
     @timeit_debug to "Validation" begin
-        check_method(job)   # not optional
-        validate && check_invocation(job)
+        check_method(job)
     end
 
     prepare_job!(job)
@@ -99,6 +98,10 @@ function codegen(output::Symbol, @nospecialize(job::CompilerJob); toplevel::Bool
 
     ir, ir_meta = emit_llvm(job; libraries, toplevel, optimize, cleanup, only_entry, validate)
 
+    validate && @timeit_debug to "Validation" begin
+        check_invocation(job, ir_meta.entry)
+    end
+
     if output == :llvm
         if strip
             @timeit_debug to "strip debug info" strip_debuginfo!(ir)
diff --git a/src/validation.jl b/src/validation.jl
index e1a355b..9f1f869 100644
--- a/src/validation.jl
+++ b/src/validation.jl
@@ -66,7 +66,7 @@ function explain_nonisbits(@nospecialize(dt), depth=1; maxdepth=10)
     return msg
 end
 
-function check_invocation(@nospecialize(job::CompilerJob))
+function check_invocation(@nospecialize(job::CompilerJob), entry::LLVM.Function)
     sig = job.source.specTypes
     ft = sig.parameters[1]
     tt = Tuple{sig.parameters[2:end]...}
@@ -77,6 +77,9 @@ function check_invocation(@nospecialize(job::CompilerJob))
 
     real_arg_i = 0
     for (arg_i,dt) in enumerate(sig.parameters)
+        println(Core.stdout, arg_i)
+        println(Core.stdout, dt)
+
         isghosttype(dt) && continue
         Core.Compiler.isconstType(dt) && continue
         real_arg_i += 1
@@ -89,9 +92,13 @@ function check_invocation(@nospecialize(job::CompilerJob))
         end
 
         if !isbitstype(dt)
-            throw(KernelError(job, "passing and using non-bitstype argument",
-                """Argument $arg_i to your kernel function is of type $dt, which is not isbits:
-                   $(explain_nonisbits(dt))"""))
+            param = parameters(entry)[real_arg_i]
+            if !isempty(uses(param))
+                println(Core.stdout, string(entry))
+                throw(KernelError(job, "passing and using non-bitstype argument",
+                    """Argument $arg_i to your kernel function is of type $dt, which is not isbits:
+                    $(explain_nonisbits(dt))"""))
+            end
         end
     end
Sadly, this approach is insufficient for more complex cases such as:
struct Bar{T}
    a::T
    b::T
end

function main2()
    foo(f) = (args...) -> f(args...)

    a = cu(zeros(5)); b = cu(ones(5)); c = Bar{Float32}; d = foo(c)
    foo(c).(a, b)
end
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, var"#3#5"{Type{Bar{Float32}}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, which is not isbits:
  .f is of type var"#3#5"{Type{Bar{Float32}}} which is not isbits.
    .f is of type Type{Bar{Float32}} which is not isbits.
Note how the non-isbits Broadcasted argument is used, so it also fails the more lenient validation check, but it's just not the managed pointer that's being used.
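To make it easier to see which nested field is responsible, the type can be inspected on the host. The sketch below is a type-level illustration only: `nonisbits_leaves`, `Closure` and `Outer` are hypothetical names introduced here (the latter two are stand-ins for the closure and the `Broadcasted` wrapper), and none of this is part of GPUCompiler.jl. It does not analyze uses in the generated IR, and finding only const-type leaves does not by itself prove that an argument is safe to pass.

```julia
# Hypothetical helper (not part of GPUCompiler.jl): list the field paths that
# make `dt` non-isbits, descending through immutable structs.
function nonisbits_leaves(@nospecialize(dt), path::String="", acc=Pair{String,Any}[])
    isbitstype(dt) && return acc                      # nothing to report
    if Core.Compiler.isconstType(dt) ||               # e.g. Type{Bar{Float32}}
       !(dt isa DataType) || !isconcretetype(dt) ||   # unions, abstract types
       ismutabletype(dt) || fieldcount(dt) == 0       # arrays, strings, symbols
        push!(acc, path => dt)
        return acc
    end
    for (name, ft) in zip(fieldnames(dt), fieldtypes(dt))
        nonisbits_leaves(ft, string(path, ".", name), acc)
    end
    return acc
end

struct Bar{T}; a::T; b::T; end
struct Closure{F}; f::F; end            # stand-in for the anonymous closure
struct Outer{F,A}; f::F; args::A; end   # stand-in for Broadcasted

# Only a captured type makes this argument non-isbits.
ok = nonisbits_leaves(Outer{Closure{Type{Bar{Float32}}}, Tuple{Float32,Float32}})
# Here a host array is captured as well.
bad = nonisbits_leaves(Outer{Closure{Type{Bar{Float32}}}, Vector{Float32}})

@assert first.(ok) == [".f.f"]
@assert all(p -> Core.Compiler.isconstType(p.second), ok)
@assert first.(bad) == [".f.f", ".args"]
@assert !all(p -> Core.Compiler.isconstType(p.second), bad)
```

In the first case the only offending leaf is a const type, which matches the situation in the error message above; in the second, the `Vector{Float32}` leaf is the kind of host memory the validation is meant to catch.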
I'm not sure how to proceed with this. Simply removing the validation and trusting that other aspects of IR validation will error seems too optimistic -- IIRC we introduced this check to prevent accidentally reading CPU memory from the GPU. And actually detecting whether the managed pointer field is the one being used seems hard.
I'm also not sure how important that is; we've not received many bug reports about this, and the motivating example by @BioTurboNick would simply fail after validation anyway because it involves a broken broadcast (producing Any values). So maybe this isn't very important.