Large LocalArray eltypes run into compiler heuristics #99
Apparently I fixed that in JuliaLang/julia#46050, so 1.8 is supported. I've added it to CI.
Let's re-open this to keep track of the max fragment size though. cc @wardvermeulen
Updated MWE:

```julia
using GemmKernels, CUDA
using GemmKernels: LocalArray
using Base: setindex

function kernel()
    # 64-element LocalArray: large enough to run into the compiler heuristics
    c_frags = LocalArray{Tuple{64}, Float32, 1, 64}(undef)
    setindex(c_frags, 0f0, 1)
    return
end

function main()
    # inspect the generated LLVM IR for the apply_iterate
    CUDA.code_llvm(kernel, Tuple{})
end

isinteractive() || main()
```

The `apply_iterate` can be avoided by setting the …
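Independently of that setting, a rough way to check for the fallback programmatically is to capture the printed IR and search it. This is a sketch, not something from the thread; it assumes `CUDA.code_llvm` accepts an `IO` as its first argument (as `InteractiveUtils.code_llvm` does), and that the fallback shows up as an `apply_iterate` substring in the printed IR:

```julia
# Sketch: capture the LLVM IR of the MWE kernel as a string and check whether
# the splatting fallback appears. The "apply_iterate" substring check is an
# assumption about how the fallback is rendered in the printed IR.
ir = sprint(io -> CUDA.code_llvm(io, kernel, Tuple{}))
has_fallback = occursin("apply_iterate", ir)
println(has_fallback ? "apply_iterate present" : "no apply_iterate")
```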
If you are referring to providing safeguards so this behavior does not occur, I did not account for this in the implementation.
Ah OK, I thought there were some hard-coded limits that relate to the 16-element LocalArray limit.
The following works:
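A minimal sketch of such a working case, assuming the same kernel shape as the MWE above but with a 16-element `LocalArray` (the exact element count and kernel shape are assumptions, based on the 16-element limit mentioned earlier):

```julia
using GemmKernels, CUDA
using GemmKernels: LocalArray
using Base: setindex

function kernel_small()
    # Assumption: a 16-element LocalArray stays below the limit discussed above,
    # so the generated IR contains no apply_iterate.
    c_frags = LocalArray{Tuple{16}, Float32, 1, 16}(undef)
    setindex(c_frags, 0f0, 1)
    return
end

CUDA.code_llvm(kernel_small, Tuple{})
```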
But bumping the eltype to `Tuple{64}` results in an `apply_iterate`.