Using a single array of pointers for multi-GPU AMDGPU computation #663
Comments
@maleadt how does CUDA make array (…)?
You can, but at the moment it is not pretty:

```julia
bytesize = prod(dims) * sizeof(T)
buf = AMDGPU.Runtime.Mem.HostBuffer(bytesize, AMDGPU.HIP.hipHostAllocPortable)
amdgpu_pointer_ret = ROCArray{T, N}(AMDGPU.DataRef(AMDGPU.pool_free, AMDGPU.Managed(buf)), dims)
# Copy from CPU array.
copyto!(amdgpu_pointer_ret, pointer_ret)
```

But this is different from CUDA. What CUDA devices do you use? Maybe they have unified memory?
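For reference, a minimal sketch of the unified-memory route on the CUDA side, assuming CUDA.jl ≥ 5 where `cu` accepts a `unified` keyword; the array here is illustrative, not code from this thread:

```julia
using CUDA

x = rand(Float64, 16)

# `unified = true` requests unified (managed) memory, which every device
# and the host can access without explicit copies.
xu = cu(x; unified = true)
```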
Thank you @pxl-th!! Sorry @pxl-th, but I do not understand your code well. Can you use the variable names that I used in my code, to help me see where I have to make the modifications? I think I must be using an old version of AMDGPU, because I cannot find AMDGPU.pool_free and AMDGPU.Managed.
Yes, you should use AMDGPU 1.0; it has important multi-GPU fixes. Here's the code. I don't have access to a multi-GPU system at the moment, but at least on one GPU it works:

```julia
using AMDGPU

"""
Create a ROCArray that is accessible from different GPUs (a.k.a. portable).
"""
function get_portable_rocarray(x::Array{T, N}) where {T, N}
    dims = size(x)
    bytesize = sizeof(T) * prod(dims)
    buf = AMDGPU.Mem.HostBuffer(bytesize, AMDGPU.HIP.hipHostAllocPortable)
    ROCArray{T, N}(AMDGPU.GPUArrays.DataRef(AMDGPU.pool_free, AMDGPU.Managed(buf)), dims)
end

function main()
    ndev = 2
    pointer_ret = Vector{AMDGPU.Device.ROCDeviceVector{Float64, AMDGPU.Device.AS.Global}}(undef, ndev)
    # Fill `pointer_ret` with pointers here.
    amdgpu_pointer_ret = get_portable_rocarray(pointer_ret)
    @show amdgpu_pointer_ret
    return
end
```
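As for filling `pointer_ret`, here is a hypothetical sketch of one way to do it, assuming `AMDGPU.rocconvert` (the same conversion `@roc` applies to kernel arguments); the helper name and array sizes are illustrative:

```julia
# Hypothetical helper: create one array per visible device and store its
# device-side handle (a ROCDeviceVector) in `pointer_ret`.
function fill_pointers!(pointer_ret)
    devs = AMDGPU.devices()
    arrays = Vector{ROCVector{Float64}}(undef, length(pointer_ret))
    for i in 1:length(pointer_ret)
        AMDGPU.device!(devs[i])
        arrays[i] = AMDGPU.zeros(Float64, 16)
        # rocconvert yields the ROCDeviceVector that kernels receive.
        pointer_ret[i] = AMDGPU.rocconvert(arrays[i])
    end
    # Keep `arrays` referenced so the device buffers are not freed.
    return arrays
end
```

The caller must hold on to the returned arrays for as long as the device-side handles in `pointer_ret` are in use.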
Assuming the buffer type used here is device memory (which is the default), CUDA.jl enables P2P access between devices when doing the conversion […]. Note that this isn't guaranteed to always work; the devices need to be compatible, or P2P isn't supported. In that case the user is responsible for staging through the CPU (by explicit […]).
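Where P2P is unavailable, a minimal sketch of such staging through the CPU, assuming two visible devices; this is illustrative, not CUDA.jl's internal fallback path:

```julia
using CUDA

devs = collect(CUDA.devices())
CUDA.device!(devs[1])
a = CUDA.rand(Float64, 16)

host = Array(a)        # device 1 -> CPU
CUDA.device!(devs[2])
b = CuArray(host)      # CPU -> device 2
```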
Thanks @pxl-th and @maleadt for your comments!
[…] Do you know why?
Ah... That's a bug in AMDGPU.jl with setting features of the compilation target. I'll fix it.
Hi folks!
I am working on the multi-GPU support of JACC: https://github.com/JuliaORNL/JACC.jl/
For that, I need to be able to use a single array of pointers that can store pointers to different GPUs.
I opened another issue a few days ago: #662
Although that helped me understand the problem better, I still cannot run the test code below.
I can run that code on CUDA (I include the CUDA code too, in case it is useful).
@pxl-th mentioned the CU_MEMHOSTALLOC_PORTABLE CUDA flag. Can we use that in AMDGPU?
Here are the codes:
AMDGPU: […]
CUDA: […]
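For reference, a minimal CUDA.jl sketch of the pattern in question (a single host array holding per-device `CuDeviceVector` handles); the names and sizes are illustrative, not the original test code:

```julia
using CUDA

function main_cuda()
    devs = collect(CUDA.devices())
    ndev = length(devs)
    # Host vector of device-side handles, one per GPU.
    pointer_ret = Vector{CUDA.CuDeviceVector{Float64, CUDA.AS.Global}}(undef, ndev)
    arrays = Vector{CuVector{Float64}}(undef, ndev)
    for i in 1:ndev
        CUDA.device!(devs[i])
        arrays[i] = CUDA.zeros(Float64, 16)
        # cudaconvert yields the CuDeviceVector that kernels receive.
        pointer_ret[i] = CUDA.cudaconvert(arrays[i])
    end
    return pointer_ret, arrays
end
```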