
Update ROCSparse for Julia v1.10 #613

Open
wants to merge 1 commit into
base: master

amontoison
Member

@pxl-th
Collaborator

pxl-th commented Apr 2, 2024

FYI, I have temporarily disabled the rocSPARSE tests since they were crashing my Navi 3 in CI and I didn't have time to investigate the root cause.
Also, for some reason the rocBLAS tests segfault on ROCm 5.6 (@luraess).

@amontoison have you run the tests locally? We can of course re-enable the rocSPARSE tests, but I'm not sure they will run successfully.

@amontoison
Member Author

@pxl-th The tests passed on our cluster.

@pxl-th
Collaborator

pxl-th commented Apr 7, 2024

@amontoison, I've sent you an invite to be able to merge PRs.
I currently don't have access to AMD GPUs and am therefore not working on AMDGPU.jl.
So feel free to merge PRs once they are in a good state (although I'd recommend merging them when CI is green).

@luraess
Collaborator

luraess commented Apr 7, 2024

I can try running the tests on my system @pxl-th (now with ROCm 6.0.2 on Navi 3). On which system did they pass @amontoison ?

@amontoison
Member Author

> I can try running the tests on my system @pxl-th (now with ROCm 6.0.2 on Navi 3). On which system did they pass @amontoison ?

It was on Frontier. I need to check the ROCm version with @michel2323.

@michel2323
Collaborator

It was ROCm 6.0, on an MI250.

@luraess
Collaborator

luraess commented Apr 8, 2024

Running the ROCSparse tests on Navi 3 (gfx1101, Radeon RX 7800 XT) with ROCm 6.0.2, I am getting test errors (along with an error in ROCBlas): test_log_out.txt.

@amontoison
Member Author

@luraess Can you check whether the rocSPARSE tests are failing on the master branch?

Can you also give more details about the errors?
I suspect that something is not dispatched correctly, because all the unit tests for mv! and mm! passed.

@luraess
Collaborator

luraess commented Apr 9, 2024

Running only the rocSparse tests on master, I am getting some warnings but no errors. There is still the failing BLAS test.
rocSaprse_out.txt

@amontoison
Member Author

@luraess Can you just run include("test/rocarray/blas.jl")?

@luraess
Collaborator

luraess commented Apr 14, 2024

> @luraess Can you just run include("test/rocarray/blas.jl")?

Yes, here is the output of running the test: test_out.txt

@amontoison
Member Author

Thanks @luraess! But you need to import additional packages to isolate the issue:

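using AMDGPU
using LinearAlgebra

# Load the GPUArrays test suite, which defines the TestSuite module
import GPUArrays
include(joinpath(pkgdir(GPUArrays), "test", "testsuite.jl"))

# Helper used by the test files: compares f on CPU arrays against ROCArray
testf(f, xs...; kwargs...) =
    TestSuite.compare(f, AMDGPU.ROCArray, xs...; kwargs...)

# Run only the BLAS tests
include("test/rocarray/blas.jl")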

@luraess
Collaborator

luraess commented Apr 14, 2024

Thanks for the hints. Following those, I am getting a segfault on Navi 3 with ROCm 6.0.2 (blas_navi3.txt) and a bunch of errors on MI250x with ROCm 5.3.3 on LUMI (blas_lumi.txt).
