building examples with CUDA on RTX 4070 #280

nibblelab · 2024-11-21T13:43:08Z

Hi,

I'm trying to compile the library examples with CUDA 12 for an RTX 4070 GPU, but I get this error:

[ 36%] Building NVCC (Device) object examples/CMakeFiles/solver_cuda.dir/solver_cuda_generated_solver.cu.o
nvcc fatal   : Unsupported gpu architecture 'compute_30'
CMake Error at solver_cuda_generated_solver.cu.o.RelWithDebInfo.cmake:220 (message):
  Error generating
  /home/johnatas/MFLab/Code/amgcl/build/examples/CMakeFiles/solver_cuda.dir//./solver_cuda_generated_solver.cu.o

The GPU supported architectures are:

$ nvcc --list-gpu-arch
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
compute_89
compute_90
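A quick way to confirm the mismatch is to test whether the architecture named in the error appears in that list. The sketch below uses a hypothetical helper, `arch_supported`, and feeds it two entries from the list above through `printf`; on a real system you would pipe in `nvcc --list-gpu-arch` instead. For reference, the RTX 4070 is an Ada card with compute capability 8.9, which corresponds to `compute_89`.

```shell
# Hypothetical helper: succeeds if the architecture given as $1 appears,
# as a whole line, in the list read from stdin.
arch_supported() {
    grep -qxF -- "$1"
}

# Sample input taken from the list above; replace the printf with
# `nvcc --list-gpu-arch` when nvcc is installed.
if printf 'compute_50\ncompute_89\n' | arch_supported compute_30; then
    echo "compute_30 is supported"
else
    echo "compute_30 is NOT supported"
fi
```

With the full list shown above, the same check fails for `compute_30` and succeeds for `compute_89`.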

If I change the CMakeLists.txt to choose the architecture from the GPU itself by modifying

cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS ${CUDA_TARGET_ARCH})

to

cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS Auto)

This selects a supported architecture and compiles the solver_cuda module (with some warnings), but the overall build still fails with a series of compilation errors like:

[ 37%] Linking CXX executable solver_cuda
[ 37%] Built target solver_cuda
[ 38%] Building NVCC (Device) object examples/CMakeFiles/schur_pressure_correction_cuda.dir/schur_pressure_correction_cuda_generated_schur_pressure_correction.cu.o
...
avx512fp16intrin.h(101): error: more than one conversion function from "__half" to "<error-type>" applies
...
avx512fp16intrin.h(3187): error: return value type does not match the function type
...

How can I fix this to compile amgcl examples with CUDA 12?

My system settings:

Ubuntu 24.04 LTS
GCC 12.2
CUDA 12.0 - RTX 4070
CMake 3.27.1
Boost 1.81.0
OpenMPI 4.1.5
Eigen3 3.4.0
Hwloc 2.9.0


ddemidov · 2024-11-21T14:12:14Z

There is a configuration option for the target GPU architecture: https://github.com/ddemidov/amgcl/blob/master/CMakeLists.txt#L161

It lists some outdated archs; try setting it to just the one you need.
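Because CUDA_TARGET_ARCH is declared as a cache variable (CACHE STRING) in CMakeLists.txt, it can presumably be overridden at configure time without editing the file. The sketch below only assembles and prints the configure command; it assumes the FindCUDA module shipped with your CMake recognizes the name "Ada" (if it does not, a numeric value such as "8.9" may work), and the `..` assumes you run it from a build directory inside the amgcl checkout. Other options you may need are left out.

```shell
# Architecture for the RTX 4070. Assumption: your CMake's FindCUDA
# module knows the "Ada" name.
CUDA_ARCH="Ada"

# Override the cache variable on the command line instead of editing
# CMakeLists.txt. The command is printed, not executed.
CONFIGURE_CMD="cmake -DCUDA_TARGET_ARCH=${CUDA_ARCH} .."
echo "$CONFIGURE_CMD"
```

Running the printed command in a fresh build directory avoids picking up a stale value of the variable from an old CMakeCache.txt.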

nibblelab · 2024-11-21T16:15:26Z

Thanks for the reply.

Changing that line works around the architecture problem, but produces the other errors I pointed out.

I changed CUDA_TARGET_ARCH to the architectures supported by the GPU, as follows:

set(CUDA_TARGET_ARCH "Pascal Volta Turing Ampere Ada" CACHE STRING "Target architecture(s) for CUDA")

Again, this works around the architecture problem, but the compilation errors show up again, this time in another module:

[ 68%] Linking CXX executable runtime_sdd_cuda
[ 68%] Built target runtime_sdd_cuda
[ 69%] Building NVCC (Device) object examples/mpi/CMakeFiles/runtime_sdd_3d_cuda.dir/runtime_sdd_3d_cuda_generated_runtime_sdd_3d.cu.o
. . .
avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
. . .
avx512fp16intrin.h(62): error: more than one conversion function from "__half" to "<error-type>" applies:
. . .
avx512fp16intrin.h(4289): error: return value type does not match the function type

Those errors were generated with the Pascal architecture, which is the oldest one the RTX 4070 supports.

ddemidov · 2024-11-21T16:19:02Z

~~Can you try and keep just the "Ada" on the arch list?~~

Sorry, I've just noticed that you tried with "Auto". I guess I'll need to do some digging, as I don't have a device to test this on.

nibblelab · 2024-11-21T16:50:45Z

Thank you.

I've also used a recompiled version of GCC as part of a CFD software environment, hence my interest in AMGCL. I'm updating my environment and the NVIDIA CUDA toolkit as well, to check whether the problem is related to GCC, which is a possibility (avx512fp16intrin.h is a header from GCC, not NVCC).

For now, I've disabled the CUDA part in CMakeLists to test the CPU part of the library.

nibblelab · 2024-12-06T14:13:54Z

I've managed to solve this problem. I was using CUDA from the Ubuntu repo, and that was the source of the header problems. Using the CUDA install from the NVIDIA HPC repo solved it, and I was finally able to compile and run amgcl with CUDA 12.x on the RTX 4070.
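To see which installation an nvcc binary comes from, one rough heuristic is to look at its install prefix. The sketch below uses a hypothetical helper, `nvcc_origin`. The prefixes are typical default locations and are assumptions, not guarantees: the Ubuntu package under /usr, NVIDIA's own toolkit packages under /usr/local/cuda*, and the HPC SDK under /opt/nvidia/hpc_sdk.

```shell
# Hypothetical helper: classify an nvcc path by its install prefix.
# The prefixes are common defaults (assumed), so treat the answer as a hint.
nvcc_origin() {
    case "$1" in
        /usr/bin/*|/usr/lib/*)  echo "distribution package" ;;
        /usr/local/cuda*)       echo "NVIDIA CUDA toolkit" ;;
        /opt/nvidia/hpc_sdk/*)  echo "NVIDIA HPC SDK" ;;
        *)                      echo "unknown" ;;
    esac
}

# If nvcc is on PATH, resolve symlinks and classify the real location.
if command -v nvcc >/dev/null 2>&1; then
    nvcc_origin "$(readlink -f "$(command -v nvcc)")"
fi
```

If several toolkits are installed, the order of directories in PATH decides which nvcc the build picks up.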

It's necessary to make this change in CMakeLists.txt to ensure a compatible architecture is used:

cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS Auto)

I tested the Poisson problem from the tutorials (https://amgcl.readthedocs.io/en/latest/tutorial/poisson3Db.html) on CUDA/GPU and on the CPU and found the results interesting. However, the problem is rather small, and the CUDA version ends up being slower because of the CPU-GPU communication overhead.

I changed the Stokes tutorial (https://amgcl.readthedocs.io/en/latest/tutorial/Stokes.html) to use the GPU, using the Poisson CUDA code as an example, but I'm having trouble converting the binary matrices used in Stokes to the MatrixMarket format used in Poisson. Is there a tool for this conversion, or a tutorial on how to use the binary matrices/vectors with amgcl on CUDA?

ddemidov · 2024-12-06T15:30:23Z

That's great, thanks for letting me know!

There are ./examples/mm2bin and ./examples/bin2mm utilities:

./bin2mm --help
Options:
  -h [ --help ]         Show this help.
  -d [ --dense ]        Matrix is dense (use it with the RHS file).
  -i [ --input ] arg    Input binary file.
  -o [ --output ] arg   Ouput matrix in the MatrixMarket format.
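Going by the options listed above, converting a system would take one call for the sparse matrix and one call with -d for the right-hand side. The sketch below wraps that in a hypothetical helper, `to_mm`; the file names in the comments are placeholders, and the utility path is the one mentioned above, which may differ in your build tree.

```shell
# Path to the utility; override BIN2MM if it lives elsewhere.
BIN2MM="${BIN2MM:-./examples/bin2mm}"

# Hypothetical helper: convert one binary file to MatrixMarket.
#   $1 = input binary file, $2 = output file,
#   $3 = "dense" for an RHS vector (adds -d), omitted for a sparse matrix.
to_mm() {
    if [ "${3:-}" = "dense" ]; then
        "$BIN2MM" -d -i "$1" -o "$2"
    else
        "$BIN2MM" -i "$1" -o "$2"
    fi
}

# Example calls (placeholder file names):
#   to_mm A.bin A.mtx          # system matrix
#   to_mm b.bin b.mtx dense    # right-hand side
```

Only the flags shown in the help output are used here; mm2bin presumably performs the reverse conversion.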

Also, if you just want to test a 3D Poisson problem, you could run ./examples/solver -n 32 and ./examples/solver_cuda -n 32 to test the generated system for a 32x32x32 grid.
