Add missing CUDA 12 dependencies and fix dlopen library names (#1366)
Dropping the system CTK libraries from our CUDA 12 CI images revealed that we were missing the cuda-nvcc package, which is required to provide nvvm for numba in the Python tests. It also revealed that the list of library names we try to dlopen was incomplete: for CUDA 11, the SONAME of the library incorrectly includes an extra `.0` version segment (`libcudart.so.11.0`), and rmm was designed to search for that, but CUDA 12 correctly uses just `libcudart.so.12`, so that name needs to be added to the list we search. We were previously getting by on finding `libcudart.so`, but that linker name is only present in conda environments if `cuda-cudart-dev` is installed, and that package should not be a runtime requirement for rmm.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Rong Ou (https://github.com/rongou)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1366
vyasr authored Oct 24, 2023
1 parent 596ccf9 commit 39800d3
Showing 4 changed files with 23 additions and 7 deletions.
2 changes: 2 additions & 0 deletions ci/test_python.sh
@@ -11,7 +11,9 @@ rapids-dependency-file-generator \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" | tee env.yaml

rapids-mamba-retry env create --force -f env.yaml -n test
set +u
conda activate test
set -u

rapids-print-env

1 change: 1 addition & 0 deletions conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -8,6 +8,7 @@ dependencies:
- clang-tools==16.0.6
- clang==16.0.6
- cmake>=3.26.4
- cuda-nvcc
- cuda-python>=11.7.1,<12.0a0
- cuda-version=11.8
- cudatoolkit
4 changes: 4 additions & 0 deletions dependencies.yaml
@@ -223,3 +223,7 @@ dependencies:
packages:
- pytest
- pytest-cov
- output_types: conda
packages:
# Needed for numba in tests
- cuda-nvcc
23 changes: 16 additions & 7 deletions include/rmm/detail/dynamic_load_runtime.hpp
@@ -38,13 +38,22 @@ struct dynamic_load_runtime {
auto close_cudart = [](void* handle) { ::dlclose(handle); };
auto open_cudart = []() {
::dlerror();
const int major = CUDART_VERSION / 1000;
const std::string libname_ver = "libcudart.so." + std::to_string(major) + ".0";
const std::string libname = "libcudart.so";

auto ptr = ::dlopen(libname_ver.c_str(), RTLD_LAZY);
if (!ptr) { ptr = ::dlopen(libname.c_str(), RTLD_LAZY); }
if (ptr) { return ptr; }
const int major = CUDART_VERSION / 1000;

// In CUDA 12 the SONAME is correctly defined as libcudart.so.12, but for
// CUDA<=11 it includes an extra 0 minor version, e.g. libcudart.so.11.0. We
// also allow finding the linker name (libcudart.so).
const std::string libname_ver_cuda_11 = "libcudart.so." + std::to_string(major) + ".0";
const std::string libname_ver_cuda_12 = "libcudart.so." + std::to_string(major);
const std::string libname = "libcudart.so";

void* ptr = nullptr;
for (auto&& name : {libname_ver_cuda_12, libname_ver_cuda_11, libname}) {
ptr = dlopen(name.c_str(), RTLD_LAZY);
if (ptr != nullptr) break;
}

if (ptr != nullptr) { return ptr; }

RMM_FAIL("Unable to dlopen cudart");
};
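The search order described in the commit message can be sketched in isolation as follows. This is only an illustration, not rmm code: `cudart_candidates` and `open_first` are hypothetical helper names, and the version number is passed in as a plain integer (assumed to follow the `CUDART_VERSION` encoding, e.g. 12020 for CUDA 12.2) so the sketch builds without the CUDA headers.

```cpp
#include <dlfcn.h>

#include <string>
#include <vector>

// Hypothetical helper (not part of rmm): builds the library names to try,
// in the same order as the loop in the diff above.
std::vector<std::string> cudart_candidates(int cudart_version)
{
  const int major        = cudart_version / 1000;  // e.g. 12020 -> 12
  const std::string base = "libcudart.so";
  return {
    base + "." + std::to_string(major),         // CUDA 12 style SONAME: libcudart.so.12
    base + "." + std::to_string(major) + ".0",  // CUDA 11 style SONAME: libcudart.so.11.0
    base,                                       // linker name, present only with dev packages
  };
}

// Hypothetical helper: returns the handle of the first name that loads,
// or nullptr if none of them can be opened.
void* open_first(const std::vector<std::string>& names)
{
  for (const auto& name : names) {
    ::dlerror();  // clear any stale error state before each attempt
    if (void* handle = ::dlopen(name.c_str(), RTLD_LAZY)) { return handle; }
  }
  return nullptr;
}
```

A caller would pass `cudart_candidates(CUDART_VERSION)` to `open_first` and treat a null result as a failure, as the `RMM_FAIL` call in the diff does. Trying the versioned names first means the lookup no longer depends on the unversioned `libcudart.so` symlink being installed.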