Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RELEASE] rmm v23.10 #1354

Merged
merged 26 commits into from
Oct 11, 2023
Merged

[RELEASE] rmm v23.10 #1354

merged 26 commits into from
Oct 11, 2023

Conversation

raydouglass
Copy link
Member

❄️ Code freeze for branch-23.10 and v23.10 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-23.10 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-23.10 into main for the release

raydouglass and others added 25 commits July 20, 2023 15:33
This PR migrates RMM to use `fetch_rapids.cmake` like most RAPIDS repos. This makes it easier to define a single source if the upstream branch of rapids-cmake needs to change for testing, like in #1247.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #1319
This PR contains the minimal set of changes to compile using Cython 3 without warnings. Future PRs can be made to take advantage of new or improved features.

The specific changes are:
- Ensuring `nogil` always comes after `except`. `except * nogil` is a compile-time error in Cython 3
- Adding `noexcept` or `except *` to any `cdef ` functions missing them. In Cython 0.29 these would default to `noexcept`, which meant that exceptions would not be properly propagated. In Cython 3.0.0, these default to `except *`, which incurs a performance penalty for reacquiring the GIL to check the exception value even for `nogil` functions. Being explicit here is important.

There are a large number of outstanding warnings due to NVIDIA/cuda-python#44. cuda-python for CUDA 12 has the necessary fix, but we will need a cuda-python 11.8.* bugfix with a backport to make those warnings go away.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Lawrence Mitchell (https://github.com/wence-)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1313
This will allow us to more easily audit the quality of our docs in CI going forward.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Lawrence Mitchell (https://github.com/wence-)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1316
Forward-merge branch-23.08 to branch-23.10
The latest versions `4.1` and `4.0` of `sphinxcontrib-jquery` have an issue with jQuery installed currently in sphinx. This is causing the search functionality to fail so I've pinned the version to `3.0.0`, this resolves the issue.

Authors:
   - Jake Awe (https://github.com/AyodeAwe)

Approvers:
   - https://github.com/jakirkham
   - AJ Schmidt (https://github.com/ajschmidt8)
Add Python bindings for the `limiting_resource_adaptor` via a new Python memory resource `LimitingResourceAdaptor`.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Lawrence Mitchell (https://github.com/wence-)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1327
This PR replaces the `copy_prs` functionality from the `ops-bot` with the new dedicated `copy-pr-bot` GitHub application.

Docs for `copy-pr-bot` can be viewed below.

- https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/

**Important**: `copy-pr-bot` enforces signed commits. If an organization member opens a PR that contains unsigned commits, it will be deemed untrusted and therefore require an `/ok to test` comment. See the GitHub docs [here](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) for information on how to set up commit signing.

Any time a PR is deemed untrusted, it will receive a comment that looks like this: rapidsai/ci-imgs#63 (comment).

Authors:
   - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
   - Bradley Dice (https://github.com/bdice)
   - Jordan Jacobelli (https://github.com/jjacobelli)
This PR fixes all doxygen warnings from building the C++ documentation.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Rong Ou (https://github.com/rongou)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #1317
…1335)

This PR:

1. Removes `ci/apply_wheel_modifications.sh` and uses it inline in wheel build scripts
2. Allows for specifying alpha versioned dependencies of RAPIDS projects

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1335
This change ensures that our doxygen documentation stays up to date during development. Since pre-commit also runs during CI, C++ documentation will be validated on every pull request.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Mark Harris (https://github.com/harrism)

URL: #1334
With the release of conda 23.7.3, mamba's mambabuild stopped working. With boa installed, `conda mambabuild` uses the mamba solver, so just use that instead.

See also rapidsai/cudf#14068.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1338
I selected this version as it is what ships with Ubuntu 22.04. I also ran `doxygen -u` to update the Doxyfile.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Bradley Dice (https://github.com/bdice)

URL: #1337
…from another device (#1333)

As described in #1306 , stream_ordered_memory_resource maintained thread-local storage of an event to synchronize in the per-thread default stream. However, this caused the event to be recorded on the wrong stream if allocations were made on memory resources associated with different devices, because allocation on the first device on the PTDS would initialize the TLS for that stream, and subsequent device pools would try to use the already initialized TLS. 

This PR adds a new test that only runs on multidevice systems (more correctly, does nothing on single device systems). The test creates two per-device pools, and creates and destroys a device buffer on each. 

It also fixes `stream_ordered_memory_resource` to store the ID of the device that is current when it is constructed, and then to store a vector of one event per device in TLS rather than a single event. When the PTDS is passed to `get_event`, the event for the MR's stored device ID is used. This should correctly support PTDS with multiple threads and multiple devices (still one MR per device).

The PR also includes some changes to the device ID utilities in `cuda_device.hpp`. There is a new RAII device helper class, and a `get_num_cuda_devices()` function.

Finally, there is a small addition to the .clang-tidy to disable warnings about `do...while` loops inside of RMM error checking macros.

- Fixes #1306

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #1333
This PR adds section headers to the CMakeLists.txt to mirror other RAPIDS packages, which makes it easier to follow the logic and compare across packages. This PR replaces #1292.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Mark Harris (https://github.com/harrism)

URL: #1341
This PR updates rmm to use clang 16.0.6. The previous version 16.0.1 has some minor formatting issues affecting several RAPIDS repos.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Mark Harris (https://github.com/harrism)

URL: #1343
This PR fixes a devcontainers issue seen by @miscco. The `doxygen` dependency can only be installed with conda, so we can't specify it in `requirements.txt`.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1344
PR updates `rapidsai/ci` references to `rapidsai/ci-conda`

Authors:
  - Jake Awe (https://github.com/AyodeAwe)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1346
Since Cython 3, we must explicitly request a C ABI with `-DCYTHON_EXTERN_C='extern "C"'` for `cdef public` functions.

- Closes #1349

Before:
```
$ nm -D .../rmm/python/rmm/_lib/torch_allocator.cpython-310-x86_64-linux-gnu.so | grep allocate
00000000000071c0 T _Z10deallocatePvlS_
0000000000007500 T _Z8allocateliPv
...
```

After:
```
$ nm -D .../rmm/python/rmm/_lib/torch_allocator.cpython-310-x86_64-linux-gnu.so | grep allocate
0000000000007500 T allocate
00000000000071c0 T deallocate
...
```

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Bradley Dice (https://github.com/bdice)

URL: #1350
This PR adds some [devcontainers](https://containers.dev/) to help simplify building the RMM C++ and Python libraries.

It also adds an optional job to the `pr.yaml` to [build the RMM libs in each devcontainer](https://github.com/trxcllnt/rmm/blob/fea/devcontainers/.github/workflows/pr.yaml#L79-L85), so the build caches are populated for devs by CI.

A devcontainer can be launched by clicking the "Reopen in Container" button that VSCode shows when opening the repo (or by using the "Rebuild and Reopen in Container" command from the command palette):
![image](https://user-images.githubusercontent.com/178183/221771999-97ab29d5-e718-4e5f-b32f-2cdd51bba25c.png)

Clicking this button will cause VSCode to prompt the user to select one of these devcontainer variants:
![image](https://github.com/rapidsai/rmm/assets/178183/68d4b264-4fc2-4008-92b6-cb4bdd19b29f)

On startup, the devcontainer creates or updates the conda/pip environment using `rmm/dependencies.yaml`. The envs/package caches are cached on the host via volume mounts, which are described in more detail in [`.devcontainer/README.md`](https://github.com/trxcllnt/rmm/blob/fea/devcontainers/.devcontainer/README.md).

The container includes convenience functions to clean, configure, and build the various RMM components:

```shell
$ clean-rmm-cpp # only cleans the C++ build dir
$ clean-rmm-python # only cleans the Python build dir
$ clean-rmm # cleans both C++ and Python build dirs

$ configure-rmm-cpp # only configures rmm C++ lib

$ build-rmm-cpp # only builds rmm C++ lib
$ build-rmm-python # only builds rmm Python lib
$ build-rmm # builds both C++ and Python libs
```

* The C++ build script is a small wrapper around `cmake -S ~/rmm -B ~/rmm/build` and `cmake --build ~/rmm/build`
* The Python build script is a small wrapper around `pip install --editable ~/rmm`

Unlike `build.sh`, these convenience scripts *don't* install the libraries after building them. Instead, they automatically inject the correct arguments to build the C++ libraries from source and use their build dirs as package roots:

```shell
$ cmake -S ~/rmm -B ~/rmm/build
$ CMAKE_ARGS="-Drmm_ROOT=~/rmm/build" \ # <-- this argument is automatic
  pip install -e ~/rmm
```

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Bradley Dice (https://github.com/bdice)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1328
Enables RMM debug logging in Python and exposes controls for the level of logging. Fixes #1336.

Authors:
  - Mark Harris (https://github.com/harrism)
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1339
@raydouglass raydouglass requested review from a team as code owners September 28, 2023 14:56
@raydouglass raydouglass requested a review from harrism September 28, 2023 14:56
@github-actions github-actions bot added CMake Python Related to RMM Python API conda cpp Pertains to C++ code ci labels Sep 28, 2023
@raydouglass raydouglass merged commit f0bdb45 into main Oct 11, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci CMake conda cpp Pertains to C++ code Python Related to RMM Python API
Projects
None yet
Development

Successfully merging this pull request may close these issues.