[libc++] Move some macOS CI jobs to Github actions #89083
@llvm/pr-subscribers-github-workflow
Author: Louis Dionne (ldionne)
Changes
This is an attempt to decouple macOS CI testing from BuildKite, which would make the maintenance of macOS CI easier and more accessible to all contributors. Right now, the macOS CI is running entirely on machines owned by the LLVM Foundation with only a small set of contributors having direct access to them.
The story for performing back-deployment testing still needs to be figured out, so for now we are retaining some jobs under BuildKite.
Full diff: https://github.com/llvm/llvm-project/pull/89083.diff
2 Files Affected:
diff --git a/.github/workflows/libcxx-build-and-test.yaml b/.github/workflows/libcxx-build-and-test.yaml
index 1e9367732e5911..99011c9bd0cb84 100644
--- a/.github/workflows/libcxx-build-and-test.yaml
+++ b/.github/workflows/libcxx-build-and-test.yaml
@@ -85,6 +85,7 @@ jobs:
**/CMakeError.log
**/CMakeOutput.log
**/crash_diagnostics/*
+
stage2:
if: github.repository_owner == 'llvm'
runs-on: libcxx-runners-8-set
@@ -134,6 +135,7 @@ jobs:
**/CMakeError.log
**/CMakeOutput.log
**/crash_diagnostics/*
+
stage3:
if: github.repository_owner == 'llvm'
needs: [ stage1, stage2 ]
@@ -199,6 +201,38 @@ jobs:
**/CMakeError.log
**/CMakeOutput.log
**/crash_diagnostics/*
+
+ macos:
+ runs-on: macos-latest
+ needs: [ stage1 ]
+ strategy:
+ fail-fast: false
+ matrix:
+ config: [
+ generic-cxx03,
+ generic-cxx23,
+ generic-modules,
+ apple-system
+ ]
+ steps:
+ - uses: actions/checkout@v4
+ - uses: maxim-lobanov/setup-xcode@v1
+ with:
+ xcode-version: 'latest-stable'
+ - name: Build and test
+ run: |
+ bash libcxx/utils/ci/run-buildbot ${{ matrix.config }}
+ - uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8 # v4.3.0
+ if: always() # Upload artifacts even if the build or test suite fails
+ with:
+ name: macos-${{ matrix.config }}-results
+ path: |
+ **/test-results.xml
+ **/*.abilist
+ **/CMakeError.log
+ **/CMakeOutput.log
+ **/crash_diagnostics/*
+
windows:
runs-on: windows-2022
needs: [ stage1 ]
diff --git a/libcxx/utils/ci/buildkite-pipeline.yml b/libcxx/utils/ci/buildkite-pipeline.yml
index 4bacdec8f8d6bc..0e9a02ad081b13 100644
--- a/libcxx/utils/ci/buildkite-pipeline.yml
+++ b/libcxx/utils/ci/buildkite-pipeline.yml
@@ -56,47 +56,8 @@ environment_definitions:
steps:
-- group: ':mac: Apple'
+- group: ':mac: Apple Backdeployment'
steps:
- - label: MacOS x86_64
- command: libcxx/utils/ci/run-buildbot generic-cxx23
- agents:
- queue: libcxx-builders
- os: macos
- arch: x86_64
- <<: *common
-
- - label: MacOS arm64
- command: libcxx/utils/ci/run-buildbot generic-cxx23
- agents:
- queue: libcxx-builders
- os: macos
- arch: arm64
- <<: *common
-
- - label: MacOS with Modules
- command: libcxx/utils/ci/run-buildbot generic-modules
- agents:
- queue: libcxx-builders
- os: macos
- <<: *common
-
- - label: MacOS with C++03
- command: libcxx/utils/ci/run-buildbot generic-cxx03
- agents:
- queue: libcxx-builders
- os: macos
- <<: *common
-
- # Build with the configuration we use to generate libc++.dylib on Apple platforms
- - label: Apple system
- command: libcxx/utils/ci/run-buildbot apple-system
- agents:
- queue: libcxx-builders
- os: macos
- arch: arm64 # This can technically run on any architecture, but we have more resources on arm64 so we pin this job to arm64
- <<: *common
-
- label: Apple back-deployment macosx10.13
command: libcxx/utils/ci/run-buildbot apple-system-backdeployment-10.13
agents:
@@ -121,6 +82,23 @@ steps:
arch: x86_64 # TODO: Remove this once we are able to run back-deployment on arm64 again, since this isn't x86_64 specific
<<: *common
+ # TODO: Re-enable this once we've figured out how to run back-deployment testing on arm64 on recent OSes
+ # - label: "Apple back-deployment macosx11.0 arm64"
+ # command: "libcxx/utils/ci/run-buildbot apple-system-backdeployment-11.0"
+ # artifact_paths:
+ # - "**/test-results.xml"
+ # - "**/*.abilist"
+ # agents:
+ # queue: "libcxx-builders"
+ # os: "macos"
+ # arch: "arm64"
+ # retry:
+ # automatic:
+ # - exit_status: -1 # Agent was lost
+ # limit: 2
+ # timeout_in_minutes: 120
+
+
- group: ARM
steps:
- label: AArch64
@@ -230,20 +208,3 @@ steps:
queue: libcxx-builders
os: android
<<: *common
-
-
- # TODO: Re-enable this once we've figured out how to run back-deployment testing on arm64 on recent OSes
- # - label: "Apple back-deployment macosx11.0 arm64"
- # command: "libcxx/utils/ci/run-buildbot apple-system-backdeployment-11.0"
- # artifact_paths:
- # - "**/test-results.xml"
- # - "**/*.abilist"
- # agents:
- # queue: "libcxx-builders"
- # os: "macos"
- # arch: "arm64"
- # retry:
- # automatic:
- # - exit_status: -1 # Agent was lost
- # limit: 2
- # timeout_in_minutes: 120
Let's wait for the results to come back, but this LGTM.
How often will this job be running? We are limited to only 5 concurrent macOS jobs at a time, so if this is going to be running for each PR, we may quickly run out of capacity.
@@ -199,6 +201,38 @@ jobs:
**/CMakeError.log
**/CMakeOutput.log
**/crash_diagnostics/*

macos:
runs-on: macos-latest
I recommend using macos-14 here, so we can control when we update to the next macos version.
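The suggestion above could be sketched as follows. This is a trimmed-down version of the `macos` job from the diff with only the `runs-on` label changed; the Xcode setup and artifact upload steps are left out for brevity, and the job is otherwise assumed to stay as proposed in the PR.

```yaml
jobs:
  macos:
    # Pinned image label instead of macos-latest, so that moving to the
    # next macOS version becomes an explicit change to this file.
    runs-on: macos-14
    needs: [ stage1 ]
    strategy:
      fail-fast: false
      matrix:
        config: [ generic-cxx03, generic-cxx23, generic-modules, apple-system ]
    steps:
      - uses: actions/checkout@v4
      - name: Build and test
        run: |
          bash libcxx/utils/ci/run-buildbot ${{ matrix.config }}
```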
If everything is already running on machines owned by the foundation, it should just be a matter of setting up the Github actions agent on them to get more runner availability? Like @tstellar mentioned, that's not what this PR does currently though.
This would run after stage1 has completed. Only a subset of all PRs make it there, as most will fail in stage1.
Yes, we could do that or we could pay for additional capacity on Github directly. I have been maintaining the Foundation macOS nodes and not having to do that anymore is the main motivation for this change.
e9563be to ca9e500
Would we be able to repurpose those machines into self-hosted GitHub runners? If so, I can help with the maintenance of those machines. That would help ensure we have the capacity we need.
Yes, that should be possible. I will start by making it work for the Github-provided builders and then we can tackle the capacity increase as a separate task.
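For illustration only, a job targeting repurposed machines might look roughly like the sketch below. It assumes the Foundation machines have been registered as self-hosted runners for the repository, which this PR does not do. `self-hosted`, `macOS` and `ARM64` are labels GitHub assigns to self-hosted runners by default; any additional custom label would be a project decision.

```yaml
jobs:
  macos:
    # Assumption: self-hosted runners exist and carry the default labels.
    # A job is only scheduled on a runner that matches every label listed.
    runs-on: [ self-hosted, macOS, ARM64 ]
    needs: [ stage1 ]
    steps:
      - uses: actions/checkout@v4
      - name: Build and test
        run: |
          bash libcxx/utils/ci/run-buildbot generic-cxx23
```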
✅ With the latest revision this PR passed the C/C++ code formatter.
These tests have always been flaky, which led us to using ALLOW_RETRIES on them. However, while investigating llvm#89083 (using Github provided macOS builders), these tests surfaced as being basically unworkably flaky in that environment. This patch solves that problem by refactoring the tests to make them succeed deterministically.
@ldionne This PR seems to have a lot of unrelated libc++ changes that somehow got pulled in? Also, we'll have to think about the capacity for this. I believe we can only have five concurrent macOS jobs running at the same time. I'm not sure how many concurrent jobs the libc++ CI normally runs or how long the macOS jobs run for, but this could quickly end up becoming an issue. I don't think it would block too much else in the project though as everything (that I can think of) runs on Linux runners, other than the release jobs.
9d1f603 to a4a0306
I am just using this to iterate on fixing flaky tests that only show up on the Github-provided macOS builders. These changes won't be part of the final patch.
I agree. We'd probably want to set up the existing Foundation-provided macOS machines as runners available via Github actions, but I wanted to look into that separately.
6af03cb to 2b7b4ee
@philnik777 Across all jobs, it looks like the following tests are failing (all under
If you want to start looking at the tests under
Edit: #100783 should address some; #102151 addresses more.
2b7b4ee to 0e51307
0e51307 to d38eb96
This patch decouples macOS CI testing from BuildKite, which makes the maintenance of macOS CI easier and more accessible to all contributors. Right now, the macOS CI is running entirely on machines owned by the LLVM Foundation with only a small set of contributors having direct access to them. In particular, updating these machines is currently a very time-consuming manual process that requires taking the machines offline, and using Github-provided instances makes that an order of magnitude easier. The story for performing back-deployment testing still needs to be figured out, so for now we are retaining some jobs under BuildKite.
d38eb96 to 52b8a94
I looked at the policy for Github Actions runner minutes here: https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#about-billing-for-github-actions. For public repositories, there is no limit on minutes per month. There is a maximum of 5 macOS runners at a time, which could be a problem for capacity if we want to start spinning up a lot more macOS CI jobs. However, I would cross that bridge when we get to it.
In particular, the goal of this transition is to move away from the MacMiniVault macOS runners we're currently using for BuildKite because they are very tedious to keep up-to-date. I'd like to avoid setting up and maintaining self-hosted Github runners on those machines since that would defeat the main benefit of moving to Github actions.
@tstellar Are you fine with me landing this patch and then figuring out how to best address capacity issues if we have them? In the short term, one benefit of landing this patch is that we could eventually let go of some of the machines we have in MacMiniVault (I think we can let go of the two arm64 machines).
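If the concurrency limit does become a problem, one option (not part of this PR, and only a sketch) would be to throttle the `macos` job inside the workflow. The fragment below uses two standard workflow keys, `strategy.max-parallel` and a job-level `concurrency` group. The group name and the value `2` are hypothetical choices for illustration.

```yaml
jobs:
  macos:
    runs-on: macos-latest
    needs: [ stage1 ]
    # Hypothetical group name. Including matrix.config keeps the matrix legs of
    # one run in separate groups, so they do not cancel each other; a newer run
    # for the same PR cancels the older run's job for the same configuration.
    concurrency:
      group: libcxx-macos-${{ matrix.config }}-${{ github.event.pull_request.number || github.ref }}
      cancel-in-progress: true
    strategy:
      fail-fast: false
      # Run at most two matrix configurations of this job at once per workflow run.
      max-parallel: 2
      matrix:
        config: [ generic-cxx03, generic-cxx23, generic-modules, apple-system ]
    steps:
      - uses: actions/checkout@v4
      - name: Build and test
        run: |
          bash libcxx/utils/ci/run-buildbot ${{ matrix.config }}
```

Note that `max-parallel` only bounds a single workflow run; it does not by itself cap the total number of macOS jobs across all open PRs.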
…d counterparts (llvm#104852) This refactoring is done to remove flakyness as described in llvm#89083.
Let's try it out and see how it goes.
FYI the performance of |