Introduce the NativeLink Kubernetes operator #1088
+@adam-singer +@allada +@blakehatch
cc @MarcusSorealheis @bclark8923 @kubevalet
New docpages at:
- https://df0124ed.nativelink.pages.dev/guides/kubernetes/
- https://df0124ed.nativelink.pages.dev/guides/chromium/
Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable (waiting on @adam-singer, @allada, and @blakehatch)
11c1359
to
cfad74d
Compare
This PR is massive and there is no simple opportunity for breaking it up. Just annoying.
The docs look good, though. Nice.
Reviewed 3 of 61 files at r1.
Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
deploy/chromium-example/kustomization.yaml
line 18 at r1 (raw file):
- op: replace
  path: /spec/url
  value: https://github.com/aaronmondal/nativelink
why isn't this pointing to the TraceMachina repo?
deploy/dev/kustomization.yaml
line 26 at r1 (raw file):
- op: replace
  path: /spec/url
  value: https://github.com/aaronmondal/nativelink
again
deploy/kubernetes-example/kustomization.yaml
line 18 at r1 (raw file):
- op: replace
  path: /spec/url
  value: https://github.com/aaronmondal/nativelink
again
What I don't like is that the Chromium example received a Pulumi dependency. Is that required?
This PR is massive and there is no simple opportunity for breaking it up. Just annoying.
After looking into reducing the size of this PR, I think there are some parts that I can break out. I'll send PRs for those parts, which hopefully also makes it a bit clearer why and how I'm making these changes.
What I don't like is that the Chromium example received a Pulumi dependency. Is that required?
We always had the Pulumi dependency for the Chromium example. It's just more apparent now. However, this PR paves the way to reduce that dependency.
The way the examples generally work is:
- Start a K8s cluster and prepare some dependencies in it. This is done via Pulumi.
- Build or fetch NativeLink container images and toolchains. This was previously done via the `01_operations` shell scripts. Now it happens automatically inside the cluster. This fixes "Tag evaluation for K8s images shouldn't run on the host" (#1012), which is a blocker for macOS.
- Deploy the actual NativeLink deployments. This was previously done via the `02_application` scripts and now also happens inside the cluster, so it's no longer necessary to invoke any shell scripts manually.
What the new `deploy` and `kubernetes` directories do is essentially create "building blocks" for creating NativeLink K8s deployments. For instance, if you had an existing Helm chart (wink) you could now use these building blocks to deploy that chart as well. This also turns non-production parts of the examples, like the insecure example certs, into Components. This way they're more easily swappable with e.g. "real" CAs. The same goes for the HTTPRoutes, which would require functional Gateway API Gateways in the cluster. Now it's possible to omit those routes and configure your own ingress logic instead.
The next step here is to also migrate the Tekton Pipelines out of the `native-cli` and into the `kubernetes` directory so that they're deployed via Flux instead of Pulumi. After that, concerns are clearly separated between Pulumi and K8s, and `kubectl apply -k https://github.com/TraceMachina/nativelink//deploy/<somestack>` should be self-contained enough that users can start running it against arbitrary K8s clusters.
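As a rough sketch of what that single command could look like once the stacks are on main, the helper below builds the remote Kustomize target for a stack directory under `deploy/`. The function names `stack_url` and `deploy_stack` are hypothetical, and the optional `?ref=` suffix relies on Kustomize's remote target syntax for pinning a branch or tag. The stack names are the directories touched in this PR.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build the remote Kustomize target for one of the stacks under deploy/.
# The double slash separates the repository URL from the path inside it.
# An optional second argument pins a branch or tag via `?ref=`.
stack_url() {
  local stack="$1"
  local ref="${2:-}"
  local url="https://github.com/TraceMachina/nativelink//deploy/${stack}"
  if [ -n "${ref}" ]; then
    url="${url}?ref=${ref}"
  fi
  printf '%s\n' "${url}"
}

# Print the command instead of running it so the sketch has no side effects.
# Drop the `echo` to apply the stack to the current kubectl context.
deploy_stack() {
  echo kubectl apply -k "$(stack_url "$@")"
}

deploy_stack kubernetes-example
```

Printing the command first makes it easy to check which stack and ref would be applied before pointing it at a real cluster.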
deploy/chromium-example/kustomization.yaml
line 18 at r1 (raw file):
Previously, MarcusSorealheis (Marcus Eagan) wrote…
why isn't this pointing to the TraceMachina repo?
This is so that the LRE-remote job https://github.com/TraceMachina/nativelink/actions/runs/9824654274/job/27123866843?pr=1088 passes. Since the deployments in this PR aren't on main yet, we'll have to bring them in before we can change this. However, we can set them to the TraceMachina repo, disable the LRE job in CI, and then immediately re-enable it after these changes have landed.
I just noticed that it's also possible to create a dedicated CI overlay that sets this to the actual PR branch dynamically. This is probably what we want, so I'll look into this more.
Keeping this comment (and the other two) open so that I don't forget to change these overrides (and the `flux` branch override).
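One way such a CI overlay could be generated is sketched below. This is an assumption about the approach, not what the PR ends up doing: the helper name `write_ci_overlay`, the base path, and the placeholder branch are hypothetical, and the `/spec/ref/branch` patch assumes the patched Flux GitRepository already sets `spec.ref.branch`. In GitHub Actions the branch could come from the `GITHUB_HEAD_REF` environment variable.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a Kustomize overlay that points every GitRepository in the base at
# the given repository URL and branch.
# Arguments: output directory, base path, repository URL, branch name.
write_ci_overlay() {
  local dir="$1" base="$2" repo_url="$3" branch="$4"
  mkdir -p "${dir}"
  cat > "${dir}/kustomization.yaml" <<EOF
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ${base}
patches:
  - target:
      kind: GitRepository
    patch: |-
      - op: replace
        path: /spec/url
        value: ${repo_url}
      - op: replace
        path: /spec/ref/branch
        value: ${branch}
EOF
}

# Demonstration only: a real CI job would write the overlay next to the other
# stacks so that the relative base path resolves. The branch is a placeholder.
overlay_dir="$(mktemp -d)"
write_ci_overlay "${overlay_dir}" "../kubernetes-example" \
  "https://github.com/aaronmondal/nativelink" "${GITHUB_HEAD_REF:-my-pr-branch}"
cat "${overlay_dir}/kustomization.yaml"
```

Generating the overlay at CI time would keep the committed stacks pointed at the main repository while still letting the job test the PR's own branch.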
sgtm
I broke out various parts of this PR in:
- Allow Tekton pipelines to be triggered by Flux Alerts #1094
- Update Go dependencies #1095
- Add Flux to development cluster #1096
- Allow WebSocket upgrades in devcluster Loadbalancer #1098
- Write Tekton image tag outputs to a ConfigMap #1100
I'll rebase this PR after these have been merged.
@aaronmondal happy to sync up offline and test these changes locally to get a better sense of what this means for local use
Reviewed 42 of 61 files at r1, 14 of 14 files at r2, all commit messages.
Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @allada and @blakehatch)
docs/src/content/docs/guides/chromium.mdx
line 25 at r2 (raw file):
```bash
# TODO(aaronmondal): Point to the main repo before merging.
```
nit: leaving this here so we don't forget it
docs/src/content/docs/guides/kubernetes.mdx
line 26 at r2 (raw file):
```bash
# TODO(aaronmondal): Point to the main repo before merging.
```
nit: before landing
2c7b100
to
4bc267a
Compare
Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
55056b2
to
f63cf71
Compare
A single `kubectl apply -k` now deploys NativeLink in a self-configuring, self-healing and self-updating fashion.
To achieve this, we implement a two-stage deployment that asynchronously reconciles various parts of the NativeLink Kustomizations.
First, we deploy Flux Alerts that trigger Tekton Pipelines on GitRepository updates to bring the required images into the cluster.
Second, and technically at the same time, we start a Flux Kustomization to deploy a NativeLink Kustomization.
This is similar to the previous `01_operations` and `02_application` scripts, but now happens fully automatically in the cluster and no longer requires a local Nix installation, as all tag evaluations have become implementation details of the Tekton Pipelines.
This commit also changes the K8s resource layout to a "best-practice" Kustomize directory layout. This further reduces code duplication and gives third parties greater flexibility and more useful reference points to build custom NativeLink setups.
Includes an overhaul of the Kubernetes documentation.
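To make the two stages more concrete, here is a minimal sketch that renders one Flux Alert and one Flux Kustomization as a multi-document YAML stream. It is not the manifests from this PR: the helper name `render_two_stage`, the resource names, the Provider name, and the `./kubernetes/base` path are all hypothetical, and the API versions are assumptions that may differ between Flux releases. The Provider that the Alert references is assumed to forward events to a Tekton trigger.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the two stages as one multi-document YAML stream.
# Arguments: GitRepository name, Provider name, path of the Kustomization.
render_two_stage() {
  local repo="$1" provider="$2" path="$3"
  cat <<EOF
---
# Stage 1: forward GitRepository events to a Provider, which is assumed to
# call a Tekton trigger that builds the images inside the cluster.
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: ${repo}-image-builds
spec:
  eventSeverity: info
  providerRef:
    name: ${provider}
  eventSources:
    - kind: GitRepository
      name: ${repo}
---
# Stage 2: reconcile the NativeLink Kustomization from the same source.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ${repo}
spec:
  interval: 2m
  path: ${path}
  prune: true
  sourceRef:
    kind: GitRepository
    name: ${repo}
EOF
}

# All three arguments are placeholder values.
render_two_stage nativelink tekton-webhook ./kubernetes/base
```

Both resources watch the same GitRepository, which is why the two stages can start at the same time and converge independently.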