Introduce the NativeLink Kubernetes operator #1088

aaronmondal · 2024-07-06T21:07:00Z

A single `kubectl apply -k` now deploys NativeLink in a self-configuring, self-healing and self-updating fashion.

To achieve this we implement a two-stage deployment that asynchronously reconciles various parts of the NativeLink Kustomizations.

First, we deploy Flux Alerts that trigger Tekton Pipelines on GitRepository updates to bring required images into the cluster.
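For illustration, here is a minimal sketch of how such an Alert could be wired up, assuming a Flux `generic` Provider that posts events to a Tekton EventListener service. All resource names, the namespace and the listener address are hypothetical and not taken from this PR; only the resource kinds come from the description above.

```yaml
# Hypothetical names throughout. A "generic" Provider forwards the Flux
# event payload as an HTTP POST to the configured address.
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: tekton-listener            # hypothetical
  namespace: flux-system
spec:
  type: generic
  # Hypothetical in-cluster address of a Tekton EventListener service.
  address: http://el-example.default.svc.cluster.local:8080
---
# The Alert selects which events are forwarded to the Provider.
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: nativelink-image-builds    # hypothetical
  namespace: flux-system
spec:
  providerRef:
    name: tekton-listener
  eventSeverity: info
  eventSources:
    - kind: GitRepository
      name: nativelink             # hypothetical GitRepository name
```

With a setup like this, every new revision detected on the GitRepository would produce an event that the listener can turn into a pipeline run.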

Second, and technically at the same time, we start a Flux Kustomization to deploy a NativeLink Kustomization.
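As a rough sketch of this second stage, a Flux GitRepository and Kustomization pair might look like the following. The names, intervals and `path` are assumptions for illustration and may differ from what this PR actually ships.

```yaml
# Source that Flux polls for new revisions.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: nativelink                 # hypothetical
  namespace: flux-system
spec:
  interval: 2m                     # assumed polling interval
  url: https://github.com/TraceMachina/nativelink
  ref:
    branch: main
---
# Flux Kustomization that builds and applies a Kustomize directory
# from the source above.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: nativelink                 # hypothetical
  namespace: flux-system
spec:
  interval: 2m                     # assumed reconciliation interval
  sourceRef:
    kind: GitRepository
    name: nativelink
  path: ./path/to/kustomization    # placeholder, not the real path
  prune: true                      # delete resources removed from Git
```

Because Flux reapplies the rendered manifests on every interval, drift in the cluster is reverted (self-healing) and new commits are rolled out without manual steps (self-updating).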

This is similar to the previous 01_operations and 02_application scripts, but now runs fully automated inside the cluster and no longer requires a local Nix installation, as all tag evaluations have become implementation details of the Tekton Pipelines.

This commit also changes the K8s resource layout to a "best-practice" Kustomize directory layout. This further reduces code duplication and gives third parties greater flexibility and more useful reference points to build custom NativeLink setups.

Includes an overhaul of the Kubernetes documentation.

aaronmondal

+@adam-singer +@allada +@blakehatch

cc @MarcusSorealheis @bclark8923 @kubevalet

New docpages at:

Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable (waiting on @adam-singer, @allada, and @blakehatch)

MarcusSorealheis

This PR is massive and there is no simple opportunity for breaking it up. just annoying.

The docs look good, though. Nice.

Reviewed 3 of 61 files at r1.
Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

deploy/chromium-example/kustomization.yaml line 18 at r1 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

why isn't this pointing to the TraceMachina repo?

deploy/dev/kustomization.yaml line 26 at r1 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

again

deploy/kubernetes-example/kustomization.yaml line 18 at r1 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

again

MarcusSorealheis

What I don't like is that the Chromium example received a Pulumi dependency. Is that required?

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

aaronmondal

This PR is massive and there is no simple opportunity for breaking it up. just annoying.

After looking into reducing the size of this PR I think there are some parts that I can break out. I'll send PRs for those parts which hopefully also makes it a bit clearer why/how I'm making these changes.

What I don't like is that the Chromium example received a Pulumi dependency. Is that required?

We always had the Pulumi dependency for the Chromium example. It's just more apparent now. However, this PR paves the way to reduce that dependency.

The way the examples generally work is:

1. Start a K8s cluster and prepare some dependencies in it. This is done via Pulumi.
2. Build or fetch NativeLink container images and toolchains. This was previously done via the 01_operations shell scripts. Now it happens automatically inside the cluster. This fixes "Tag evaluation for K8s images shouldn't run on the host" (#1012), which is a blocker for macOS.
3. Deploy the actual NativeLink deployments. This was previously done via the 02_application scripts and now also happens inside the cluster, so it's no longer necessary to invoke any shell scripts manually.

What the new deploy and kubernetes directories do is essentially create "building blocks" for creating NativeLink K8s deployments. For instance, if you had an existing Helm chart (wink) you could now use these building blocks to deploy that chart as well. This also turns non-production parts of the examples, like the insecure example certs, into Components. This way they're more easily swappable with e.g. "real" CAs. The same goes for the HTTPRoutes, which would require functional Gateway API Gateways in the cluster. Now it's possible to omit those routes and configure your own ingress logic instead.
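To make the Component idea concrete, here's a hedged sketch of how a stack could opt into such building blocks. The directory and file names are hypothetical placeholders rather than the actual layout in this PR; the `components` field and the `Component` kind are standard Kustomize features.

```yaml
# File 1 (hypothetical): kustomization.yaml of a stack under deploy/.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../kubernetes/base                        # hypothetical path
components:
  # Swap this for a Component backed by a "real" CA in production.
  - ../../kubernetes/components/insecure-certs   # hypothetical path
  # Leave the routes Component out to bring your own ingress logic.
  # - ../../kubernetes/components/routes         # hypothetical path
---
# File 2 (hypothetical): kustomization.yaml inside a Component directory.
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
resources:
  - certs.yaml                                   # hypothetical manifest
```

Since Components are only pulled in when listed, dropping one line is enough to omit the example certs or routes.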

The next step here is to also migrate the Tekton Pipelines out of the native-cli and into the kubernetes directory so that they're deployed via Flux instead of Pulumi. After that we've clearly separated concerns between Pulumi and K8s and the kubectl apply -k https://github.com/TraceMachina/nativelink//deploy/<somestack> should be self-contained enough that users can start running it against arbitrary K8s clusters.
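As a sketch of what that could enable for third parties, a user-owned kustomization could reference a stack as a remote base and layer its own changes on top. The `<somestack>` placeholder is kept from the comment above, and the commented-out patch file is hypothetical.

```yaml
# Hypothetical user-side kustomization.yaml that builds on a remote stack.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Kustomize remote target: repository URL, then "//", then the
  # directory inside the repository.
  - https://github.com/TraceMachina/nativelink//deploy/<somestack>
# Local adjustments would go here, for example:
# patches:
#   - path: my-overrides.yaml    # hypothetical
```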

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

deploy/chromium-example/kustomization.yaml line 18 at r1 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

why isn't this pointing to the TraceMachina repo?

This is so that the LRE-remote job https://github.com/TraceMachina/nativelink/actions/runs/9824654274/job/27123866843?pr=1088 passes. Since the deployments in this PR aren't on main yet, we'll have to bring them in before we can change this. Alternatively, we could set them to the TraceMachina repo now, disable the LRE job in CI, and reenable it immediately after these changes have landed.

I just noticed that it's also possible to create a dedicated CI overlay that sets this to the actual PR branch dynamically. This is probably what we want so I'll look into this more.

Keeping this comment (and the other two) open so that I don't forget to change these overrides (and the flux branch override).
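A dedicated CI overlay along those lines might look roughly like this. It's only a sketch: the base path, the GitRepository name and the placeholder values are assumptions, and CI would have to substitute the fork and branch before applying it.

```yaml
# Hypothetical CI-only overlay, e.g. rendered per PR run.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../dev                         # hypothetical base stack
patches:
  - target:
      kind: GitRepository
      name: nativelink             # hypothetical resource name
    patch: |-
      # Point Flux at the repository that holds the PR branch.
      - op: replace
        path: /spec/url
        value: https://github.com/<fork-owner>/nativelink
      # "replace" requires /spec/ref/branch to already exist in the base.
      - op: replace
        path: /spec/ref/branch
        value: <pr-branch>
```

This would keep the checked-in stacks pointing at the TraceMachina repo while CI still tests the manifests from the PR itself.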

MarcusSorealheis

sgtm

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

aaronmondal

I broke out various parts of this PR in:

I'll rebase this PR after these have been merged.

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

adam-singer

@aaronmondal happy to sync up offline and test these changes locally to get a better sense of what they mean for local use.

Reviewed 42 of 61 files at r1, 14 of 14 files at r2, all commit messages.
Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @allada and @blakehatch)

docs/src/content/docs/guides/chromium.mdx line 25 at r2 (raw file):

```bash
# TODO(aaronmondal): Point to the main repo before merging.
```

nit: leaving this here so we don't forget it

docs/src/content/docs/guides/kubernetes.mdx line 26 at r2 (raw file):

```bash
# TODO(aaronmondal): Point to the main repo before merging.
```

nit: before landing

MarcusSorealheis

Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

aaronmondal assigned adam-singer, allada and blakehatch Jul 6, 2024

aaronmondal commented Jul 6, 2024

aaronmondal force-pushed the flux branch 5 times, most recently from 11c1359 to cfad74d Compare July 7, 2024 04:39

MarcusSorealheis requested review from adam-singer, allada and blakehatch July 7, 2024 08:48

MarcusSorealheis requested changes Jul 7, 2024

MarcusSorealheis reviewed Jul 7, 2024

aaronmondal commented Jul 7, 2024

MarcusSorealheis reviewed Jul 7, 2024

aaronmondal commented Jul 8, 2024

aaronmondal mentioned this pull request Jul 8, 2024

Write Tekton image tag outputs to a ConfigMap #1100

Merged

aaronmondal force-pushed the flux branch from cfad74d to b83ae43 Compare July 8, 2024 21:06

adam-singer reviewed Jul 8, 2024

aaronmondal force-pushed the flux branch 2 times, most recently from 2c7b100 to 4bc267a Compare July 9, 2024 03:20

MarcusSorealheis requested a review from adam-singer July 9, 2024 03:25

MarcusSorealheis approved these changes Jul 9, 2024

aaronmondal force-pushed the flux branch 3 times, most recently from 55056b2 to f63cf71 Compare July 9, 2024 09:45

aaronmondal force-pushed the flux branch from f63cf71 to 75c943a Compare July 9, 2024 10:42
