Introduce the NativeLink Kubernetes operator #1088

Open
wants to merge 1 commit into
base: main

@aaronmondal (Member) commented Jul 6, 2024

A single `kubectl apply -k` now deploys NativeLink in a self-configuring, self-healing and self-updating fashion.

To achieve this we implement a two-stage deployment that asynchronously reconciles various parts of the NativeLink Kustomizations.

First, we deploy Flux Alerts that trigger Tekton Pipelines on GitRepository updates to bring required images into the cluster.
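As a rough illustration of this first stage, a Flux Alert can forward GitRepository events to a webhook that starts a Tekton Pipeline. This is a sketch under assumptions, not the manifests from this PR: the resource names, the namespace, and the EventListener address are hypothetical, and the `apiVersion` depends on the installed Flux version.

```yaml
# Hypothetical sketch: all names, the namespace and the address are assumptions.
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: image-pipeline-webhook      # hypothetical name
  namespace: default
spec:
  type: generic
  # Assumed in-cluster address of a Tekton EventListener service.
  address: http://el-image-pipeline.default.svc.cluster.local:8080
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: image-pipeline-alert        # hypothetical name
  namespace: default
spec:
  providerRef:
    name: image-pipeline-webhook
  eventSeverity: info
  # Fire whenever the watched GitRepository reports an event, e.g. a new revision.
  eventSources:
    - kind: GitRepository
      name: nativelink              # hypothetical GitRepository name
```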

Second, and technically at the same time, we start a Flux Kustomization to deploy a NativeLink Kustomization.
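The second stage could look roughly like the following pair of Flux resources. This is a minimal sketch, assuming a GitRepository named `nativelink` and a hypothetical `path`; the actual names, intervals and paths in this PR may differ.

```yaml
# Hypothetical sketch: names, intervals and the path are assumptions.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: nativelink
  namespace: default
spec:
  interval: 2m
  url: https://github.com/TraceMachina/nativelink
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: nativelink
  namespace: default
spec:
  interval: 2m
  # Directory inside the repository that holds the NativeLink Kustomization.
  path: ./kubernetes/nativelink     # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: nativelink
```

Because Flux reconciles both resources on their own intervals, the deployment keeps converging on the repository state without any further manual invocation.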

This is similar to the previous 01_operations and 02_application scripts, but now happens fully automated in the cluster and no longer requires a local Nix installation, as all tag evaluations have become implementation details of the Tekton Pipelines.

This commit also changes the K8s resource layout to a "best-practice" Kustomize directory layout. This further reduces code duplication and gives third parties greater flexibility and more useful reference points to build custom NativeLink setups.

Includes an overhaul of the Kubernetes documentation.


@aaronmondal (Member Author) left a comment

+@adam-singer +@allada +@blakehatch

cc @MarcusSorealheis @bclark8923 @kubevalet

New docpages at:

Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable (waiting on @adam-singer, @allada, and @blakehatch)

@MarcusSorealheis (Collaborator) left a comment

This PR is massive and there is no simple opportunity for breaking it up. Just annoying.

The docs look good, though. Nice.

Reviewed 3 of 61 files at r1.
Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)


deploy/chromium-example/kustomization.yaml line 18 at r1 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

Why isn't this pointing to the TraceMachina repo?


deploy/dev/kustomization.yaml line 26 at r1 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

again


deploy/kubernetes-example/kustomization.yaml line 18 at r1 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

again

@MarcusSorealheis (Collaborator) left a comment

What I don't like is that the Chromium example received a Pulumi dependency. Is that required?

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

@aaronmondal (Member Author) left a comment

> This PR is massive and there is no simple opportunity for breaking it up. Just annoying.

After looking into reducing the size of this PR I think there are some parts that I can break out. I'll send PRs for those parts which hopefully also makes it a bit clearer why/how I'm making these changes.

> What I don't like is that the Chromium example received a Pulumi dependency. Is that required?

We always had the Pulumi dependency for the Chromium example. It's just more apparent now. However, this PR paves the way to reduce that dependency.

The way the examples generally work is:

  1. Start a K8s cluster and prepare some dependencies in it. This is done via Pulumi.
  2. Build or fetch NativeLink container images and toolchains. This was previously done via the 01_operations shell scripts. Now it happens automatically inside the cluster. This fixes #1012 ("Tag evaluation for K8s images shouldn't run on the host"), which is a blocker for macOS.
  3. Deploy the actual NativeLink deployments. This was previously done via the 02_application scripts and now also happens inside the cluster, so it's no longer necessary to invoke any shell scripts manually.

What the new deploy and kubernetes directories do is essentially create "building blocks" for NativeLink K8s deployments. For instance, if you had an existing Helm chart (wink) you could now use these building blocks to deploy that chart as well. This also turns non-production parts of the examples, like the insecure example certs, into Components. This way they're more easily swappable with e.g. "real" CAs. The same goes for the HTTPRoutes, which would require functional Gateway API Gateways in the cluster. Now it's possible to omit those routes and configure your own ingress logic instead.
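To make the "building blocks" idea more concrete, a third-party overlay might compose a base with optional Components roughly like this. It's only a sketch: every path below is a hypothetical placeholder and not necessarily the layout introduced in this PR.

```yaml
# kustomization.yaml of a hypothetical custom stack; all paths are assumptions.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Hypothetical base containing the core NativeLink resources.
  - ../../kubernetes/nativelink

components:
  # Hypothetical Component with insecure example certs. Swap this entry
  # for a Component backed by a real CA in production setups.
  - ../../kubernetes/components/insecure-certs
  # A hypothetical Component with Gateway API routes is left out here so
  # that the stack can bring its own ingress logic instead:
  # - ../../kubernetes/components/gateway-routes
```

Each directory listed under `components` would contain a `kustomization.yaml` of `kind: Component`, which is what makes the pieces individually swappable.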

The next step here is to also migrate the Tekton Pipelines out of the native-cli and into the kubernetes directory so that they're deployed via Flux instead of Pulumi. After that we'll have clearly separated concerns between Pulumi and K8s, and `kubectl apply -k https://github.com/TraceMachina/nativelink//deploy/<somestack>` should be self-contained enough that users can start running it against arbitrary K8s clusters.

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)


deploy/chromium-example/kustomization.yaml line 18 at r1 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

> Why isn't this pointing to the TraceMachina repo?

This is so that the LRE-remote job (https://github.com/TraceMachina/nativelink/actions/runs/9824654274/job/27123866843?pr=1088) passes. Since the deployments in this PR aren't on main yet, we'll have to bring them in before we can change this. However, we can set them to the TraceMachina repo, disable the LRE job in CI, and then immediately re-enable it after these changes have landed.

I just noticed that it's also possible to create a dedicated CI overlay that sets this to the actual PR branch dynamically. This is probably what we want so I'll look into this more.
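Such a CI overlay could be sketched as below. This is an assumption about how it might look, not something that exists in this PR: the base path and the GitRepository name are hypothetical, `<fork-owner>` and `<pr-branch>` are placeholders that CI would substitute, and the second `replace` op assumes the base already sets `spec.ref.branch`.

```yaml
# Hypothetical CI overlay; the base path and resource name are assumptions.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../dev

patches:
  - target:
      kind: GitRepository
      name: nativelink
    patch: |-
      - op: replace
        path: /spec/url
        value: https://github.com/<fork-owner>/nativelink
      - op: replace
        path: /spec/ref/branch
        value: <pr-branch>
```

With this in place the example stacks themselves could keep pointing at the TraceMachina repo while only CI overrides the source.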

Keeping this comment (and the other two) open so that I don't forget to change these overrides (and the Flux branch override).

@MarcusSorealheis (Collaborator) left a comment

sgtm

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

@aaronmondal (Member Author) left a comment

I broke out various parts of this PR in:

I'll rebase this PR after these have been merged.

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

@adam-singer (Member) left a comment

@aaronmondal happy to sync up offline and test out these changes locally to get a better sense of what this means for using these locally.

Reviewed 42 of 61 files at r1, 14 of 14 files at r2, all commit messages.
Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @allada and @blakehatch)


docs/src/content/docs/guides/chromium.mdx line 25 at r2 (raw file):

```bash
# TODO(aaronmondal): Point to the main repo before merging.
```

nit: leaving this here so we don't forget it


docs/src/content/docs/guides/kubernetes.mdx line 26 at r2 (raw file):

```bash
# TODO(aaronmondal): Point to the main repo before merging.
```

nit: before landing

@MarcusSorealheis (Collaborator) left a comment

:lgtm:

Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
