[RFC] Flux Bootstrap for OCI-compliant Container Registries #4749

stefanprodan · 2024-04-27T09:34:06Z

Proposal for a Git-less bootstrap using OCI-compliant Container Registries as the desired state storage.

Spec preview: https://github.com/fluxcd/flux2/blob/rfc-flux-bootstrap-oci/rfcs/000X-flux-bootstrap-oci/README.md

Example:

```shell
flux bootstrap oci \
  --url=oci://ghcr.io/stefanprodan/flux-manifests:production \
  --username=stefanprodan \
  --password=$GITHUB_TOKEN \
  --kustomization=flux-manifests/kustomization.yaml \
  --cluster-url=oci://ghcr.io/stefanprodan/fleet-manifests:production \
  --cluster-path=clusters/production
```

Signed-off-by: Stefan Prodan <[email protected]>

errordeveloper

Thanks for drafting this, Stefan! Here are a few thoughts/suggestions from me :)

errordeveloper · 2024-04-27T12:03:33Z

rfcs/000X-flux-bootstrap-oci/README.md

+`password` and `kustomization` arguments:
+
+1. Logs in to the OCI registry using the provided credentials.
+2. Generates an OCI artifact from the Flux components manifests and the `kustomization.yaml` file.


Might be worth mentioning here the workload identity modifications that may be needed, e.g. on EKS the role ARN needs to be set as an annotation; I forget whether that is also necessary on GKE.

stefanprodan · 2024-04-27

We have all of those documented at https://fluxcd.io/flux/installation/configuration/workload-identity/ and people will need to read the docs and adapt their `kustomization.yaml`.

errordeveloper · 2024-04-27

But perhaps there could be a bootstrap argument to specify provider-specific attributes, which would be handled according to the provider flag?

stefanprodan · 2024-04-27

There is `--provider=<aws|azure|gcp>`, but this is for the CLI to use the role of the machine where it's running. The IAM identity of the bastion host may differ from the one you want to use for the Flux source-controller, so we'll need yet another flag with the identity name. This is how Flux AIO works: https://timoni.sh/flux-aio/#__tabbed_1_3

userbradley · 2024-04-30

This is probably not the right place to bring this up, but the workload identity documentation for GKE is partially incorrect: you no longer need to annotate service accounts with a GCP SA for GKE workload identity. You grant the Kubernetes service account access to whatever resources it needs via a member statement like the one below:

principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/flux-system/sa/source-controller

I'm going to have a think about how I can update the docs on this one, but thought I'd raise it before I forget.
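
As a rough illustration of the member statement above, the helper below assembles the principal string from its parts. The function name `gke_wi_principal` and the sample project values are hypothetical; only the layout of the string comes from the comment.

```shell
#!/bin/sh
# Hypothetical helper: build the IAM principal for a Kubernetes service account
# under GKE workload identity, following the member format quoted above.
gke_wi_principal() {
  project_number="$1"   # numeric GCP project number
  project_id="$2"       # GCP project ID (prefix of the workload identity pool)
  namespace="$3"        # Kubernetes namespace of the service account
  service_account="$4"  # Kubernetes service account name
  printf 'principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/%s/sa/%s\n' \
    "$project_number" "$project_id" "$namespace" "$service_account"
}

# Example with made-up project values, for the Flux source-controller.
gke_wi_principal 123456789012 my-project flux-system source-controller
```

The printed string would then be used as the member of an IAM binding on whichever resource the controller needs to read.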

thejosephstevens · 2024-08-29

We've recently adopted Flux for our multi-cloud architecture. To support that, we're overriding env vars, volumes, and volume bindings directly in order to set up OIDC-based auth to each cloud on our workloads. If we could extend those parameters on the source controller, or pass in a pod template, that would allow us to easily inject OIDC auth for accessing the OCI backend. (I suspect we could do this with the Helm install method, but support in the CLI would be nice, as the bootstrap command is great.)

errordeveloper · 2024-04-27T12:20:17Z

rfcs/000X-flux-bootstrap-oci/README.md

+
+### Cluster state reconciliation configuration
+
+After the OCIRepository and Flux Kustomization called `flux` become ready, the command


Does this have to be strictly sequential and synchronous?

stefanprodan · 2024-04-27

Yes, it does: CRDs and controllers must be up and running before the cluster sync is deployed, the same procedure as for the Git bootstrap.
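
To make the ordering concrete, here is a minimal sketch using `kubectl wait` as a stand-in for what the bootstrap command would do internally. The function name, the manifest file argument, the `flux-system` namespace, and the assumption that the OCIRepository is also named `flux` are all illustrative and not taken from the RFC.

```shell
#!/bin/sh
# Sketch of the ordering constraint: the cluster sync is applied only after
# Flux itself reports Ready. Object names and the namespace are assumptions.
deploy_cluster_sync() {
  sync_manifest="$1"             # hypothetical file with the cluster sync objects
  namespace="${2:-flux-system}"  # assumed install namespace

  # Wait for the artifact source and the Kustomization named `flux`.
  kubectl -n "$namespace" wait ocirepository/flux \
    --for=condition=ready --timeout=5m || return 1
  kubectl -n "$namespace" wait kustomization/flux \
    --for=condition=ready --timeout=5m || return 1

  # Both objects are Ready, so the sync objects can now be created.
  kubectl -n "$namespace" apply -f "$sync_manifest"
}
```

If either wait fails or times out, the function returns early and the cluster sync is never applied, which is the sequential behaviour described above.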

errordeveloper · 2024-04-27T12:32:17Z

rfcs/000X-flux-bootstrap-oci/README.md

+On the server-side, the Flux controllers should be configured to self-update from the registry
+and reconcile the cluster state from OCI artifacts stored in the same or a different registry.


I would imagine that users may wish to choose between pulling from an upstream OCI artifact that is published as part of the Flux release and keeping a full copy of it. If they choose to use a copy, another command may be needed to keep their copy up to date. Does that make sense?

stefanprodan · 2024-04-27

You would always have a copy in your registry that includes customisations, same as with Git; bootstrap means vendoring the Flux manifests that in 99% of cases would need some fine tuning.

errordeveloper · 2024-04-27

Ah, it's true that in most cases right now people will end up with a copy. How do they bring the copy up to date, e.g. when some component has new pod spec fields that have to be set, or there is an RBAC change? Do they have to read the changelog and implement such changes? If in the OCI world this could be avoided by referencing an upstream artifact and storing local changes as a patch/kustomization, it might actually be very nice. I guess it wasn't feasible with Git as an upstream.

stefanprodan · 2024-04-27

For both Git and OCI bootstrap, the Flux update is fully automated in CI. See Story 3 in this RFC, and also the docs here: https://fluxcd.io/flux/flux-gh-action/#automate-flux-updates

errordeveloper · 2024-04-27

> ... bootstrap means vendoring the Flux manifests that in 99% of cases would need some fine tuning.

Just want to clarify: can this fine tuning be done with an in-cluster Kustomization, or has that proven challenging somehow?

stefanprodan · 2024-04-28

I've added the common flags to the RFC.

Controller selection is already supported via flags common to all bootstrap sub-commands.

errordeveloper · 2024-04-28

Yes, I know. What I mean is that it would be less natural to specify controllers with kustomize: one would probably need to select bases, as there are no meaningful parameters or if/switch statements (at least the last time I checked), so the CLI offers a more meaningful option.
So because you need to select controllers, you start with the CLI, which gives you a single file that you kustomize a little, but it's all much more complicated than you might have wished and defeats the purpose of putting kustomize in at this stage.

errordeveloper · 2024-04-28

CUE, on the other hand, could do better in all of this, and you could potentially remove the need for the CLI and make custom configs easier to introduce. Does what I said earlier make more sense now?

stefanprodan · 2024-04-28

Flux is customized at bootstrap entirely via Kustomize patches and CLI options, and this must be 100% compatible with the `oci` sub-command. I’m not considering using CUE or anything else that would diverge from the current Git bootstrap procedure. Users should be able to migrate from Git to OCI by simply reusing their current flux-system overlay including patches, image overrides, configGenerator, volumes, etc.

errordeveloper · 2024-04-29

That is for sure. I'm just thinking that self-managed Flux could use CUE for this in the future, possibly even without exposing CUE to the user. The CLI could still work the same way on the surface. I do recall we once spoke of an installer operator too; you could use CUE there and in the CLI. Just an idea :)

errordeveloper · 2024-04-27T13:10:57Z

rfcs/000X-flux-bootstrap-oci/README.md

+```shell
+# pull the latest manifests from the registry
+flux pull artifact oci://ghcr.io/stefanprodan/flux-manifests:production \
+--output=./flux-manifests
+
+# update the Flux components manifests
+flux install --export > ./flux-manifests/flux-system/gotk-components.yaml


These are two alternative methods, right? It's not very clear from the text at the moment.

stefanprodan · 2024-04-27

If you don't have access to the cluster (what the user story is about), this is the only way. If you have API access, then, as with Git bootstrap, you can just rerun it to update. OCI bootstrap behaves the same as Git bootstrap.

errordeveloper · 2024-04-27

I just thought these two commands will write the same kind of output, except that install lets you select a subset of controllers... maybe I am missing something else.

errordeveloper · 2024-04-27

Is it the case right now that one has to rerun bootstrap on major/minor releases while patch releases are taken care of by in-cluster image version bumps?

stefanprodan · 2024-04-27

Every time we release Flux, users get a PR opened to update their manifests in Git. For OCI you would need some kind of semver range or some other manual gate, e.g. a GitHub workflow dispatch to approve minor bumps, letting only patch versions be pushed to the registry automatically.
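
The gate described above could be sketched roughly as follows. The function names `bump_kind` and `gate` are hypothetical, and the sketch assumes plain `vMAJOR.MINOR.PATCH` versions with no pre-release suffixes and no downgrades; a real CI workflow would need to handle those cases.

```shell
#!/bin/sh
# Classify the difference between two versions as none, patch, minor or major.
bump_kind() {
  cur="${1#v}"; new="${2#v}"   # strip an optional leading "v"
  cur_major="${cur%%.*}"; rest="${cur#*.}"; cur_minor="${rest%%.*}"
  new_major="${new%%.*}"; rest="${new#*.}"; new_minor="${rest%%.*}"
  if [ "$cur" = "$new" ]; then echo none
  elif [ "$cur_major" != "$new_major" ]; then echo major
  elif [ "$cur_minor" != "$new_minor" ]; then echo minor
  else echo patch
  fi
}

# Decide whether CI may push the updated artifact without approval:
# patch bumps go through, anything larger waits for a manual step.
gate() {
  case "$(bump_kind "$1" "$2")" in
    none)  echo "skip" ;;
    patch) echo "auto-push" ;;
    *)     echo "needs-approval" ;;
  esac
}

gate v2.3.0 v2.3.1   # prints "auto-push"
gate v2.3.1 v2.4.0   # prints "needs-approval"
```

In a GitHub workflow, the `needs-approval` branch would correspond to the manual workflow dispatch mentioned above.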

stefanprodan · 2024-04-27

> I just thought these two commands will write the same kind of output, except that install lets you select a subset of controllers... maybe I am missing something else.

All the `flux bootstrap` commands have the same args as `flux install`, so you can pick controllers, etc. with bootstrap too. If you bootstrap with `flux bootstrap oci --components=source-controller,kustomize-controller`, then to update you would run `flux install --components=source-controller,kustomize-controller --export` in CI.

errordeveloper · 2024-04-27

I am talking about `pull artifact` vs `install --export` (per above).

stefanprodan · 2024-04-27

You need to pull to preserve the existing `kustomization.yaml` and any other extra resources you may have added at bootstrap. The `install --export` command only generates the components YAML.

errordeveloper · 2024-04-29

Ah, of course, this is a "rebase" ;)
Maybe the text needs to make that clearer.
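
To spell out the "rebase", here is a minimal sketch of the update flow as a single function. The pull and export steps mirror the commands in the hunk above; the final `flux push artifact` step is an assumption about how the truncated snippet continues, and the function name and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: pull the existing artifact, regenerate only the components file,
# then push the result back. The kustomization.yaml and any extra resources
# pulled in step 1 are left untouched by step 2.
rebase_flux_manifests() {
  artifact_url="$1"  # e.g. oci://ghcr.io/stefanprodan/flux-manifests:production
  workdir="$2"       # local directory to unpack into
  source_url="$3"    # value for --source (assumed: the CI repository URL)
  revision="$4"      # value for --revision (assumed: branch@sha1:commit)

  mkdir -p "$workdir" || return 1

  # 1. Pull the current manifests, including customisations.
  flux pull artifact "$artifact_url" --output="$workdir" || return 1

  # 2. Overwrite only the Flux components manifests.
  flux install --export > "$workdir/flux-system/gotk-components.yaml" || return 1

  # 3. Push the updated content back to the same tag (assumed step).
  flux push artifact "$artifact_url" --path="$workdir" \
    --source="$source_url" --revision="$revision"
}
```

If the cluster was bootstrapped with a subset of controllers, the same `--components` value would need to be passed to `flux install` in step 2, as noted earlier in the thread.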

Signed-off-by: Stefan Prodan <[email protected]>

nagyv · 2024-04-30T11:42:39Z

rfcs/000X-flux-bootstrap-oci/README.md

+1. Logs in to the OCI registry using the provided credentials.
+2. Generates an OCI artifact from the Flux components manifests and the `kustomization.yaml` file.
+3. Applies the Flux components manifests along with their customisations to the cluster.
+4. Pushes the OCI artifact to the container registry using the specified tag.


I wonder if it would make sense to have two registry auth. A read-only for the image pull secrets, and read-write for pushing the artifacts to the registry. Storing a read-write secret in the cluster for image pull secrets does not seem like a good idea.

sestegra · 2024-05-02

It is even possible to consider more pull secrets with specific permissions (principle of least privilege):

- one with read-only permission, to be used by the Deployments of the Flux controllers to pull images
- one with read-only permission (different namespace than images), to be used by the OCIRepository resource to pull OCI artifacts
- one with write permission, to be used by the bootstrap command line

It is then also possible to consider different OCI registries: one for images and another for Flux artifacts, because the latter could contain sensitive infrastructure information.

stefanprodan · 2024-05-02

> I wonder if it would make sense to have two registry auth. A read-only for the image pull secrets, and read-write for pushing the artifacts to the registry.

Yes, this is something the command could support. Currently our OCI implementation supports reading the Docker config file from the host OS, so we could use that for write operations and the flags for the in-cluster secret.

@sestegra the pull secret for the container images is already supported; it's one of the flags common to all bootstrap commands.

mo-saeed · 2024-07-05T10:52:40Z

@stefanprodan is there any estimate of when this can be released?

[RFC] Flux Bootstrap for OCI-compliant Container Registries

ab4692c

Signed-off-by: Stefan Prodan <[email protected]>

stefanprodan added the area/rfc Feature request proposals in the RFC format label Apr 27, 2024

stefanprodan mentioned this pull request Apr 27, 2024

Create flux_bootstrap_oci resource fluxcd/terraform-provider-flux#501

Open

errordeveloper reviewed Apr 27, 2024

stefanprodan force-pushed the rfc-flux-bootstrap-oci branch from a5bc720 to 06b03c4 Compare April 28, 2024 07:31

Add workload identity user story

d611398

Signed-off-by: Stefan Prodan <[email protected]>

stefanprodan force-pushed the rfc-flux-bootstrap-oci branch from 06b03c4 to d611398 Compare April 28, 2024 07:38

stefanprodan added 3 commits April 28, 2024 10:49

Add artifact contents to spec

ae9f312

Signed-off-by: Stefan Prodan <[email protected]>

Add trace back to Git story

59d1c7e

Signed-off-by: Stefan Prodan <[email protected]>

Add common flags

9c88182

Signed-off-by: Stefan Prodan <[email protected]>

nagyv reviewed Apr 30, 2024

stefanprodan mentioned this pull request Jun 10, 2024

Implement the cluster sync feature controlplaneio-fluxcd/flux-operator#20

Merged
