Deploying your own VPA with leader election enabled in GKE conflicts with the GKE system component #7461

Open
raywainman opened this issue Nov 4, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

raywainman commented Nov 4, 2024

Which component are you using?:

vertical-pod-autoscaler

What version of the component are you using?:

Component version: vertical-pod-autoscaler-1.2.0+

Leader election functionality was added in #6985 and is turned off by default

What k8s version are you using (kubectl version)?:

Any version 1.27+

What environment is this in?:

GKE

What did you expect to happen?:

The self-deployed VPA recommender and the GKE implementation of HPA to continue working.

What happened instead?:

Both the self-deployed VPA recommender and the GKE version use a lease called vpa-recommender in kube-system.

If you deploy your own VPA recommender, it might "steal" the lease and prevent the GKE implementation of HPA from working.
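To make the collision concrete, here is a minimal toy model of lease-based leader election. It is a sketch under stated assumptions, not how VPA or GKE implement it: the `LeaseStore` class, the identities `gke-component` and `self-deployed`, and the alternative lease name `my-vpa-recommender` are all hypothetical, and lease expiry and renewal are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LeaseStore:
    """Toy stand-in for the cluster's Lease objects, keyed by (namespace, name)."""
    holders: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def try_acquire(self, namespace: str, name: str, identity: str) -> bool:
        """Take the lease if it is free; report whether `identity` holds it."""
        return self.holders.setdefault((namespace, name), identity) == identity

    def release(self, namespace: str, name: str, identity: str) -> None:
        """Drop the lease, but only if `identity` is the current holder."""
        if self.holders.get((namespace, name)) == identity:
            del self.holders[(namespace, name)]


def shared_name_scenario() -> Tuple[bool, bool]:
    """Both components compete for kube-system/vpa-recommender."""
    store = LeaseStore()
    ns, name = "kube-system", "vpa-recommender"
    store.try_acquire(ns, name, "gke-component")   # GKE's component leads first
    store.release(ns, name, "gke-component")       # control plane disruption
    self_leads = store.try_acquire(ns, name, "self-deployed")
    gke_leads = store.try_acquire(ns, name, "gke-component")
    return self_leads, gke_leads


def distinct_name_scenario() -> Tuple[bool, bool]:
    """The self-deployed recommender uses its own (hypothetical) lease name."""
    store = LeaseStore()
    self_leads = store.try_acquire("kube-system", "my-vpa-recommender", "self-deployed")
    gke_leads = store.try_acquire("kube-system", "vpa-recommender", "gke-component")
    return self_leads, gke_leads


print(shared_name_scenario())    # (True, False)
print(distinct_name_scenario())  # (True, True)
```

With a shared name, whichever component reacquires the lease first after a disruption locks the other out. With distinct names, both can lead independently.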

How to reproduce it (as minimally and precisely as possible):

  • Create a cluster
  • Deploy your own vpa-recommender (e.g. using vpa-up.sh). Make sure leader election is enabled (leader-elect=true).
  • Do something to disrupt the control plane (for example, upgrade the version).
  • Check whether your self-deployed version of the vpa-recommender has grabbed the lease. If so, HPA could stop working.
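For the last step, one way to see who holds the lease is to read `spec.holderIdentity` from the Lease object. The helper below is a rough sketch: `lease_holder` is a hypothetical function name, and the sample Lease (including the holder `vpa-recommender-abc123`) is made-up data for illustration only.

```python
import json
from typing import Optional


def lease_holder(lease_json: str) -> Optional[str]:
    """Return spec.holderIdentity from a JSON-serialized Lease, or None if unset."""
    lease = json.loads(lease_json)
    spec = lease.get("spec") or {}          # tolerate a missing or null spec
    return spec.get("holderIdentity") or None


# Hypothetical sample; a real Lease carries more fields (renewTime, etc.).
sample = json.dumps({
    "apiVersion": "coordination.k8s.io/v1",
    "kind": "Lease",
    "metadata": {"name": "vpa-recommender", "namespace": "kube-system"},
    "spec": {"holderIdentity": "vpa-recommender-abc123"},
})

print(lease_holder(sample))
```

In a real cluster, the JSON could come from `kubectl get lease vpa-recommender -n kube-system -o json`. Whether the holder identity matches the pod name of your self-deployed recommender depends on how the component sets its identity, so treat that mapping as an assumption to verify.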

Anything else?

This is due to the unfortunate naming collision between GKE's system controller (also called vpa-recommender) and the one provided here.

@raywainman raywainman added the kind/bug Categorizes issue or PR as related to a bug. label Nov 4, 2024
@raywainman
Contributor Author

cc @adrianmoisey

Some thoughts on mitigating this:

  • Document the flags very carefully to encourage folks not to use the default when deploying on GKE.
  • Change the default here, which could possibly break some users.

@adrianmoisey
Member

I see two paths forward to fixing this:

  1. GKE stops using vpa-recommender as a lease name
  2. VPA stops using vpa-recommender as a lease name

I can't speak for what that lease is being used for in GKE, but I can only assume that changing that lease is difficult or impossible in GKE.

Given that the lease(s) in VPA are only used for VPA components, and running multiple recommenders and updaters for a brief period isn't the worst thing in the world, my vote is that we change the default lease name in the VPA.

Any VPA configured with the lease enabled will only be running multiple pods for a short period of time, which should be fine.

It's obviously not an amazing path forward, but may be worth doing.

I'm curious what @voelzmo and @kwiesmueller think, as they may be the ones approving that controversial PR.

adrianmoisey commented Nov 5, 2024

my vote is that we change the default lease name in the VPA.

If we do go down this path, I suggest we also make PRs to third-party Helm charts to ensure they support the new default name. Some of them hardcode the lease name:
