
Support out of the box self-scaling for the VPA with updateMode=Recreate #7450

Open
maxcao13 opened this issue Oct 31, 2024 · 9 comments
Labels: area/vertical-pod-autoscaler · kind/feature · triage/accepted

Comments

@maxcao13

Which component are you using?:

Vertical Pod Autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

When using the VPA on itself (i.e. creating a VPA object with updateMode=Recreate and minReplicas=1 whose targetRef points at the admission-controller Deployment), there is a race between the webhook server admitting its replacement pod and the webhook server pod itself being terminated. As a result, the recommended requests/limits are sometimes not mutated into the new pod definition, and sometimes they are.
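For reference, a self-targeting VPA of the kind described might look like the following. This is a hypothetical manifest; the namespace and Deployment name assume a default kube-system install of the VPA components:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-admission-controller   # hypothetical name
  namespace: kube-system
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vpa-admission-controller  # the webhook server itself
  updatePolicy:
    updateMode: Recreate
    minReplicas: 1
```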

Describe the solution you'd like.:

Add a --self-scaling=true/false flag to the admission controller that enables a shutdown hook which checks for the existence of a VPA CR targeting the controller itself. If one exists, the terminating pod waits up to 30 seconds (the terminationGracePeriod) for the replacement pod to be admitted through the webhook, and then terminates gracefully.

Describe any alternative solutions you've considered.:
Another solution is to scale the admission-controller Deployment to 2 pods when self-scaling is enabled, but that wastes more resources than necessary.

Another workaround is to set updateMode=Off and apply requests/limits manually, but this defeats the purpose of autoscaling.

And of course, this entire problem would be solved by in-place vertical scaling, since the pod would no longer need to be recreated, but that feature seems far from GA, so I think this is a good enough solution in the meantime.

Additional context.:

I'd love to know if there are already solutions to this, but I have not found anything online nor any discussion about it.

Also, I already have code for this and tested that this solution works, so if people think this is a good idea or want to see what it looks like, I can submit a PR for it.

@maxcao13 maxcao13 added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 31, 2024
@maxcao13
Author

/area vertical-pod-autoscaler

@adrianmoisey
Member

This seems like a pretty niche problem. My vote is to run multiple replicas. Adding a feature to allow the VPA to manage itself seems like it's a bit too complex.

@adrianmoisey
Member

Would a preStop hook help here?
Maybe even the upcoming feature: kubernetes/enhancements#3960

@maxcao13
Author

Thanks for the response!

For what it's worth, I've seen people use self-scaling in their production clusters, and I've noticed the constant eviction events every minute until the race condition resolves. So it does happen in practice.

I think it could technically work already by registering a preStop hook with the sleep feature (either via that feature gate, or by manually exec'ing a sleep), but the tradeoff is that the pod stays in the Terminating state for a fixed number of seconds. The solution I currently have terminates much faster than a fixed sleep would, as soon as the replacement pod has been created.
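For reference, the preStop sleep approach might look like this on the admission-controller Deployment. This is a hypothetical snippet: the sleep action requires the PodLifecycleSleepAction feature gate from kubernetes/enhancements#3960, and on clusters without it an exec of sleep works only if the container image ships a sleep binary.

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 45  # must exceed the preStop sleep
      containers:
      - name: admission-controller
        lifecycle:
          preStop:
            sleep:
              seconds: 30  # fixed wait; pod stays Terminating the whole time
```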

I can create a PR with what I have so people can see how complex it really is. If we still think it's too much for this problem, would it be enough to close this with a PR that adds a self-scaling flag which programmatically registers a shutdown hook in the admission controller's Go code that sleeps for 30 seconds?

@adrianmoisey
Member

> I can create a PR with what I have so people can see how complex it really is. If we still think it's too much for this problem, would it be enough for me to close this with a PR that enables a self-scaling flag that just programmatically registers a shutdown hook in the admission go code that just sleeps for 30 seconds?

This still feels very niche to me, when there are other Kubernetes mechanisms in place to protect workloads, such as the ones I've described above.

Additionally, running a single admission-controller pod is dangerous for other VPA workloads. If the node it was on were to be deleted (some bad event, or regular Kubernetes upgrades, perhaps), then the admission-controller wouldn't be around to serve other Pod creation webhooks.

Encouraging users to run the admission-controller in a non-HA fashion seems like a bad idea to me.

@maxcao13
Author

Fair enough, thanks for the insight.

Is this more of a documentation issue then? I think the fact that people are doing this is enough to have some sort of guide or best practices blurb for self-scaling the VPA.

@adrianmoisey
Member

Yup! I 100% agree on that.
I don't think it's documented that you should have multiple admission-controllers running, and a single (active) updater and recommender.
Those are lessons that took me a while to learn.

@adrianmoisey
Member

I'll make a PR to update the documentation.
/assign

@Shubham82
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Nov 5, 2024