Use kubectl rollout restart on single-replica Deployments with PDB #928
Thanks for opening this issue.
Yeah, this would practically only help when you have single-replica Deployments. Single-replica StatefulSets would not really benefit from this, and in that case a PDB makes even less sense, as you can't configure a StatefulSet to "surge" a temporarily increased number of pods. Maybe an edge case would be if you used the OpenKruise CloneSet with max surge, which is essentially an extended Deployment that also supports things like a PVC per pod.

Which brings up a big downside of this approach: it's not very portable. Should kured only have a special case for Deployment resources? Is there a way to design it so it becomes agnostic? Should kured implement special cases for CustomResources like the OpenKruise CloneSet I mentioned earlier?

In my tunnel-visioned opinion, only having a special case for Deployment is fine :) Partly because that's the solution that would help me, but as this seems to be a common need, maybe that's acceptable? There are some Kubernetes proposals on improving this whole situation, such as kubernetes/enhancements#4213, which further supports my view that a "hardcoded fix"/"special behavior for Deployments" might be sufficient for now.

Wdyt? Are you up for the idea? And if so, would you accept a PR for it?
This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).
Still relevant
This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).
Still relevant
Do you really want to grant kured cluster-wide access to patch Deployments, StatefulSets, and DaemonSets, instead of simply raising the minimum replica count to 2? That does not sound like a good trajectory to me. We should aim for least privilege.

I can however think of two ways this could be achieved: it could be done with limited rights with the help of another operator, or the code could be modified to drain in a different way, introducing this feature at the expense of security.

Overall, with the current state of kured, I am not really fond of the approach -- all these changes and decreased security because someone with a microservice doesn't want to deploy 2 pods doesn't sound acceptable. If we had a mechanism to alter the ordering of the actions based on user input, that might be easier, but it's not part of the code as of today.
If you have other ideas, I am all ears :)
Yeah, the idea doesn't sit completely well with me either, especially since it's not a generic solution: it only applies to Deployments and covers neither StatefulSets nor DaemonSets. It could be considered "too domain specific".
I feel the need to defend this a little, especially so that my motivation is understood. The use case is scaling down resources where we have many identical tenants/environments that we want to keep as small as possible, so that we can fit as many as possible on the same nodes. And scaling down CPU/memory resources vs. scaling down replicas do not behave the same (unless you have an uncommonly steady load with no spikes whatsoever). The argument I see often is "well, if you only have 1 replica then you're already stating that you're OK with downtime" -- well, not exactly, because planned downtime and unplanned downtime are two very different things and happen at two very different intervals. I don't want downtime during planned downtime, but I'm OK with unplanned downtime.

Another adjacent use case is when a (micro)service is not designed for multiple replicas, i.e. when the job/task can't run in parallel. You then need to figure out some kind of distributed lock via something like Redis, but then you've added Redis to the stack, and the whole idea was to scale down. Some concrete examples of this are a "garbage collector"/cleanup type of service (e.g. "scan the database for old records and edit or delete them"), or a service that scans something (e.g. rows in a database table) and then triggers alerts/webhooks based on it (you wouldn't want duplicate alerts just because you have 2 replicas). More generally, it's when 2 replicas would result in either race conditions or duplicate work.

But back on topic. Your idea of having a separate operator for this might fit well, such as an operator that listens for admission webhooks for when a pod of a Deployment is being deleted/evicted and then performs this "rollout restart" approach instead. Such a tool would not need to be related to kured at all, and could then more easily be installed in a select few namespaces.

At our workplace we had a very small selection of services where we used a single replica with a PDB and where we felt we needed a solution for this. We were running "unattended upgrades + kured" on a monthly schedule, which meant our nodes were restarted once a month, so this issue affected us often. However, we've since reconsidered the impact and either increased the replica count to 2 or removed the PDB (allowing the brief monthly downtime) on a per-service basis.

Some of the alternative scaling-down solutions we've discussed at our workplace involve making these services more "multi-tenant capable": instead of having the same service duplicated across a lot of tenants/environments, we'd run it only once. Then even if you run 2 (or even 3) replicas of that new multi-tenant service, it's still fewer resources than 20 single-replica, tenant/environment-specific (i.e. non-multi-tenant, 1 per tenant) instances.

From my end I'm OK with you closing this issue now as "Not planned". If the need comes back to us in the future, we might consider writing the hypothetical operator we talked about in this thread. But I wouldn't hold my breath.
Good day! First off, really nice tool ❤️ Works exactly as advertised.

We've hit an issue where we want to have Deployments with `.spec.replicas: 1`, but still make use of a PodDisruptionBudget (PDB). Most Kubernetes discussions and documentation basically summarize this as a "bad idea", usually with the argument "if you only have 1 replica, then you have already agreed to not having it HA".

When kured evicts a pod (or when `kubectl drain` evicts one), it immediately terminates the existing pod before the replacement is up and running, leading to unwanted downtime/disruptions. Adding a PDB is supposed to solve this, but when you only have 1 replica the PDB will basically always report `.status.disruptionsAllowed: 0`, leading to dead-locks in the whole node drain and reboot procedure.

At our workplace we have our microservices deployed for two different use cases:

1. `replicas: 2` or more, and then no issue with PDB and evictions.
2. `replicas: 1`, to reduce the resource consumption of having a lot of these smaller installations.

This issue is only about use case 2.
Proposed solution: use `kubectl rollout restart`

The `kubectl rollout restart` command basically just adds an annotation to the pod template of the Deployment or StatefulSet resource, forcing it to roll out a new revision. This article explains a way around the eviction block by making use of this rollout-restart approach: https://www.artur-rodrigues.com/tech/2023/03/30/impossible-kubectl-drains.html
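For reference, here is a minimal client-go sketch of what `kubectl rollout restart` effectively does for a Deployment. The `kubectl.kubernetes.io/restartedAt` annotation is the one kubectl itself sets; the package and function names are purely illustrative:

```go
package rolloutrestart

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// restartDeployment mimics `kubectl rollout restart deployment/<name>`: it
// patches the pod template with a timestamp annotation, which changes the
// template and makes the Deployment controller roll out replacement pods
// (respecting maxSurge/maxUnavailable) instead of killing the old pod first.
func restartDeployment(ctx context.Context, cs kubernetes.Interface, namespace, name string) error {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"template": map[string]interface{}{
				"metadata": map[string]interface{}{
					"annotations": map[string]string{
						"kubectl.kubernetes.io/restartedAt": time.Now().Format(time.RFC3339),
					},
				},
			},
		},
	}
	data, err := json.Marshal(patch)
	if err != nil {
		return fmt.Errorf("marshal restart patch: %w", err)
	}
	_, err = cs.AppsV1().Deployments(namespace).Patch(ctx, name,
		types.StrategicMergePatchType, data, metav1.PatchOptions{})
	return err
}
```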
What I envision kured could do is:

- If a pod has a PDB that reports `.status.disruptionsAllowed: 0` and is owned by a Deployment with `.spec.replicas: 1`, then issue a rollout restart instead of an eviction.
- If a pod is annotated with `kured.dev/strategy: rollout-restart` and is owned by a Deployment, then always use the rollout-restart approach on it.

The first bullet point would be a "heuristic-based approach" that should solve a lot of people's issues without being too intrusive. Using the rollout-restart approach restarts all of the pods in the Deployment, so it's not a given that the user wants this when replicas > 1; for those cases the annotation provides opt-in behavior.
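To make those two rules concrete, here is a rough sketch of the decision logic, assuming kured has already hit a PDB with `.status.disruptionsAllowed: 0` for the pod it wants to evict. The function names and wiring are illustrative, not kured's actual drain code, and it would reuse a restart helper like the one sketched above:

```go
package rolloutrestart

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ownerDeployment walks Pod -> ReplicaSet -> Deployment owner references and
// returns the owning Deployment's name, if any.
func ownerDeployment(ctx context.Context, cs kubernetes.Interface, pod *corev1.Pod) (string, bool, error) {
	for _, ref := range pod.OwnerReferences {
		if ref.Kind != "ReplicaSet" {
			continue
		}
		rs, err := cs.AppsV1().ReplicaSets(pod.Namespace).Get(ctx, ref.Name, metav1.GetOptions{})
		if err != nil {
			return "", false, err
		}
		for _, rsRef := range rs.OwnerReferences {
			if rsRef.Kind == "Deployment" {
				return rsRef.Name, true, nil
			}
		}
	}
	return "", false, nil
}

// shouldRolloutRestart implements the two proposed rules for a pod whose
// eviction is blocked: opt-in via the kured.dev/strategy annotation, or the
// heuristic "owned by a Deployment with exactly one replica".
func shouldRolloutRestart(ctx context.Context, cs kubernetes.Interface, pod *corev1.Pod) (string, bool, error) {
	deployName, owned, err := ownerDeployment(ctx, cs, pod)
	if err != nil || !owned {
		return "", false, err
	}
	// Explicit opt-in: always recycle this pod via rollout restart.
	if pod.Annotations["kured.dev/strategy"] == "rollout-restart" {
		return deployName, true, nil
	}
	// Heuristic: a single-replica Deployment can never satisfy its PDB during
	// an eviction, so fall back to a rollout restart.
	deploy, err := cs.AppsV1().Deployments(pod.Namespace).Get(ctx, deployName, metav1.GetOptions{})
	if err != nil {
		return "", false, err
	}
	if deploy.Spec.Replicas != nil && *deploy.Spec.Replicas == 1 {
		return deployName, true, nil
	}
	return "", false, nil
}
```

The drain loop would then call the restart helper for pods where this returns true, and wait for the replacement pod to become Ready before continuing with the node drain and reboot.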