Allow configuration of the MutatingWebhook failure policy #2711
Comments
Thanks for requesting this feature. /kind good-first-issue
@M00nF1sh: The label(s) `kind/good-first-issue` cannot be applied, because the repository doesn't have them. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
This will also be useful when the TargetGroupBinding admission webhook fails.
It looks like this configuration option would also be needed in the event of an availability-zone failure when running a multi-AZ EKS cluster. We ran into a similar issue when simulating a network AZ outage in our environment: even after all the nodes had failed over to healthy zones, no new pods could start. Investigating further, we saw errors from the load balancer controller's mutating webhook, which for some reason stops working during an AZ failure.
All the pods in namespaces with PodReadinessGates enabled were stuck, and the replica set controller was not able to create new pods. To work around it, we now need human intervention and a documented procedure: in the event of an AZ failure we disable PodReadinessGates to recover the cluster, roughly as sketched below. @M00nF1sh, could you confirm whether this feature would help in our scenario, or should we open a new issue for this?
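Concretely, the manual workaround amounts to flipping the opt-in namespace label that the webhook's namespaceSelector matches, so the API server stops calling the webhook for pods in that namespace. A minimal sketch, assuming the controller's documented `elbv2.k8s.aws/pod-readiness-gate-inject` label; the namespace name is hypothetical:

```yaml
# Sketch: disable readiness-gate injection for one namespace during an
# AZ-failure incident. The namespace name "my-app" is hypothetical.
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    # The webhook is only invoked for namespaces where this label is
    # "enabled"; any other value (or removing the label) skips the
    # webhook entirely, so the replica set controller can create pods
    # again even while the webhook endpoint is down.
    elbv2.k8s.aws/pod-readiness-gate-inject: disabled
```

Pods created after this change no longer carry the readiness gate, so they schedule normally; the cost is that they also lose target-health-based rollout protection until the label is restored.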
/assign
Closing as it appears this was addressed in #3653 |
/close |
@josh-ferrell: Closing this issue. In response to this:
/reopen
@mikutas: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This issue is about the pod mutating webhook.
Describe the bug
I ran into issues with TLS certs being regenerated due to these bugs:
#2312
#2264
Once the TLS certs changed, the MutatingWebhook for the PodReadinessGate feature started failing and blocked the rollout of pods for services using it.
This was the error:
I think this exposes an availability concern: if all the pods backing a service get rescheduled while the mutating webhook is broken, the service will go down. My understanding is that the PodReadinessGate is a bonus feature to make rollouts smoother in Kubernetes, so it would be preferable for the feature to simply stop working rather than block rollouts altogether. The sketch below shows roughly what the webhook injects and why its failure blocks pod creation outright.
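For context, the webhook's only job is to add a `readinessGates` entry to the pod spec; because it is registered with a `Fail` policy, the API server rejects the pod CREATE itself when the webhook is unreachable, rather than just skipping the mutation. An illustrative sketch of the injected field, with a hypothetical TargetGroupBinding name and a placeholder image:

```yaml
# Illustrative only: roughly what the mutating webhook adds to a pod
# spec. The condition type embeds the TargetGroupBinding name; "my-tgb"
# is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  readinessGates:
    - conditionType: target-health.elbv2.k8s.aws/my-tgb
  containers:
    - name: app
      image: public.ecr.aws/nginx/nginx:latest  # placeholder image
```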
Steps to reproduce
Break TLS certs on the LB controller while using PodReadinessGates, then reschedule pods backing an LB in that namespace.
Expected outcome
I'd like to be able to either configure the webhook's failure policy or set it to fail open; a sketch of what that could look like follows.
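For concreteness, `failurePolicy` is a standard field on `admissionregistration.k8s.io/v1` webhook configurations, with values `Fail` (what blocks rollouts here) and `Ignore` (fail open). A minimal sketch of a fail-open configuration; the object, webhook, and service names are assumptions modeled on a typical install, not the controller's exact manifests:

```yaml
# Sketch: fail-open variant of the pod mutating webhook. All names
# (metadata.name, webhook name, service name/namespace/path) are
# assumptions; failurePolicy itself is standard Kubernetes API.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: aws-load-balancer-webhook
webhooks:
  - name: mpod.elbv2.k8s.aws
    failurePolicy: Ignore  # skip mutation instead of rejecting pod CREATE
    clientConfig:
      service:
        name: aws-load-balancer-webhook-service
        namespace: kube-system
        path: /mutate-v1-pod
    namespaceSelector:
      matchExpressions:
        - key: elbv2.k8s.aws/pod-readiness-gate-inject
          operator: In
          values: ["enabled"]
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    sideEffects: None
    admissionReviewVersions: ["v1", "v1beta1"]
```

The trade-off with `Ignore` is that pods created while the webhook is down silently miss the readiness gate, so rollouts proceed without target-health checks rather than stalling.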
Environment
Additional Context: