vpa-admission-controller: Wire contexts #6891

Open
ialidzhikov opened this issue Jun 4, 2024 · 1 comment · May be fixed by #6899
Labels
area/vertical-pod-autoscaler kind/bug Categorizes issue or PR as related to a bug.

Comments

@ialidzhikov
Contributor

Which component are you using?:

vertical-pod-autoscaler

What version of the component are you using?:

Component version: 1.1.2

What k8s version are you using (kubectl version)?:

v1.29

What did you expect to happen?:
Right now, the requests the vpa-admission-controller handles are not contextified. For example, in the handler for Pods, context.TODO() is used in a few places where the admission-controller makes requests along the way.

Due to the usages of context.TODO(), when the caller (kube-apiserver) cancels the request (due to a client-side timeout), the admission-controller's Pod handler is not notified and continues to process the request even though it has already been cancelled client-side.
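For illustration, a minimal sketch of what the wiring could look like, assuming the handler passes the webhook request's context down instead of context.TODO() (the handler and client names here are hypothetical, not the actual vpa-admission-controller code):

```go
// Sketch only: illustrative webhook handler, not the actual VPA code.
package main

import (
	"net/http"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type server struct {
	client kubernetes.Interface
}

func (s *server) serveVPA(w http.ResponseWriter, r *http.Request) {
	// r.Context() is cancelled when the kube-apiserver gives up on the
	// webhook call (e.g. after its 10s timeout), so every downstream
	// request made with it is aborted as well.
	ctx := r.Context()

	// Before: context.TODO() detaches this call from the caller's lifetime.
	// After: the wired ctx propagates the kube-apiserver's cancellation.
	if _, err := s.client.CoreV1().Pods("default").List(ctx, metav1.ListOptions{}); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// ... build and write the admission response ...
}

func main() {
	s := &server{} // client initialization omitted in this sketch
	http.HandleFunc("/", s.serveVPA)
	_ = http.ListenAndServe(":8080", nil)
}
```

Since client-go accepts a context on every call, cancelling r.Context() aborts both in-flight requests and pending rate-limiter waits.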

We recently faced a VPA-related outage (described in #6884) in which the vpa-admission-controller was client-side throttled due to the low default kube-api-qps/burst settings.

From the logs we can see that it was throttled for more than 50 minutes:

{"log":"Waited for 51m21.05416376s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m21.024486679s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m20.527328217s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m19.975656855s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m19.466347921s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m18.572692764s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}

Hence, the vpa-admission-controller currently waits out the client-side throttling (> 50min) instead of canceling the request.
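The throttling wait itself is context-aware: client-go's token-bucket rate limiter exposes Wait(ctx), which returns early once the context is cancelled. A standalone sketch of that behaviour (not VPA code; the tiny QPS value is chosen only to force throttling):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// Deliberately tiny QPS/burst to force client-side throttling:
	// one token per ~100s after the single burst token is spent.
	limiter := flowcontrol.NewTokenBucketRateLimiter(0.01, 1)
	limiter.TryAccept() // consume the burst token

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// With a cancellable context the wait ends after ~2s instead of ~100s.
	// With context.TODO()/Background() it would block for the full interval.
	if err := limiter.Wait(ctx); err != nil {
		fmt.Println("wait aborted:", err) // context deadline exceeded
	}
}
```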

Meanwhile, the kube-apiserver cancelled the request after the timeout configured for the webhook (10s in our case):

```
E0604 13:21:25.379831       1 dispatcher.go:214] failed calling webhook "vpa.k8s.io": failed to call webhook: Post "https://vpa-webhook:443/?timeout=10s": context deadline exceeded
```

What happened instead?:

See above.

How to reproduce it (as minimally and precisely as possible):

Add a long sleep (longer than the kube-apiserver's webhook timeout) to the Pod handler and verify that the admission-controller continues processing the admission request after the kube-apiserver has cancelled it client-side. See the sketch below.
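A hedged sketch of such a repro, using a stand-in handler (handlePod here is hypothetical, not the actual VPA handler):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// handlePod stands in for the vpa-admission-controller's Pod handler.
func handlePod(ctx context.Context) error {
	select {
	case <-time.After(30 * time.Second): // simulated slow work, longer than the 10s webhook timeout
		// Without wired contexts this branch is always taken: the handler
		// keeps working even though the kube-apiserver gave up long ago.
		fmt.Println("kept working after the caller gave up")
		return nil
	case <-ctx.Done():
		// With wired contexts the handler observes the cancellation instead.
		return ctx.Err()
	}
}

func main() {
	// Simulate the kube-apiserver's 10s client-side webhook timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	fmt.Println(handlePod(ctx)) // prints "context deadline exceeded" after ~10s
}
```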

Anything else we need to know?:

N/A

@ialidzhikov ialidzhikov added the kind/bug Categorizes issue or PR as related to a bug. label Jun 4, 2024
@Shubham82
Contributor

/area vertical-pod-autoscaler
