Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow opting-into upstream probes #2505

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rexagod
Copy link

@rexagod rexagod commented Sep 8, 2024

Description

Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request.
If it fixes a bug or resolves a feature request, be sure to link to that issue.

Allow users to opt-into upstream probe definitions.

Type of change

What type of changes does your code introduce to the kube-prometheus? Put an x in the box that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Changelog entry

Please put a one-line changelog entry below. Later this will be copied to the changelog file.

Users can opt-into the upstreamProbesOptIn (boolean, false by default) for KSM to sync with upstream's health probe definitions.

@rexagod rexagod requested a review from a team as a code owner September 8, 2024 21:52
@rexagod rexagod force-pushed the upstream-probes-opt-in branch 3 times, most recently from ed3886f to 5b9a879 Compare September 8, 2024 23:00
@rexagod
Copy link
Author

rexagod commented Sep 8, 2024

Tested downstream (CMO), on enabling upstreamProbesOptIn.

diff --git a/assets/kube-state-metrics/deployment.yaml b/assets/kube-state-metrics/deployment.yaml
index ec651432e..e73091269 100644
--- a/assets/kube-state-metrics/deployment.yaml
+++ b/assets/kube-state-metrics/deployment.yaml
@@ -57,7 +57,19 @@ spec:
           ^kube_pod_completion_time$,
           ^kube_pod_status_scheduled$
         image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
+        livenessProbe:
+          httpGet:
+            path: /livez
+            port: 8080
+          initialDelaySeconds: 5
+          timeoutSeconds: 5
         name: kube-state-metrics
+        readinessProbe:
+          httpGet:
+            path: /livez
+            port: 8080
+          initialDelaySeconds: 5
+          timeoutSeconds: 5
         resources:
           requests:
             cpu: 2m
diff --git a/jsonnet/main.jsonnet b/jsonnet/main.jsonnet
index 7539fb456..ffabfb4f0 100644
--- a/jsonnet/main.jsonnet
+++ b/jsonnet/main.jsonnet
@@ -216,6 +216,7 @@ local inCluster =
         kubeRbacProxyImage: $.values.common.images.kubeRbacProxy,
         commonLabels+: $.values.common.commonLabels,
         mixin+: { ruleLabels: $.values.common.ruleLabels },
+        upstreamProbesOptIn: true,
       },
       nodeExporter: {
         namespace: $.values.common.namespace,
@@ -439,7 +440,7 @@ local inCluster =
     openshiftStateMetrics: openshiftStateMetrics($.values.openshiftStateMetrics),
   } +
   (import './utils/anti-affinity.libsonnet') +
-  (import 'github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus/addons/ksm-lite.libsonnet') +
+  (import 'github.com/rexagod/kube-prometheus/jsonnet/kube-prometheus/addons/ksm-lite.libsonnet') +
   (import './utils/ibm-cloud-managed-profile.libsonnet') +
   (import './components/metrics-server-audit.libsonnet') +
   {};  // Including empty object to simplify adding and removing imports during development

Copy link
Contributor

@philipgough philipgough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also adjust the kube-rbac-proxy args to include the proper URLs in --ignore-paths?

Comment on lines 171 to 178
} + if ksm._config.upstreamProbesOptIn then {
livenessProbe: super.livenessProbe,
readinessProbe: super.livenessProbe,
} else {
livenessProbe:: null,
readinessProbe:: null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} + if ksm._config.upstreamProbesOptIn then {
livenessProbe: super.livenessProbe,
readinessProbe: super.livenessProbe,
} else {
livenessProbe:: null,
readinessProbe:: null,
} + if !ksm._config.upstreamProbesOptIn then {
livenessProbe:: null,
readinessProbe:: null,
} else {

@rexagod rexagod force-pushed the upstream-probes-opt-in branch 2 times, most recently from 5b9a879 to 99328cc Compare September 10, 2024 08:41
@rexagod rexagod force-pushed the upstream-probes-opt-in branch 4 times, most recently from 39059e6 to fd44004 Compare September 11, 2024 06:45
@rexagod
Copy link
Author

rexagod commented Sep 11, 2024

  • Add a comment describing the way probes are defined and expected to work.

'--tls-cipher-suites=' + std.join(',', krp._config.tlsCipherSuites),
'--upstream=' + krp._config.upstream,
] // Optionals.
+ if std.length(krp._config.ignorePaths) != 0 then ['--ignore-paths=' + std.join(',', krp._config.ignorePaths)] else defaults.ignorePaths,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ if std.length(krp._config.ignorePaths) != 0 then ['--ignore-paths=' + std.join(',', krp._config.ignorePaths)] else defaults.ignorePaths,
+ if std.length(krp._config.ignorePaths) != 0 then ['--ignore-paths=' + std.join(',', krp._config.ignorePaths)] else defaults.ignorePaths,
Suggested change
+ if std.length(krp._config.ignorePaths) != 0 then ['--ignore-paths=' + std.join(',', krp._config.ignorePaths)] else defaults.ignorePaths,
+ if std.length(krp._config.ignorePaths) > 0 then ['--ignore-paths=' + std.join(',', krp._config.ignorePaths)] else [],

@@ -122,6 +124,7 @@ function(params) (import 'github.com/kubernetes/kube-state-metrics/jsonnet/kube-
{ name: 'https-self', containerPort: 9443 },
],
image: ksm._config.kubeRbacProxyImage,
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/readyz'] else super.ignorePaths,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/readyz'] else super.ignorePaths,
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/readyz'] else super.ignorePaths,
Suggested change
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/readyz'] else super.ignorePaths,
// When enabling probes, kube-rbac-proxy needs to always allow the /readyz endpoint.
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/readyz'] else [],

@@ -112,6 +113,7 @@ function(params) (import 'github.com/kubernetes/kube-state-metrics/jsonnet/kube-
{ name: 'https-main', containerPort: 8443 },
],
image: ksm._config.kubeRbacProxyImage,
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/livez'] else super.ignorePaths,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/livez'] else super.ignorePaths,
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/livez'] else super.ignorePaths,
Suggested change
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/livez'] else super.ignorePaths,
// When enabling probes, kube-rbac-proxy needs to always allow the /livez endpoint.
ignorePaths: if ksm._config.upstreamProbesOptIn then ['/livez'] else [],

@@ -46,6 +46,7 @@ local defaults = {
runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/%s',
},
},
upstreamProbesOptIn:: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a comment explaining the purpose + implementation would be useful here.

Suggested change
upstreamProbesOptIn:: false,
enableProbes:: false,

Comment on lines 173 to 195
} + if !ksm._config.upstreamProbesOptIn then {
livenessProbe:: null,
readinessProbe:: null,
} else {
livenessProbe: super.livenessProbe,
readinessProbe: super.readinessProbe,
}, super.containers) + [kubeRbacProxyMain, kubeRbacProxySelf],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} + if !ksm._config.upstreamProbesOptIn then {
livenessProbe:: null,
readinessProbe:: null,
} else {
livenessProbe: super.livenessProbe,
readinessProbe: super.readinessProbe,
}, super.containers) + [kubeRbacProxyMain, kubeRbacProxySelf],
} + if !ksm._config.upstreamProbesOptIn then {
livenessProbe:: null,
readinessProbe:: null,
} else {}, super.containers) + [kubeRbacProxyMain, kubeRbacProxySelf],

@rexagod
Copy link
Author

rexagod commented Sep 11, 2024

@simonpasquier Pushed all the suggested changes but s/defaults.ignorePaths/[] and s/super.ignorePaths/[] since I believe these will always be [], in addition to the current implementation allowing for a single source of truth for the default value?

@simonpasquier
Copy link
Contributor

I generated the manifest with enableProbes: true but Kubernetes isn't happy with it:

$ kc events
LAST SEEN                TYPE      REASON      OBJECT                                    MESSAGE
6m20s (x2355 over 21h)   Warning   Unhealthy   Pod/kube-state-metrics-68fccdc5f7-9wzdn   Liveness probe errored: strconv.Atoi: parsing "http-metrics": invalid syntax
80s (x2703 over 21h)     Warning   Unhealthy   Pod/kube-state-metrics-68fccdc5f7-9wzdn   Readiness probe errored: strconv.Atoi: parsing "telemetry": invalid syntax

@rexagod rexagod force-pushed the upstream-probes-opt-in branch 3 times, most recently from be93e60 to 54a4817 Compare September 15, 2024 23:39
@rexagod
Copy link
Author

rexagod commented Sep 16, 2024

Tested on a cluster, seems to work now. I've made KRP's ports the single source of truth for KSM's probes and ports for convenience. PTAL below for downstream diff.

diff --git a/assets/kube-state-metrics/deployment.yaml b/assets/kube-state-metrics/deployment.yaml
index ec651432e..3b444f081 100644
--- a/assets/kube-state-metrics/deployment.yaml
+++ b/assets/kube-state-metrics/deployment.yaml
@@ -57,7 +57,22 @@ spec:
           ^kube_pod_completion_time$,
           ^kube_pod_status_scheduled$
         image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
+        livenessProbe:
+          httpGet:
+            path: /livez
+            port: http-metrics
+            scheme: HTTPS
         name: kube-state-metrics
+        ports:
+        - containerPort: 8443
+          name: http-metrics
+        - containerPort: 9443
+          name: telemetry
+        readinessProbe:
+          httpGet:
+            path: /readyz
+            port: telemetry
+            scheme: HTTPS
         resources:
           requests:
             cpu: 2m
@@ -75,6 +90,7 @@ spec:
         - --secure-listen-address=:8443
         - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
         - --upstream=http://127.0.0.1:8081/
+        - --ignore-paths=/livez
         - --tls-cert-file=/etc/tls/private/tls.crt
         - --tls-private-key-file=/etc/tls/private/tls.key
         - --client-ca-file=/etc/tls/client/client-ca.crt
@@ -83,7 +99,7 @@ spec:
         name: kube-rbac-proxy-main
         ports:
         - containerPort: 8443
-          name: https-main
+          name: http-metrics
         resources:
           requests:
             cpu: 1m
@@ -104,6 +120,7 @@ spec:
         - --secure-listen-address=:9443
         - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
         - --upstream=http://127.0.0.1:8082/
+        - --ignore-paths=/readyz
         - --tls-cert-file=/etc/tls/private/tls.crt
         - --tls-private-key-file=/etc/tls/private/tls.key
         - --client-ca-file=/etc/tls/client/client-ca.crt
@@ -112,7 +129,7 @@ spec:
         name: kube-rbac-proxy-self
         ports:
         - containerPort: 9443
-          name: https-self
+          name: telemetry
         resources:
           requests:
             cpu: 1m

@rexagod
Copy link
Author

rexagod commented Oct 16, 2024

Needs an approval (for workflows).

@simonpasquier
Copy link
Contributor

Allow users to opt-into upstream probe definitions.

Signed-off-by: Pranshu Srivastava <[email protected]>
@rexagod rexagod force-pushed the upstream-probes-opt-in branch 5 times, most recently from a8d3ffc to 83e0cdb Compare October 21, 2024 14:00
@rexagod
Copy link
Author

rexagod commented Oct 21, 2024

Fixed, needs workflow approval.

@rexagod
Copy link
Author

rexagod commented Nov 5, 2024

Re-ping, @simonpasquier for another look here. 🙂

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see warnings when applying the manifests with enableProbes: true.

Warning: spec.template.spec.containers[1].ports[0]: duplicate port definition with spec.template.spec.containers[0].ports[0]
Warning: spec.template.spec.containers[2].ports[0]: duplicate port definition with spec.template.spec.containers[0].ports[1]
```

Also it still feels strange that the probes being defined on the ksm container while the kubelet queries the kube-rbac-proxy containers. I understand that for the liveness probe, it's required so we restart the ksm container if it gets stuck but could we at least define the readiness probe in the ksm container?

image: ksm._config.kubeRbacProxyImage,
// When enabling probes, kube-rbac-proxy needs to always allow the /livez endpoint.
ignorePaths: if ksm._config.enableProbes then ['/livez'] else super.ignorePaths,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) it's probably more idiomatic jsonnet.

Suggested change
ignorePaths: if ksm._config.enableProbes then ['/livez'] else super.ignorePaths,
ignorePaths+: if ksm._config.enableProbes then ['/livez'] else [],

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants