From 5895f7d5a0506b03266704b82ecd551ff180925b Mon Sep 17 00:00:00 2001
From: Dejan Zele Pejchev
Date: Thu, 11 Jan 2024 11:18:32 +0100
Subject: [PATCH] KEP-3939: update metrics and e2e test parts to reflect latest implementation details (#4316)

* KEP-3939: update metrics and e2e test parts to reflect latest implementation details

* KEP-3939: fix typo in e2e test section
---
 .../README.md | 43 ++++++++-----------
 1 file changed, 19 insertions(+), 24 deletions(-)

diff --git a/keps/sig-apps/3939-allow-replacement-when-fully-terminated/README.md b/keps/sig-apps/3939-allow-replacement-when-fully-terminated/README.md
index a6d483cdf02..ef6085b3b5d 100644
--- a/keps/sig-apps/3939-allow-replacement-when-fully-terminated/README.md
+++ b/keps/sig-apps/3939-allow-replacement-when-fully-terminated/README.md
@@ -258,7 +258,7 @@ See [Jobs create replacement Pods as soon as a Pod is marked for deletion](https
 #### Story 2
 
 As a cloud user, users would want to guarantee that the number of pods that are running is exactly the amount that they specify.
-Terminating pods do not relinguish resources so scarce compute resource are still scheduled to those pods.
+Terminating pods do not relinquish resources, so scarce compute resources are still scheduled to those pods.
 Replacement pods do not produce unnecessary scale ups.
 
 #### Story 3
@@ -520,30 +520,24 @@ Tests will verify counting of terminating fields regardless of `PodDisruptionCon
 
 ##### e2e tests
 
-Generally the only tests that are useful for this feature are when `PodReplacementPolicy: Failed`.
+Generally the only tests that are useful for this feature are when `PodReplacementPolicy: Failed`.
+The test should create a Job which can catch a SIGTERM signal and allow for graceful termination, so when we delete the Pod
+we can first assert that pods aren't created while the Pod is terminating and finally, when it terminates, that a new Pod is created.
 
-An example job spec that can reproduce this issue is below:
+We can use the default `busybox` image which is generally used in e2e tests and override the command field with something like:
 
-```yaml
-apiVersion: batch/v1
-kind: Job
-metadata:
-  name: job-slow-cleanup-with-pod-recreate-feature
-spec:
-  completions: 1
-  parallelism: 1
-  backoffLimit: 2
-  podReplacementPolicy: Failed
-  template:
-    spec:
-      restartPolicy: Never
-      containers:
-      - name: sleep
-        image: gcr.io/k8s-staging-perf-tests/sleep
-        args: ["-termination-grace-period", "1m", "60s"]
-```
+```shell
+_term(){
+  sleep 5
+  exit 143
+}
+trap _term SIGTERM
+while true; do
+  sleep 1
+done
+```
 
-A e2e test can verify that deletion will not trigger a new pod creation until the exiting pod is fully deleted.
+An e2e test can verify that deletion will not trigger a new pod creation until the exiting pod is fully deleted.
 If `podReplacementPolicy: TerminatingOrFailed` is specified we would test that pod creation happens closely after deletion.
 
@@ -905,8 +899,9 @@ During pod terminations, an operator can see that the terminating field is being
 
 We will use a new metric:
 
-- `job_pods_creation_total` (new) the `action` label will mention what triggers creation (`new`, `recreateTerminatingOrFailed`, `recreateTerminated`))
-This can be used to get the number of pods that are being recreated due to `recreateTerminated`. Otherwise we would expect to see `new` or `recreateTerminatingOrFailed` as the normal values.
+- `job_pods_creation_total` (new): the `reason` label will mention what triggers creation (`new`, `recreate_terminating_or_failed`, `recreate_failed`)
+  and the `status` label will mention the status of the pod creation (`succeeded`, `failed`).
+This can be used to get the number of pods that are being recreated due to `recreate_failed`. Otherwise, we would expect to see `new` or `recreate_terminating_or_failed` as the normal values.
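
For readers who want to see the e2e pieces above put together, here is a minimal sketch of a Job manifest for such a test. Only `podReplacementPolicy: Failed` and the SIGTERM-trapping script come from the KEP text; the Job name, the `busybox` tag, the grace period, and the completions/parallelism/backoffLimit values are illustrative assumptions, not part of this patch:

```yaml
# Hypothetical e2e fixture (not from the KEP): a Job whose Pod traps SIGTERM and
# exits only after a short delay, so with podReplacementPolicy: Failed the Job
# controller should not create a replacement Pod until the old Pod fully terminates.
apiVersion: batch/v1
kind: Job
metadata:
  name: pod-replacement-policy-e2e        # illustrative name
spec:
  completions: 1                          # illustrative sizing
  parallelism: 1
  backoffLimit: 2
  podReplacementPolicy: Failed            # field added by this KEP (JobPodReplacementPolicy feature gate)
  template:
    spec:
      restartPolicy: Never
      terminationGracePeriodSeconds: 30   # long enough for the trap handler below to run
      containers:
      - name: main
        image: busybox                    # the image generally used in e2e tests
        command:
        - /bin/sh
        - -c
        - |
          # Catch SIGTERM and terminate gracefully; 143 = 128 + SIGTERM(15).
          _term(){
            sleep 5
            exit 143
          }
          trap _term SIGTERM
          while true; do
            sleep 1
          done
```

Deleting the running Pod of such a Job should then show no replacement Pod while the old Pod is still in its termination grace period, and exactly one new Pod once it is fully gone; with `podReplacementPolicy: TerminatingOrFailed` the replacement would instead appear shortly after the delete.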