Before Kubernetes 1.20, the field timeoutSeconds was not respected for exec probes: probes continued running indefinitely, even past their configured deadline, until a result was returned.
So if this probe command was not designed or tested to complete within 1 second, the agent may start being killed under heavy load or resource starvation, because the liveness probe will start failing:
LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900};
if [ ! -e /var/run/google-fluentd/buffers ]; then
  exit 1;
fi;
touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
if [[ -z "$(find /var/run/google-fluentd/buffers -type f -newer /tmp/marker-stuck -print -quit)" ]]; then
  rm -rf /var/run/google-fluentd/buffers;
  exit 1;
fi;
touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
if [[ -z "$(find /var/run/google-fluentd/buffers -type f -newer /tmp/marker-liveness -print -quit)" ]]; then
  exit 1;
fi;
failureThreshold: 3
initialDelaySeconds: 600
periodSeconds: 60
successThreshold: 1
timeoutSeconds: 1
I recommend bumping timeoutSeconds to a large value after testing it.
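One way to do the testing mentioned above is to time the probe command under a candidate limit before choosing a new timeoutSeconds. The following is a minimal sketch, assuming GNU coreutils (`timeout` and `date +%s%N`) is available where it runs. `measure_probe` is a hypothetical helper, not part of the agent config, and the `timeout` utility only approximates how the kubelet enforces the probe deadline.

```shell
# Hypothetical helper (not part of logging-agent.yaml): run a command under
# a candidate time limit and report how long it took.
# Assumes GNU coreutils: `timeout` and `date +%s%N` (nanosecond timestamps).
measure_probe() {
  limit_seconds="$1"
  shift
  start_ns=$(date +%s%N)
  status=0
  # GNU timeout exits with 124 if the command ran past the limit.
  timeout "${limit_seconds}" "$@" || status=$?
  end_ns=$(date +%s%N)
  echo "elapsed_ms=$(( (end_ns - start_ns) / 1000000 )) exit=${status}"
  return "${status}"
}
```

For example, `measure_probe 1 /bin/bash -c "$PROBE_SCRIPT"` (with `PROBE_SCRIPT` holding the probe script, a placeholder name) prints the elapsed time, and an exit status of 124 means the command did not finish within the limit. Running it while the node is under load gives a more realistic picture than an idle run.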
I recommend a large value because a long probe execution most likely indicates high I/O latency or CPU starvation. Neither is a reason to kill a pod whose job is monitoring, and data from a node under heavy load is especially interesting to report.
In Kubernetes 1.20, the exec probe timeout will start being enforced.
The probe is defined in kubernetes-configs/logging-agent.yaml, lines 46 to 64 at commit f01ceca.