healthcheck: fail on supervisorctl errors #317
Note: Please remember to review the Datadog Contribution Guidelines
if you have not yet done so.
What does this PR do?
Adds an extra check in probe.sh that first verifies supervisorctl status exits with 0. If status cannot run, the probe fails before it tries to parse the output.
Motivation
We have a scenario where the collector fails and I was expecting the health check to fail and recycle the task.
Upon checking, I found that supervisorctl encounters an exception and the egrep check does not handle this case.
Exit code: 0
A scheduler that checks for the exit code of the probe will not catch this.
After adding the check for the supervisorctl exit code:
Exit code: 1
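For illustration, the exit-code check described above could be sketched roughly as follows. This is not the actual probe.sh from this PR: the function name probe_status and the state pattern passed to grep are assumptions made for the example, and the status command is passed in as arguments so the function can be tried without a running supervisord.

```shell
#!/bin/sh
# Hypothetical sketch of a two-step health probe; not the real probe.sh.
# Usage (assumed): probe_status supervisorctl status

probe_status() {
    # Step 1: run the status command once, capturing its output.
    # A plain assignment keeps the exit status of the command substitution,
    # so a non-zero exit (for example, an exception inside the command)
    # is caught here instead of being hidden behind a later grep.
    output=$("$@" 2>&1) || {
        rc=$?
        echo "status command failed with exit code $rc" >&2
        return 1
    }

    # Step 2: only parse the output once the command itself succeeded.
    # The pattern lists the non-RUNNING supervisor process states; the
    # pattern used by the real probe may differ.
    if printf '%s\n' "$output" |
        grep -Eq 'STOPPED|STARTING|BACKOFF|STOPPING|EXITED|FATAL|UNKNOWN'; then
        echo "at least one process is not RUNNING" >&2
        return 1
    fi

    return 0
}
```

A caller would run probe_status supervisorctl status and exit with its return value, so a scheduler that only looks at the exit code sees a failure in both cases: when the status command itself breaks and when a process is reported in a non-running state.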
A simple initial check that supervisorctl status exits with 0 solves this. Any exception or failure that prevents it from even listing the status should mark the container as failed.
Testing Guidelines
N/A - happy to be guided and add something if the probe is covered anywhere as a test scenario
Additional Notes
This may have implications for issue #314.
In our environment even with the extra check, it completes in under 1s. Naturally, this will depend on how many resources are allocated to the container.