-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fdbmonitor restart-delay unclear documentation #11775
Comments
CC @spraza |
I don't have a strong opinion on either but I think using the exit code and only delaying processes that have an exit code different than 0 should be delayed. My understanding of this setting is to prevent a process to crash loop directly in a short amount of time. |
For completeness: In the fdb-kubernetes-monitor we are only delaying the restart when the process exit code was not |
Looked at the code. You're right, currently the delay calculation doesn't account for exit code. I think it's reasonable to only have delays for failing (non zero) exited processes. Testing the fix would probably take more time than the fix itself. I can send the PR out but it may take a few days before I get to it. Happy to review the PR if one of you is interested to fix this (@johscheuer, @mpatou-openai). |
fdbmonitor has a field restart-delay. Based on the documentation:
The maximum number of seconds (subject to jitter) that fdbmonitor will delay before restarting a failed process.
. I would assume thatfailed process
means a process that exits with an exit code other than 0. But the actual implementation doesn't look at the exit code: https://github.com/apple/foundationdb/blob/main/fdbmonitor/fdbmonitor.h#L395-L405 + https://github.com/apple/foundationdb/blob/main/fdbmonitor/fdbmonitor.cpp#L646. So we either have to correct the documentation or fix the behaviour in fdbmonitor.I saw this when I looked into #11764.
The text was updated successfully, but these errors were encountered: