Handle context canceled errors on shutdown #2843

romulets · 2024-12-17T10:29:37Z

Cloudbeat can receive a context cancellation at any moment. Right now, once a context is canceled, we log errors from many different places in cloudbeat.

This is especially troublesome in agentless, where pods tend to be restarted or deleted more frequently than in a standard agent-based deployment. On top of that, in agentless we are paged based on the number of errors, so a cloudbeat shutdown during a cycle might alert the engineer on duty (low urgency, example).

The error logging is spread throughout the code, and we can't simply unify all errors and propagate them up, because some of them are "optional" errors (we log them, but they don't stop execution). Example.

Ideally, we find a strategy that raises no alert in this scenario: a context cancellation during shutdown gives the on-call engineer nothing to act upon, so the alert is a false positive.

There are two directions we could take:

1. From cloudbeat: write a wrapper or handler around logp that receives the error and, if it is a context cancellation, lowers the level to warn (or whatever else we decide). Alternatively, we could handle it case by case, but that would be very repetitive.
2. Don't alert on pod shutdown or restart. That might be tricky to configure and might hide a legitimate issue. But once a pod is shut down, there is nothing an on-call engineer can do: there is no customer impact and nothing to fix, because the pod is gone. So should we alert on non-actionable problems?


romulets added the Feature:Cloud-Security (Cloud Security related features) and Team:Cloud Security (Cloud Security team related) labels Dec 17, 2024

romulets changed the title ~~Handle context canceled errors~~ Handle context canceled errors on shutdown Dec 17, 2024

Comments

romulets commented Dec 17, 2024