How should SLO alerts give details about the failing functions? #51

Closed Answered by IvanMerrill
gagbo asked this question in Q&A

I agree with @emschwartz.

Going more granular seems like an SLO anti-pattern: you end up with an SLO for just about everything. That feels similar to the traditional alerting models that SLOs are trying to move away from.

Top offenders don't necessarily show you what you want to know, which is what has changed the most to cause this SLO breach. A function can have a consistently high error rate (maybe due to a high number of external dependencies) that is already factored into the SLO. If your SLO breaches and you get an alert, you want to know what has changed behaviour in terms of errors, not what has errored the most. This is a subtle difference but can easily lea…
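To make the distinction concrete, here is a rough sketch in Python. Everything in it is hypothetical: the function names, the error rates, and the idea of comparing a "baseline" window against the breach window are assumptions for illustration, not something the library does today.

```python
from dataclasses import dataclass


@dataclass
class FunctionStats:
    name: str                   # hypothetical function name
    baseline_error_rate: float  # error rate in a reference window before the breach
    current_error_rate: float   # error rate during the breach window


def top_offenders(stats):
    """Rank by absolute error rate: 'what has errored the most'."""
    return sorted(stats, key=lambda s: s.current_error_rate, reverse=True)


def biggest_changes(stats):
    """Rank by increase over baseline: 'what has changed behaviour'."""
    return sorted(
        stats,
        key=lambda s: s.current_error_rate - s.baseline_error_rate,
        reverse=True,
    )


# Made-up numbers: one function is always noisy, another has just regressed.
stats = [
    FunctionStats("call_external_apis", 0.050, 0.052),
    FunctionStats("save_order", 0.001, 0.030),
    FunctionStats("render_page", 0.002, 0.002),
]

offender = top_offenders(stats)[0].name
changed = biggest_changes(stats)[0].name
print(f"top offender: {offender}, biggest change: {changed}")
```

With these numbers the two rankings disagree: the always-noisy function tops the offenders list, while the function that actually regressed only shows up first when ranking by change from baseline.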

Answer selected by emschwartz
3 participants