Currently, the alerting rules are just the defaults generated by Sloth. Sloth creates one alert rule for the paging severity and one for the ticket severity. However, the paging rule combines two separate conditions with a single top-level OR. Here's an example:
(
max(slo:sli_error:ratio_rate5m{sloth_id="autometrics-success-rate-95", sloth_service="autometrics", sloth_slo="success-rate-95"} > (14.4 * 0.05)) without (sloth_window)
and
max(slo:sli_error:ratio_rate1h{sloth_id="autometrics-success-rate-95", sloth_service="autometrics", sloth_slo="success-rate-95"} > (14.4 * 0.05)) without (sloth_window)
)
or
(
max(slo:sli_error:ratio_rate30m{sloth_id="autometrics-success-rate-95", sloth_service="autometrics", sloth_slo="success-rate-95"} > (6 * 0.05)) without (sloth_window)
and
max(slo:sli_error:ratio_rate6h{sloth_id="autometrics-success-rate-95", sloth_service="autometrics", sloth_slo="success-rate-95"} > (6 * 0.05)) without (sloth_window)
)
When this alert fires, you cannot tell whether it was triggered by the 1h/5m window and burn rate condition or by the 6h/30m one, and this information is not exposed in any label. It could be worth splitting this rule into two separate rules so the user can see which time window and burn rate generated the alert. We could also include this information in a label on the alert so it can be better understood and displayed in explorer.
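As a rough sketch of what the split could look like, the paging rule above could become two Prometheus alerting rules, one per window pair, each carrying its windows and burn rate as labels. The group name, alert names, and the `severity`, `short_window`, `long_window`, and `burn_rate` labels are assumptions made for illustration; they are not names Sloth is known to generate.

```yaml
groups:
  - name: autometrics-slo-alerts  # hypothetical group name
    rules:
      # Fast burn: the 5m and 1h windows must both exceed 14.4x the error budget rate.
      - alert: AutometricsSuccessRate95FastBurn  # hypothetical alert name
        expr: |
          max(slo:sli_error:ratio_rate5m{sloth_id="autometrics-success-rate-95", sloth_service="autometrics", sloth_slo="success-rate-95"} > (14.4 * 0.05)) without (sloth_window)
          and
          max(slo:sli_error:ratio_rate1h{sloth_id="autometrics-success-rate-95", sloth_service="autometrics", sloth_slo="success-rate-95"} > (14.4 * 0.05)) without (sloth_window)
        labels:
          severity: page        # assumed label name
          short_window: 5m      # assumed label name
          long_window: 1h       # assumed label name
          burn_rate: "14.4"     # assumed label name
      # Slower burn: the 30m and 6h windows must both exceed 6x the error budget rate.
      - alert: AutometricsSuccessRate95SlowBurn  # hypothetical alert name
        expr: |
          max(slo:sli_error:ratio_rate30m{sloth_id="autometrics-success-rate-95", sloth_service="autometrics", sloth_slo="success-rate-95"} > (6 * 0.05)) without (sloth_window)
          and
          max(slo:sli_error:ratio_rate6h{sloth_id="autometrics-success-rate-95", sloth_service="autometrics", sloth_slo="success-rate-95"} > (6 * 0.05)) without (sloth_window)
        labels:
          severity: page
          short_window: 30m
          long_window: 6h
          burn_rate: "6"
```

With the conditions separated like this, a firing alert identifies its window pair and burn rate through its own name and labels, which explorer could then display directly.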