Commit

introduce maxClientCancellationRatio
tn819 committed Jul 24, 2024
1 parent 0df2cdf commit c841478
Showing 3 changed files with 9 additions and 2 deletions.
4 changes: 3 additions & 1 deletion charts/generic-service/README.md
@@ -156,7 +156,9 @@ app:
| `alerting.http.sampleInterval` | `20m` | The time interval in which to measure HTTP responses for triggering alerts |
| `alerting.http.referenceInterval` | `1w` | The time interval to compare with the sample interval to detect changes |
| `alerting.http.maxSlowdown` | `2.5` | The maximum HTTP response slowdown in the sample interval compared to the reference interval |
-| `alerting.http.max4xxRatio` | `2.5` | The maximum HTTP 4xx ratio increase in the sample interval compared to the reference interval |
+| `alerting.http.max4xxRatio` | `2.5` | The maximum HTTP 4xx ratio increase (except 499) in the sample interval compared to the reference interval |
+| `alerting.http.maxClientCancellationRatio` | `0` | The maximum client cancellation (HTTP 499) ratio increase in the sample interval compared to the reference interval |
+| `alerting.http.maxTimeoutCount` | `0` | The maximum number of HTTP gateway timeout responses (504) in the sample interval
| `alerting.http.max5xxCount` | `0` | The maximum number of HTTP 5xx responses (except 504) in the sample interval |
| `alerting.http.maxTimeoutCount` | `0` | The maximum number of HTTP gateway timeout responses (504) in the sample interval |
| `alerting.grpc.requestsMetric` | `grpc_server_handled_total` | The name of the Prometheus metric counting gRPC requests |
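
The new setting can be overridden like any other chart value. The following is a minimal sketch of a values file, assuming the keys sit at the top level under `alerting.http` (as the template's `.Values.alerting.http.*` references suggest); the threshold `5` is an illustrative number, not a recommended value.

```yaml
# Hypothetical values override for a release of generic-service.
alerting:
  http:
    # Threshold for the 4xx ratio increase (499 excluded, per the README row).
    max4xxRatio: 2.5
    # Threshold for the HttpClientCancelled alert (HTTP 499 only).
    # Illustrative value: fire only if the 499 rate in the sample interval
    # exceeds 5x the rate in the reference interval.
    maxClientCancellationRatio: 5
```

Setting the value explicitly may be worthwhile here, since this commit lists the default as `0` in the README row and as `2.5` in `values.schema.json`.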
2 changes: 1 addition & 1 deletion charts/generic-service/templates/alerts.yaml
@@ -175,7 +175,7 @@ spec:
- alert: HttpClientCancelled
expr: |
(sum(rate({{ include "generic-service.request-code-count-metric" . }}"499"[{{ .Values.alerting.http.sampleInterval }}]))) /
-(sum(rate({{ include "generic-service.request-code-count-metric" . }}"499"[{{ .Values.alerting.http.referenceInterval }}])))) > {{ .Values.alerting.http.max4xxRatio }}
+(sum(rate({{ include "generic-service.request-code-count-metric" . }}"499"[{{ .Values.alerting.http.referenceInterval }}])))) > {{ .Values.alerting.http.maxClientCancellationRatio }}
labels: {{- include "generic-service.alert-labels" . | nindent 12 }} warning
annotations: {{- include "generic-service.alert-annotations" . | nindent 12 }} higher HTTP client cancellation rate
description: '{{ include "generic-service.fullname" . }} gave a {{"{{ $value }}"}}x higher percentage of HTTP request cancelled by the client in the last {{ .Values.alerting.http.sampleInterval }} than in the last {{ .Values.alerting.http.referenceInterval }}.'
5 changes: 5 additions & 0 deletions charts/generic-service/values.schema.json
@@ -886,6 +886,11 @@
"default": 2.5,
"description": "The maximum HTTP 4xx ratio increase in the sample interval compared to the reference interval"
},
+"maxClientCancellationRatio": {
+"type": "number",
+"default": 2.5,
+"description": "The maximum client cancellation (HTTP 499) ratio increase in the sample interval compared to the reference interval"
+},
"max5xxCount": {
"type": "number",
"default": 0,