-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the Prometheus rule alerts acc to specified value & correct the default values in API desc #2816
Update the Prometheus rule alerts acc to specified value & correct the default values in API desc #2816
Conversation
67ee251
to
f418b0f
Compare
/hold |
f418b0f
to
501667d
Compare
fb64ae6
to
d281e59
Compare
/cherry-pick release-4.17 |
@malayparida2000: once the present PR merges, I will cherry-pick it on top of release-4.17 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
PR is ready for review, removing hold |
/cc @travisn |
@malayparida2000: GitHub didn't allow me to request PR reviews from the following users: aruniiird. Note that only red-hat-storage members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
c992aec
to
06b3662
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold for Travis and Arun review
Full testing of the feature on this PR is documented here https://hackmd.io/@Yh4a4hAATcW2BNYBJVSx4w/SJaY57gCR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold for Travis and Arun review
While adding the fields the description of the fields were directly lifted off from rook-operator. The values of NearFull, BackfillFull & Full in rook are 0.85, 0.90 & 0.95 respectively. But in OCS Operator we set these values to 0.75, 0.80 & 0.85 respectively with the help of the rook-config-override ConfigMap. So the description of the fields in the API should reflect the actual values that are set in OCS. Signed-off-by: Malay Kumar Parida <[email protected]>
06b3662
to
aa3a230
Compare
Earlier the cluster utilization alert rules (CephClusterNearFull, CephClusterCriticallyFull, CephClusterReadOnly) and the osd alert rules (CephOSDNearFull, CephOSDCriticallyFull) were hardcoded to use the nearFullRatio 0.75, criticallyFullRatio 0.80, and fullRatio 0.85 values. But these values are now configurable on the storageCluster CR. So the prometheus rules for these alerts will now be updated to use the specified values if provided in the storageCluster CR. This also includes the refactor of the changing the prometheus rule process. The function is now easier to read, maintain & expand. Also add tests for prometheus rule changing process. Signed-off-by: Malay Kumar Parida <[email protected]>
aa3a230
to
43edf9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a small suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: iamniting, malayparida2000, travisn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
c90b932
into
red-hat-storage:main
@malayparida2000: new pull request created: #2820 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding an extensive redesign of replaceTokens
, please take a look.
@@ -1180,38 +1203,64 @@ func applyLabels(labels map[string]string, t *metav1.ObjectMeta) { | |||
} | |||
} | |||
|
|||
type exprReplaceToken struct { | |||
type replaceToken struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change (or elaborate) the replaceToken
struct? This will make our code much more cleaner
type replaceToken struct { | |
type replaceToken struct { | |
groupName string | |
recordOrAlertName string | |
wordToReplace string | |
replaceWith string | |
// wordInSection represents | |
// which field in the rule this change has to go | |
wordInSection RuleSection | |
// sectionKey is the map's key, if the above RuleSection is a map (like Annotations or Labels) | |
sectionKey string | |
} | |
// RuleSection determines which section in the Rule struct the change has to go in | |
type RuleSection int | |
const ( | |
ExprRuleSection RuleSection = iota | |
AnnotationRuleSection | |
LabelRuleSection | |
) |
replaceWith string | ||
} | ||
|
||
func changePromRuleExpr(promRules *monitoringv1.PrometheusRule, replaceTokens []exprReplaceToken) { | ||
if promRules == nil { | ||
func createReplaceToken(groupName, recordOrAlertName, wordToReplace, replaceWith string) replaceToken { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change createReplaceToken
function (if we add the RuleSection
part)
func createReplaceToken(groupName, recordOrAlertName, wordToReplace, replaceWith string) replaceToken { | |
func createReplaceToken(groupName, recordOrAlertName, wordToReplace, replaceWith string, wordInSection RuleSection, sectionKey string) replaceToken { |
@@ -1148,16 +1148,39 @@ func createPrometheusRules(r *StorageClusterReconciler, sc *ocsv1.StorageCluster | |||
return err | |||
} | |||
applyLabels(getCephClusterMonitoringLabels(*sc), &prometheusRule.ObjectMeta) | |||
replaceTokens := []exprReplaceToken{ | |||
|
|||
replaceTokens := []replaceToken{ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be consistent if we call createReplaceToken
function here as well
if specifiedNearFullRatio != nil { | ||
replaceTokens = append(replaceTokens, | ||
createReplaceToken("", "", "75%", fmt.Sprintf("%.2f%%", *specifiedNearFullRatio*100)), | ||
createReplaceToken("", "", "0.75", fmt.Sprintf("%f", *specifiedNearFullRatio))) | ||
} | ||
if specifiedBackfillFullRatio != nil { | ||
replaceTokens = append(replaceTokens, | ||
createReplaceToken("", "", "80%", fmt.Sprintf("%.2f%%", *specifiedBackfillFullRatio*100)), | ||
createReplaceToken("", "", "0.80", fmt.Sprintf("%f", *specifiedBackfillFullRatio))) | ||
} | ||
if specifiedFullRatio != nil { | ||
replaceTokens = append(replaceTokens, | ||
createReplaceToken("", "", "85%", fmt.Sprintf("%.2f%%", *specifiedFullRatio*100)), | ||
createReplaceToken("", "", "0.85", fmt.Sprintf("%f", *specifiedFullRatio))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't require a condition check and then append because we will check whether the given word and the replacing/new word are same or not in changePromRule
function. So directly append the values, with a slight modification in the fmt.Sprintf
format,
PS: don't be confused with the additional parameters like RuleSection and DescriptionKey part. Explained it in the below comments, please check
if specifiedNearFullRatio != nil { | |
replaceTokens = append(replaceTokens, | |
createReplaceToken("", "", "75%", fmt.Sprintf("%.2f%%", *specifiedNearFullRatio*100)), | |
createReplaceToken("", "", "0.75", fmt.Sprintf("%f", *specifiedNearFullRatio))) | |
} | |
if specifiedBackfillFullRatio != nil { | |
replaceTokens = append(replaceTokens, | |
createReplaceToken("", "", "80%", fmt.Sprintf("%.2f%%", *specifiedBackfillFullRatio*100)), | |
createReplaceToken("", "", "0.80", fmt.Sprintf("%f", *specifiedBackfillFullRatio))) | |
} | |
if specifiedFullRatio != nil { | |
replaceTokens = append(replaceTokens, | |
createReplaceToken("", "", "85%", fmt.Sprintf("%.2f%%", *specifiedFullRatio*100)), | |
createReplaceToken("", "", "0.85", fmt.Sprintf("%f", *specifiedFullRatio))) | |
} | |
const DescriptionKey = "description" | |
replaceTokens = append(replaceTokens, | |
createReplaceToken("cluster-utilization-alert.rules", "CephClusterNearFull", "75%", fmt.Sprintf("%0.0f%%", nearFullRatio*100), AnnotationRuleSection, DescriptionKey), | |
createReplaceToken("cluster-utilization-alert.rules", "CephClusterNearFull", "0.75", fmt.Sprintf("%0.2f", nearFullRatio), ExprRuleSection, ""), | |
createReplaceToken("cluster-utilization-alert.rules", "CephClusterCriticallyFull", "80%", fmt.Sprintf("%0.0f%%", criticallyFullRatio*100), AnnotationRuleSection, DescriptionKey), | |
createReplaceToken("cluster-utilization-alert.rules", "CephClusterCriticallyFull", "0.80", fmt.Sprintf("%0.0f", criticallyFullRatio), ExprRuleSection, ""), | |
createReplaceToken("cluster-utilization-alert.rules", "CephClusterReadOnly", "85%", fmt.Sprintf("%0.0f%%", fullRatio*100), AnnotationRuleSection, DescriptionKey), | |
createReplaceToken("cluster-utilization-alert.rules", "CephClusterReadOnly", "0.85", fmt.Sprintf("%0.2f", fullRatio), ExprRuleSection, ""), | |
) |
for ruleIdx := range group.Rules { | ||
rule := &group.Rules[ruleIdx] | ||
// If recordOrAlertName is specified, ensure it matches; otherwise, apply to all rules | ||
if token.recordOrAlertName == "" || rule.Record == token.recordOrAlertName || rule.Alert == token.recordOrAlertName { | ||
// Update the annotations in the rule | ||
if rule.Annotations != nil { | ||
// Update description if it exists | ||
if description, exists := rule.Annotations["description"]; exists { | ||
newDescription := strings.Replace(description, token.wordToReplace, token.replaceWith, -1) | ||
rule.Annotations["description"] = newDescription | ||
} | ||
} | ||
// Update the expression field in the rule | ||
exprStr := rule.Expr.String() | ||
if exprStr != "" { | ||
newExpr := strings.Replace(exprStr, token.wordToReplace, token.replaceWith, -1) | ||
rule.Expr = intstr.Parse(newExpr) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we apply the RuleSection
(and other needed changes), changePromRule()
function can be cleaned out as follows,
for ruleIdx := range group.Rules { | |
rule := &group.Rules[ruleIdx] | |
// If recordOrAlertName is specified, ensure it matches; otherwise, apply to all rules | |
if token.recordOrAlertName == "" || rule.Record == token.recordOrAlertName || rule.Alert == token.recordOrAlertName { | |
// Update the annotations in the rule | |
if rule.Annotations != nil { | |
// Update description if it exists | |
if description, exists := rule.Annotations["description"]; exists { | |
newDescription := strings.Replace(description, token.wordToReplace, token.replaceWith, -1) | |
rule.Annotations["description"] = newDescription | |
} | |
} | |
// Update the expression field in the rule | |
exprStr := rule.Expr.String() | |
if exprStr != "" { | |
newExpr := strings.Replace(exprStr, token.wordToReplace, token.replaceWith, -1) | |
rule.Expr = intstr.Parse(newExpr) | |
} | |
} | |
for ruleIdx, rule := range group.Rules { | |
// If recordOrAlertName is specified, ensure it matches; otherwise, apply to all rules | |
if token.recordOrAlertName == "" || rule.Record == token.recordOrAlertName || rule.Alert == token.recordOrAlertName { | |
switch token.wordInSection { | |
case ExprRuleSection: | |
// Update the expression field in the rule | |
newExpr := strings.ReplaceAll(rule.Expr.String(), token.wordToReplace, token.replaceWith) | |
promRule.Spec.Groups[groupIdx].Rules[ruleIdx].Expr = intstr.Parse(newExpr) | |
case AnnotationRuleSection: | |
if rule.Annotations == nil { // if no annotations, continue | |
continue | |
} | |
annotMap := rule.Annotations | |
annotMap[token.sectionKey] = strings.ReplaceAll( | |
annotMap[token.sectionKey], token.wordToReplace, token.replaceWith) | |
case LabelRuleSection: | |
if rule.Labels == nil { // if there are no labels, continue | |
continue | |
} | |
labelMap := rule.Labels | |
labelMap[token.sectionKey] = strings.ReplaceAll( | |
labelMap[token.sectionKey], token.wordToReplace, token.replaceWith) | |
} | |
} | |
} |
var changeTokens = []exprReplaceToken{ | ||
{recordOrAlertName: "CephMgrIsAbsent", wordInExpr: "openshift-storage", replaceWith: "new-namespace"}, | ||
var changeTokens = []replaceToken{ | ||
{recordOrAlertName: "CephMgrIsAbsent", wordToReplace: "openshift-storage", replaceWith: "new-namespace"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use the createReplaceToken()
function, rather than creating raw structs
@@ -1041,29 +1041,40 @@ func TestParsePrometheusRules(t *testing.T) { | |||
} | |||
|
|||
func TestChangePrometheusExprFunc(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the test function name as well, something like TestChangePrometheusRulesFunc
?
assert.Assert(t, | ||
strings.Contains(rule.Expr.String(), changeStr) || | ||
(rule.Annotations != nil && strings.Contains(rule.Annotations["description"], changeStr)), | ||
fmt.Sprintf("Expected '%s' to be found in either Expr or Annotations for alert %s", changeStr, alertName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed changes to the test function,
assert.Assert(t, | |
strings.Contains(rule.Expr.String(), changeStr) || | |
(rule.Annotations != nil && strings.Contains(rule.Annotations["description"], changeStr)), | |
fmt.Sprintf("Expected '%s' to be found in either Expr or Annotations for alert %s", changeStr, alertName)) | |
// below is a small hack to determine whether we have to check 'Annotation' or 'Expr' | |
// if it contains a %-age sign, then check the annotation description | |
if strings.Contains(changeStr, "%") { | |
assert.Assert(t, | |
strings.Contains(rule.Annotations[DescriptionKey], changeStr), | |
"Annotation description doesn't mach", "Received Annotation Description", | |
rule.Annotations[DescriptionKey], "Doesn't contain", changeStr) | |
} else { // check the expression | |
assert.Assert(t, | |
strings.Contains(rule.Expr.String(), changeStr), "Expression doesn't mach", | |
"Received Expression", rule.Expr, "Doesn't have", changeStr) | |
} |
@aruniiird: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Acc to discussion, @aruniiird would raise a follow-up to refactor PR to clean up the flow a bit more, but that would be only on the main & will not be backported. |
Update prometheusRule to use the fullRatio values from storageCluster CR
Earlier the cluster utilization alert rules (CephClusterNearFull,
CephClusterCriticallyFull, CephClusterReadOnly) and the osd alert rules
(CephOSDNearFull, CephOSDCriticallyFull) were hardcoded to use the
nearFullRatio 0.75, criticallyFullRatio 0.80, and fullRatio 0.85 values.
But these values are now configurable on the storageCluster CR. So the
prometheus rules for these alerts will now be updated to use the
specified values if provided in the storageCluster CR.
This also includes the refactor of the changing the prometheus rule
process. The function is now easier to read, maintain & expand.
Also add tests for prometheus rule changing process.
Correct the API desc about the defaults for Full Ratios in OCS-Operator
While adding the fields the description of the fields were directly
lifted off from rook-operator. The values of NearFull, BackfillFull &
Full in rook are 0.85, 0.90 & 0.95 respectively. But in OCS Operator
we set these values to 0.75, 0.80 & 0.85 respectively with the help
of the rook-config-override ConfigMap. So the description of the fields
in the API should reflect the actual values that are set in OCS.
Story-https://issues.redhat.com/browse/RHSTOR-6497
BZ-https://bugzilla.redhat.com/show_bug.cgi?id=2303342