
ISSUE 341: use notifier threshold to configure evaluator ShowAll #354

Open

nickdevp wants to merge 5 commits into linkedin:master from nickdevp:ISSU-341
Conversation

@nickdevp

Hello,
This pull request is a proposed fix for #341. Prior to this change, when the 'notifier' coordinator submits an 'EvaluatorRequest', the 'ShowAll' parameter defaults to 'false'. As a result, StatusOK partitions are never sent to the notification endpoints (i.e., 'http' and 'email').
The changes in this PR use the 'threshold' configuration property within the notifier section to set the 'ShowAll' parameter of the evaluator request. Thus, if the user has set threshold=1, that notification endpoint will receive StatusOK partitions.
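For illustration, a minimal sketch of the idea, not the exact diff in this PR: the helper name and module iteration are hypothetical stand-ins for the coordinator's real state in core/internal/notifier/coordinator.go, and the EvaluatorRequest field names follow my reading of Burrow's core/protocol package.

package notifier

import (
	"github.com/linkedin/Burrow/core/protocol"
	"github.com/spf13/viper"
)

// sketchEvaluatorRequest asks the evaluator for all partitions
// (ShowAll=true) whenever at least one configured notifier has
// threshold=1, i.e. wants to be told about StatusOK groups too.
func sketchEvaluatorRequest(cluster, group string, moduleNames []string,
	reply chan *protocol.ConsumerGroupStatus) *protocol.EvaluatorRequest {
	showAll := false
	for _, name := range moduleNames {
		if viper.GetInt("notifier."+name+".threshold") == 1 {
			showAll = true
			break
		}
	}
	return &protocol.EvaluatorRequest{
		Reply:   reply,
		Cluster: cluster,
		Group:   group,
		ShowAll: showAll,
	}
}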

@nickdevp
Author

Looks like there is a test that checks the ShowAll flag. This is the error I'm seeing in the Travis CI logs:
assert.False(t, request.ShowAll, "Expected ShowAll to be false")

Perhaps I am missing some context here: why does this test expect it to be false?

@toddpalino
Contributor

The test checks that all parameters are valid. If you change the assumptions, the tests need to be changed as well.

However, this isn't going to work. You're setting the value of ShowAll at the coordinator level - that is, for all notifiers - based on the value of an individual notifier. If you have a notifier that has threshold set to 1, and another that has threshold set to 2, you will end up with undesired behavior for the second notifier.
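For concreteness, a hypothetical configuration exhibiting the conflict (module names and URLs are invented for illustration; Burrow reads this style of config via viper):

[notifier.metrics-sink]
class-name="http"
url-open="http://metrics.example.com/event"
threshold=1   # wants every group, including StatusOK

[notifier.pager]
class-name="http"
url-open="http://pager.example.com/event"
threshold=2   # expects only Warning and above

With ShowAll decided once at the coordinator level, the threshold=1 module drags StatusOK partitions into the responses that the threshold=2 module sees as well.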

@nickdevp
Author

The 'ShowAll' flag at the coordinator level seems to only control the evaluation request. The threshold checks are still done in the 'notifyModule' function, if I am reading this correctly:
https://github.com/linkedin/Burrow/blob/master/core/internal/notifier/coordinator.go#L556

@nickdevp
Author

I may have gotten the link wrong above. Please look at the following in core/internal/notifier/coordinator.go:
// Only send a notification if the current status is above the module's threshold
if int(status.Status) < viper.GetInt("notifier."+module.GetName()+".threshold") {
return
}
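For reference, a small self-contained sketch of why threshold=1 admits StatusOK groups. The constant values mirror my reading of Burrow's core/protocol package and should be double-checked; the helper name is hypothetical.

package main

import "fmt"

// Stand-ins for Burrow's partition status constants (core/protocol),
// shown here for illustration only.
const (
	StatusNotFound = iota // 0
	StatusOK              // 1
	StatusWarning         // 2
	StatusError           // 3
)

// shouldNotify mirrors the check quoted above: notify unless the
// group status is below the module's threshold.
func shouldNotify(status, threshold int) bool {
	return status >= threshold
}

func main() {
	fmt.Println(shouldNotify(StatusOK, 1)) // true: threshold=1 includes OK groups
	fmt.Println(shouldNotify(StatusOK, 2)) // false: threshold=2 starts at Warning
}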

@nickdevp
Author

Looks like the build above failed due to the github.com/goreleaser/nfpm package upgrading to Go 1.10.

@toddpalino
Contributor

The problem is not where the threshold check is done. It's that the notifier, by default, expects ShowAll to have been false, so that only the errored partitions are in the Partitions slice.

@nickdevp
Author

I didn't understand your comment above. Could you please clarify? If you feel this is the wrong approach, let me know if you have an alternative in mind.
When I run Burrow in my environment with the changes in this PR, I do see StatusOK partitions coming into my http endpoint.

@alexandnpu

This is really good for making Burrow report the status of all partitions when the threshold is set to 1.
In my scenario, I need to report the status of every partition to a TSDB (e.g., InfluxDB), regardless of whether its health is good or bad. So I think Burrow should provide some way of sending these metrics out, rather than relying on tricks in the notification path.

@chasse-code

I would also like to have this feature. Our application teams want notifications for all of their consumers on a consistent basis and want to see how offsets are tracking for each consumer. Having the partition information disappear when the status goes back to OK is not great for them.

@toddpalino
Contributor

Yes, @nickdevp, that's exactly the problem. You'll also get StatusOK partitions if you set the threshold higher with this patch in place. That's the undesirable behavior I'm talking about.
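To make the disagreement concrete, a minimal sketch with simplified stand-ins for Burrow's protocol types: the check in notifyModule compares the group-level status, not each partition's status, so once a group is above N2's threshold, every partition in its Partitions slice, including any StatusOK entries present because ShowAll was true, reaches N2's template.

package main

import "fmt"

// Simplified stand-ins for Burrow's protocol types, for illustration only.
type partitionStatus struct {
	Topic     string
	Partition int32
	Status    int // 1=OK, 2=Warning, 3=Error
}

type consumerGroupStatus struct {
	Status     int
	Partitions []*partitionStatus
}

func main() {
	group := consumerGroupStatus{
		Status: 3, // Error: passes N2's threshold=2 check, so N2 notifies
		Partitions: []*partitionStatus{
			{Topic: "orders", Partition: 0, Status: 3},
			{Topic: "orders", Partition: 1, Status: 1}, // OK, present only because ShowAll=true
		},
	}
	// N2's template iterates Partitions, so the OK partition is rendered too.
	for _, p := range group.Partitions {
		fmt.Printf("%s/%d status=%d\n", p.Topic, p.Partition, p.Status)
	}
}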

@nickdevp
Author

@toddpalino I am not seeing the behavior you described. If the threshold is set to 1 for notifier N1 but higher for another notifier N2, then N2 will not see StatusOK partitions.

@coveralls

coveralls commented May 15, 2018


Coverage increased (+0.04%) to 74.066% when pulling d8f77cc on nickdevp:ISSU-341 into fecab1e on linkedin:master.

@nickdevp
Author

nickdevp commented Jun 6, 2018

May I get an update on this PR? I would like to know how we can move forward on this issue.

@nickdevp
Author

nickdevp commented Jul 7, 2018

@toddpalino To mitigate the risk of this PR, perhaps we could add another configuration option to enable this feature? If you have any suggestions, I'd be open to making changes.

@akamac

akamac commented Feb 15, 2019

@toddpalino I also need Notifier to report all partitions, including those with StatusOK.
How can I achieve this?

Currently, .Result.Partitions is empty for the send-close event.

@toddpalino
Contributor

The only way this could possibly be brought in is if you altered checkAndSendResponseToModules to perform major surgery on the evaluation response when ShowAll is true, to ensure that, for any notifier whose threshold is higher than OK, the only partitions it receives are the ones in an errored state.

This would have to be accompanied by tests showing that the current behavior is honored in all cases except where the threshold is set to OK. This would need to include the case of multiple notifiers, where one is set to OK and one is set to a higher threshold.
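A hedged sketch of that filtering, with hypothetical function and type names (the real change would live in checkAndSendResponseToModules and use Burrow's protocol structs):

package notifier

// Simplified stand-ins for Burrow's protocol types, for illustration only.
type partitionStatus struct {
	Topic     string
	Partition int32
	Status    int // 1=OK, 2=Warning, 3=Error
}

type consumerGroupStatus struct {
	Status     int
	Partitions []*partitionStatus
}

const statusOK = 1

// filterForModule returns the response unchanged for a module whose
// threshold is OK, and a copy stripped of OK partitions for every other
// module, preserving the pre-ShowAll behavior at higher thresholds.
func filterForModule(response *consumerGroupStatus, threshold int) *consumerGroupStatus {
	if threshold <= statusOK {
		return response // this module asked for everything, including OK
	}
	filtered := *response
	filtered.Partitions = make([]*partitionStatus, 0, len(response.Partitions))
	for _, p := range response.Partitions {
		if p.Status > statusOK {
			filtered.Partitions = append(filtered.Partitions, p)
		}
	}
	return &filtered
}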


@andrewchoi5 left a comment


Any updates on this pull request since March?

@richardwalsh

Any updates on this one?
