Goldpinger is too sensitive to autoscaler activity #87
Comments
Hello people! We have the same issue ^^ but rather than filtering out a node because of its status, we'd be better off filtering out nodes by a different label/attribute.
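For illustration, a minimal client-go sketch of what label-based peer selection could look like; the label key and value (`goldpinger.io/ping=true`) are hypothetical and not part of Goldpinger's current configuration:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assume we are running inside the cluster, like Goldpinger does.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Select only nodes that carry an explicit opt-in label (hypothetical
	// key/value) instead of looking at node status.
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
		LabelSelector: "goldpinger.io/ping=true",
	})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		fmt.Println("peer node:", n.Name)
	}
}
```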
Hello! Any updates on this one?
#107 should help - after the node eviction timeout, pods go to terminating state and goldpinger will stop trying to reach them.
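As an illustration of the idea (not necessarily how #107 implements it), skipping terminating pods amounts to ignoring anything that already has a deletion timestamp:

```go
package peers

import corev1 "k8s.io/api/core/v1"

// filterTerminating drops pods that already have a deletion timestamp,
// i.e. pods that are terminating, so they stop being treated as ping targets.
func filterTerminating(pods []corev1.Pod) []corev1.Pod {
	alive := make([]corev1.Pod, 0, len(pods))
	for _, p := range pods {
		if p.DeletionTimestamp != nil {
			continue // pod is being evicted/terminated; skip it
		}
		alive = append(alive, p)
	}
	return alive
}
```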
@rbtr thank you. I will give it a go and come back with some feedback in the next few days.
Describe the bug
The order of operations for removing a node in Kubernetes is roughly:

1. The autoscaler decides to scale down and the underlying instance is terminated.
2. The node transitions to NotReady.
3. The Node object (and the pods scheduled on it) is removed from the Kubernetes API.

The time between 2 and 3 can be quite long (many minutes in some clouds). Goldpinger continuously tries to reach the node during this time, causing spikes in Goldpinger metrics.
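While the node sits in that window, its Node object still exists in the API but the Ready condition is no longer True. A small illustrative helper (not Goldpinger code) for detecting that state:

```go
package nodecheck

import corev1 "k8s.io/api/core/v1"

// isReady reports whether a node's Ready condition is True. A node stuck
// between instance termination and API removal still has a Node object,
// but this check returns false for it.
func isReady(node corev1.Node) bool {
	for _, c := range node.Status.Conditions {
		if c.Type == corev1.NodeReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	// No Ready condition reported at all: treat the node as not ready.
	return false
}
```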
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Goldpinger should provide a mechanism to filter NotReady nodes out of metric queries, so the metrics focus on nodes that are expected to be functioning normally.
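As a sketch of what such a mechanism could look like (the function and its inputs are made up for illustration, not Goldpinger's actual code), results for peers on NotReady nodes could be dropped before metrics are reported:

```go
package report

// filterReady keeps only ping results for nodes that are currently Ready,
// so NotReady nodes no longer contribute to error-rate metrics.
// results maps node name -> whether the ping succeeded;
// readyNodes is the set of node names whose Ready condition is True.
func filterReady(results map[string]bool, readyNodes map[string]bool) map[string]bool {
	filtered := make(map[string]bool, len(results))
	for nodeName, ok := range results {
		if readyNodes[nodeName] {
			filtered[nodeName] = ok
		}
	}
	return filtered
}
```

Whether this filtering happens inside Goldpinger or at query time in the monitoring stack is a design choice; doing it at the source keeps the exported error-rate metrics clean without extra joins in every dashboard.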
Screenshots
Here's an example showing Goldpinger error rates spiking as a cluster scaled down over a period of hours.
Environment:
Additional context