Create prometheus metrics for missed informer events during a relist interval #977

Closed
cmwylie19 opened this issue Jul 23, 2024 · 0 comments · Fixed by #983

cmwylie19 commented Jul 23, 2024

Is your feature request related to a problem? Please describe.

It is a well-documented problem that a Kubernetes watch connection can miss events. To compensate, the informer pattern periodically lists the watched resources (a relist) to correct the internal cache.
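As a rough illustration of how a relist can expose missed events, the sketch below compares a local cache against the result of a list call and counts the differences. The `Cache` and `ListedItem` shapes and the `reconcile` function are hypothetical and are not taken from Pepr or kubernetes-fluent-client. The count is a lower bound, since several missed updates to one resource show up as a single difference.

```typescript
// Hypothetical shapes for illustration only.
// The cache maps a resource UID to the last resourceVersion seen over the watch.
type Cache = Map<string, string>;

interface ListedItem {
  uid: string;
  resourceVersion: string;
}

// Reconcile the cache with a fresh list result and return how many
// differences (missed add, update, or delete events) were corrected.
function reconcile(cache: Cache, listed: ListedItem[]): number {
  let missed = 0;
  const seen = new Set<string>();

  for (const item of listed) {
    seen.add(item.uid);
    if (cache.get(item.uid) !== item.resourceVersion) {
      // An ADDED or MODIFIED event never arrived over the watch.
      cache.set(item.uid, item.resourceVersion);
      missed += 1;
    }
  }

  // Anything cached but absent from the list was deleted without a DELETED event.
  const stale: string[] = [];
  cache.forEach((_version, uid) => {
    if (!seen.has(uid)) stale.push(uid);
  });
  for (const uid of stale) {
    cache.delete(uid);
    missed += 1;
  }

  return missed;
}
```

A count of zero after a relist means the watch kept the cache in sync for that interval.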

In Pepr, unless LOG_LEVEL is set to verbose, it is impossible to see what the informer is doing. We need a way to count the events missed in each relist interval so we know whether we need to shorten the window.

Metrics we need to see:

  • cacheMissesPerWindow
  • retryCounts - If there are many cache misses per window and the retryCount never reaches the retryLimit, then the retryLimit is too high. This happens when only one event is missed at a time, so the retryCount keeps being reset to 0 while events are still frequently missed. In that case the relist is the only way we will reach homeostasis.

These metrics can also tell us when to shrink the relist window.
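One way to picture how the two metrics interact is the sketch below. It is an assumption-laden illustration, not the implementation that shipped: the `WatchWindowStats` class, its method names, and the `missThreshold` parameter are all hypothetical. It tracks misses and the highest retry count seen in each relist window, then flags the case described above, where misses are frequent but the retry count never reaches the limit.

```typescript
// Hypothetical per-window bookkeeping; names are illustrative only.
interface WindowSnapshot {
  cacheMisses: number;
  maxRetryCount: number;
}

class WatchWindowStats {
  private readonly retryLimit: number;
  private misses = 0;
  private retryCount = 0;
  private maxRetryCount = 0;
  readonly windows: WindowSnapshot[] = [];

  constructor(retryLimit: number) {
    this.retryLimit = retryLimit;
  }

  // Called when a relist or resync finds events the watch did not deliver.
  recordMiss(count: number = 1): void {
    this.misses += count;
  }

  // Called each time the watch retries after a failure.
  recordRetry(): void {
    this.retryCount += 1;
    this.maxRetryCount = Math.max(this.maxRetryCount, this.retryCount);
  }

  // Called when the watch recovers, which resets the retry count to 0.
  recordReconnect(): void {
    this.retryCount = 0;
  }

  // Called at the end of each relist interval.
  closeWindow(): WindowSnapshot {
    const snapshot: WindowSnapshot = {
      cacheMisses: this.misses,
      maxRetryCount: this.maxRetryCount,
    };
    this.windows.push(snapshot);
    this.misses = 0;
    this.maxRetryCount = this.retryCount;
    return snapshot;
  }

  // True when the last window had many misses but retries never hit the limit.
  retryLimitLooksTooHigh(missThreshold: number): boolean {
    const last = this.windows[this.windows.length - 1];
    if (last === undefined) return false;
    return last.cacheMisses >= missThreshold && last.maxRetryCount < this.retryLimit;
  }
}
```

If `retryLimitLooksTooHigh` keeps returning true, either lowering the retry limit or shrinking the relist window would be worth considering.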

Describe the solution you'd like

  • Given I am curious about how many events are being missed by the watch connection
  • When I look at the Pepr metrics
  • Then I see windows with how many misses occurred during them
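The "Then" step could look roughly like the sketch below, which renders per-window miss counts in the Prometheus text exposition format. The metric name `pepr_cache_misses_per_window`, the `window` label, and both function names are assumptions made for illustration; the actual metric names may differ.

```typescript
// Escape a label value per the Prometheus text format (backslash, quote, newline).
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one gauge sample per relist window.
// The keys of `windows` are window identifiers (for example a start timestamp).
function renderCacheMissWindows(windows: Map<string, number>): string {
  const name = "pepr_cache_misses_per_window"; // hypothetical metric name
  const lines: string[] = [
    `# HELP ${name} Events missed by the watch during each relist window.`,
    `# TYPE ${name} gauge`,
  ];
  windows.forEach((misses, window) => {
    lines.push(`${name}{window="${escapeLabelValue(window)}"} ${misses}`);
  });
  return lines.join("\n") + "\n";
}
```

Labelling samples by window keeps each interval visible, at the cost of one new series per window; a single counter that is reset or sampled per interval would be a cheaper alternative.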

Describe alternatives you've considered

(optional) A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context or screenshots about the feature request here.

@cmwylie19 cmwylie19 added the enhancement New feature or request label Jul 23, 2024
@cmwylie19 cmwylie19 self-assigned this Jul 24, 2024
@cmwylie19 cmwylie19 added this to the v0.34.0 milestone Jul 24, 2024
btlghrants pushed a commit to defenseunicorns/kubernetes-fluent-client that referenced this issue Jul 29, 2024
## Description

This adds `WatchEvent` types for consuming metrics from the informer.


Please look at [Watch
Config](https://github.com/defenseunicorns/kubernetes-fluent-client/blob/8f6bed408fc967fb4f68b60001cf8a8dc5f7bc5e/src/fluent/watch.ts#L49),
as some configuration options have been renamed.

## Related Issue

Fixes #defenseunicorns/pepr#983

<!-- or -->

Relates to defenseunicorns/pepr#977

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide
Steps](https://docs.pepr.dev/main/contribute/#submitting-a-pull-request)
followed

BREAKING CHANGE: This changes the names on the WatchConfig. Look at the
WatchConfig as some configuration options have been renamed.

---------

Signed-off-by: Case Wylie <[email protected]>