Add backup/restore Prometheus metrics #3210

Open
zacharya19 opened this issue Jun 24, 2024 · 3 comments

@zacharya19
Contributor

Is your feature request related to a problem? Please describe.
I'm always frustrated when DragonflyDB fails to perform a snapshot/restore and I have no visibility into it.
For example: when the service's IAM permissions are broken and DragonflyDB gets access denied.

Describe the solution you'd like
Expose Prometheus metrics such as:

snapshot_failure_count
restore_failure_count

Describe alternatives you've considered
The logs have an error message, but using them for alerts is annoying.


@zacharya19
Contributor Author

I will add: it could also be a general counter of snapshot attempts (both failures and successes), in addition to a gauge with the failure count.
That might be more useful.
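
To make the idea concrete, here is a rough sketch of a counter that tracks both outcomes and renders them in the Prometheus text exposition format. It is only an illustration in standard-library Python, not Dragonfly's actual code: the `OutcomeCounter` class, the `record` helper, the metric names `snapshot_total` and `restore_total`, and the `result` label are all hypothetical. It also uses one counter labeled by outcome instead of a separate failure gauge, which is an assumption about what would be convenient for alerting.

```python
from collections import defaultdict


class OutcomeCounter:
    """Monotonic counter keyed by outcome (hypothetical helper, not a Dragonfly API)."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values = defaultdict(int)

    def inc(self, result: str) -> None:
        """Add one attempt with the given outcome, e.g. "success" or "failure"."""
        self._values[result] += 1

    def value(self, result: str) -> int:
        """Return the current count for an outcome (0 if never seen)."""
        return self._values.get(result, 0)

    def render(self) -> str:
        """Render the counter in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for result in sorted(self._values):
            lines.append(f'{self.name}{{result="{result}"}} {self._values[result]}')
        return "\n".join(lines) + "\n"


# Hypothetical metric names; the real ones would be up to the maintainers.
snapshots = OutcomeCounter("snapshot_total", "Snapshot attempts by result.")
restores = OutcomeCounter("restore_total", "Restore attempts by result.")


def record(counter: OutcomeCounter, operation) -> bool:
    """Run `operation` and count it as a success or a failure."""
    try:
        operation()
    except Exception:
        counter.inc("failure")
        return False
    counter.inc("success")
    return True
```

With something like this, a failed S3 upload (for example a `PermissionError` raised inside `operation`) would bump the `result="failure"` series, and an alert could fire on any increase of that series without parsing logs.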

@romange
Collaborator

romange commented Jun 24, 2024

Does Dragonfly proceed if it fails to load the snapshot at startup?

@zacharya19
Contributor Author

It does stay alive; I haven't checked whether the connection is healthy.
It can also happen after a successful boot (e.g. if the S3 permission is removed after boot).
