Gossip Latency Monitoring #174

Open
andydunstall opened this issue Nov 12, 2024 · 0 comments
andydunstall commented Nov 12, 2024

Add support for gossip entry timestamps, which are set when an entry is created and propagated with the entry to the rest of the cluster. Nodes can then use these timestamps to calculate how long entries take to propagate around the cluster.

Those times can then be exposed as metrics and used to tune the gossip configuration or to detect when the cluster is overloaded.

Versioning

Adding a new field will require a new gossip protocol version, so nodes must support both the existing version (0) and the new version (1).
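
One way the two versions could coexist, as a sketch only: the wire layout below (a version byte, an 8-byte payload, then an optional 8-byte timestamp) is invented for illustration and is not piko's real gossip format. `NegotiateVersion` assumes peers simply fall back to the lower of the two versions they support, which the issue does not specify.

```go
package main

import (
	"encoding/binary"
	"errors"
)

const (
	VersionLegacy      uint8 = 0 // existing format, no timestamp
	VersionTimestamped uint8 = 1 // adds the entry creation timestamp
)

// ErrBadMessage is returned for unknown versions or truncated input.
var ErrBadMessage = errors.New("gossip: unsupported version or truncated entry")

// WireEntry is a hypothetical entry; Payload stands in for the real fields.
type WireEntry struct {
	Payload            uint64
	CreatedAtUnixMilli int64 // zero when the sender used version 0
}

// NegotiateVersion picks the highest version both peers understand
// (assumption: versions are ordered and backwards compatible).
func NegotiateVersion(local, remote uint8) uint8 {
	if remote < local {
		return remote
	}
	return local
}

// EncodeEntry serialises e using the given protocol version.
func EncodeEntry(version uint8, e WireEntry) ([]byte, error) {
	switch version {
	case VersionLegacy:
		buf := make([]byte, 9)
		buf[0] = version
		binary.BigEndian.PutUint64(buf[1:9], e.Payload)
		return buf, nil
	case VersionTimestamped:
		buf := make([]byte, 17)
		buf[0] = version
		binary.BigEndian.PutUint64(buf[1:9], e.Payload)
		binary.BigEndian.PutUint64(buf[9:17], uint64(e.CreatedAtUnixMilli))
		return buf, nil
	}
	return nil, ErrBadMessage
}

// DecodeEntry accepts both versions; version 0 entries decode with a
// zero timestamp, so no latency can be computed for them.
func DecodeEntry(buf []byte) (WireEntry, error) {
	if len(buf) < 9 {
		return WireEntry{}, ErrBadMessage
	}
	e := WireEntry{Payload: binary.BigEndian.Uint64(buf[1:9])}
	switch buf[0] {
	case VersionLegacy:
		return e, nil
	case VersionTimestamped:
		if len(buf) < 17 {
			return WireEntry{}, ErrBadMessage
		}
		e.CreatedAtUnixMilli = int64(binary.BigEndian.Uint64(buf[9:17]))
		return e, nil
	}
	return WireEntry{}, ErrBadMessage
}
```

During a rolling upgrade, a version 1 node talking to a version 0 node would encode with `NegotiateVersion(VersionTimestamped, VersionLegacy)`, so timestamps only appear once both sides support them.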

Evaluation

These metrics can also be used to evaluate the scaling limits of gossip. For example, piko test workload upstreams could be extended to support --churn flags indicating how often each upstream should reconnect.

That can then be used to understand how much churn a cluster with default gossip settings can support before latency exceeds some threshold (say 10 seconds).

(Each gossip node could also be proxied to inject latency and dropped messages.)

@andydunstall andydunstall self-assigned this Nov 12, 2024