feat: introduce node health as a metric #2070

Closed
Tracked by #2154
danisharora099 opened this issue Jul 17, 2024 · 3 comments · Fixed by #2080

@danisharora099
Collaborator

This is a feature request

Problem

Based on the Reliability RFC,

Node health is a metric meant to determine the connectivity state of a light node and its present ability to reliably send and receive messages from the network. We consider this reliability to be dependent on amount of simultaneous connections to responsive service nodes. Unfortunately the more connections light node establishes - the more bandwidth is consumed. To address this we RECOMMEND following states:

  • unhealthy - no connections to service nodes are available regardless of protocol;
  • minimally healthy:
      • Filter has one service node connection;
      • LightPush protocol has one service node connection;
  • sufficiently healthy:
      • Filter has at least 2 connections available to service nodes;
      • LightPush has at least 2 connections available to service nodes;
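
To make the thresholds concrete, here is a minimal sketch that classifies a single protocol by its number of service node connections. The names `HealthStatus` and `protocolHealth` are hypothetical and are not taken from js-waku or the RFC; how the per-protocol results combine into one node-level state is left open here.

```typescript
// Hypothetical type: the three states recommended by the RFC quote above.
type HealthStatus =
  | "unhealthy"
  | "minimally healthy"
  | "sufficiently healthy";

// Classify one protocol (Filter or LightPush) by how many service nodes
// it is currently connected to. Thresholds follow the list above:
// 0 connections, exactly 1 connection, 2 or more connections.
function protocolHealth(connections: number): HealthStatus {
  if (connections <= 0) return "unhealthy";
  if (connections === 1) return "minimally healthy";
  return "sufficiently healthy";
}
```

For instance, `protocolHealth(1)` gives "minimally healthy" and `protocolHealth(3)` gives "sufficiently healthy".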

Proposed Solutions

Introduce this as an API

Notes

@fryorcraken fryorcraken added this to Waku Jul 17, 2024
@weboko weboko moved this to Triage in Waku Jul 17, 2024
@danisharora099 danisharora099 moved this from Triage to In Progress in Waku Jul 18, 2024
@weboko
Collaborator

weboko commented Jul 18, 2024

Some notes:

  • we should expose the health state from WakuNode directly;
  • we should expose an event-based API so that consumers stay up to date with changes;
  • documentation will be important to provide;
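
One possible shape for such an API, as a rough sketch only: the class `HealthMonitor` and its methods `getStatus`, `onChange` and `update` are hypothetical names chosen for illustration, not the js-waku API. It keeps the current state readable at any time and notifies listeners only on actual transitions.

```typescript
type HealthStatus =
  | "unhealthy"
  | "minimally healthy"
  | "sufficiently healthy";

type HealthListener = (status: HealthStatus) => void;

// Hypothetical holder for the node's health state.
class HealthMonitor {
  private status: HealthStatus = "unhealthy";
  private readonly listeners = new Set<HealthListener>();

  // Direct read access to the current state.
  getStatus(): HealthStatus {
    return this.status;
  }

  // Subscribe to changes; the returned function unsubscribes.
  onChange(listener: HealthListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Would be called internally whenever connections change.
  // Listeners are notified only when the state actually differs.
  update(next: HealthStatus): void {
    if (next === this.status) return;
    this.status = next;
    this.listeners.forEach((listener) => listener(next));
  }
}
```

A consumer would call `onChange` once and react to each transition, while `getStatus` covers the case where only the current value is needed.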

A follow-up to this feature would be to develop strategies for recovering from the health states mentioned above: #2076

@vpavlin
Member

vpavlin commented Jul 18, 2024

I feel like 2 nodes might not be enough to call the state sufficiently healthy - we are working with decentralized, peer-to-peer, potentially unreliable connections :)

@weboko
Collaborator

weboko commented Jul 22, 2024

thread: https://discord.com/channels/1110799176264056863/1263470028540346391/1263470032172613776

as we discussed in PM - this can be approached differently: given the example from the description, the node can be considered either minimally healthy or unhealthy.

it's a matter of how we combine the individual health of the protocols: with & or |

for js-waku in particular it should be an exceptional case when lightPush is unhealthy and filter is fine, the reasons being:

  • we maintain up-to-date peers in the background;
  • if there are drops or failures - we rotate peers;

given that, js-waku will eventually transition into a fully healthy or fully unhealthy state

for the implementation it seems only reasonable to expose the health state per protocol and then combine them
from my side, after giving it another thought and re-reading the RFC, I think we should use & so that if even one protocol is unhealthy, the whole node is.
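
To illustrate the difference between the two ways of combining, here is a small sketch. It assumes that & over three states means "take the weakest protocol" and | means "take the strongest"; that reading, and the names `RANK`, `combineAnd` and `combineOr`, are assumptions made for the example and not part of the RFC or js-waku.

```typescript
type HealthStatus =
  | "unhealthy"
  | "minimally healthy"
  | "sufficiently healthy";

// Order the states so they can be compared.
const RANK: Record<HealthStatus, number> = {
  "unhealthy": 0,
  "minimally healthy": 1,
  "sufficiently healthy": 2,
};

// "&" reading: the node is only as healthy as its weakest protocol.
function combineAnd(filter: HealthStatus, lightPush: HealthStatus): HealthStatus {
  return RANK[filter] <= RANK[lightPush] ? filter : lightPush;
}

// "|" reading: the node is as healthy as its strongest protocol.
function combineOr(filter: HealthStatus, lightPush: HealthStatus): HealthStatus {
  return RANK[filter] >= RANK[lightPush] ? filter : lightPush;
}
```

With filter sufficiently healthy and lightPush unhealthy, `combineAnd` reports the node as unhealthy while `combineOr` reports it as sufficiently healthy, which is the case the & choice is meant to rule out.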

@danisharora099 danisharora099 moved this from In Progress to Code Review / QA in Waku Jul 24, 2024
@danisharora099 danisharora099 moved this from Code Review / QA to In Progress in Waku Jul 24, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Waku Jul 27, 2024