
Improve node resolving for nodes offline at startup #357

Merged
marcelveldt merged 3 commits from improve-resolve into main on Jul 11, 2023

Conversation

marcelveldt (Contributor) commented Jul 11, 2023

  • Prevent concurrent resolve actions with a lock
  • Simplify retry logic
  • Retry more often

When one or more nodes are offline/unavailable at server startup (e.g. powered off, out of range, or out of battery), no automatic subscription is set up for those devices. Because the SDK does not expose a way to register a callback for when a known node is discovered on mDNS, we poll regularly to check whether the device is back alive. This is now fixed and the poll interval is slightly increased: it starts at a 30 second interval and grows by 10 seconds per attempt up to a maximum of 10 minutes, meaning that if one of those devices comes back online, it will take at most 10 minutes for the server to pick the device back up again.
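As a rough illustration of that backoff behaviour (not the actual server code), the poll loop could look roughly like the sketch below; `resolve_node` is a hypothetical stand-in for the server's resolve logic:

```python
import asyncio

NODE_RESOLVE_INTERVAL_START = 30   # seconds
NODE_RESOLVE_INTERVAL_STEP = 10    # seconds added after each failed attempt
NODE_RESOLVE_INTERVAL_MAX = 600    # cap at 10 minutes


async def poll_offline_node(node_id: int, resolve_node) -> None:
    """Keep trying to resolve a node that was offline at startup."""
    interval = NODE_RESOLVE_INTERVAL_START
    while True:
        try:
            # `resolve_node` stands in for the server's resolve coroutine
            await resolve_node(node_id)
            return  # node is back online, subscription can be set up
        except Exception:  # in practice a more specific resolve error
            # back off: +10 seconds per attempt, capped at 10 minutes
            interval = min(
                interval + NODE_RESOLVE_INTERVAL_STEP, NODE_RESOLVE_INTERVAL_MAX
            )
            await asyncio.sleep(interval)
```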

Once a device subscription has been set up, the SDK performs a resubscription within a few seconds.
It will, however, take 3 minutes before we detect that a device is offline, which seems like a fair trade-off between traffic and comfort.

If we want to improve this even further, we may consider implementing zeroconf discovery within the server to actively listen for mDNS broadcasts from the nodes we're watching, but as this is a bit of an edge case (a device being offline just while the server restarts), it may not be the highest priority.
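For reference, a minimal sketch of that zeroconf idea, assuming the python-zeroconf package and the `_matter._tcp.local.` operational service type; the `on_node_seen` callback and how it would tie into the resolve logic are hypothetical:

```python
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

# Matter operational nodes advertise under this mDNS service type
MATTER_OPERATIONAL_TYPE = "_matter._tcp.local."


class WatchedNodeListener(ServiceListener):
    """Trigger a resolve as soon as a watched node announces itself."""

    def __init__(self, on_node_seen) -> None:
        # `on_node_seen` is a hypothetical callback that would kick off an
        # immediate resolve attempt instead of waiting for the next poll.
        self._on_node_seen = on_node_seen

    def add_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        self._on_node_seen(name)

    def update_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        self._on_node_seen(name)

    def remove_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        pass  # node went offline; nothing to do here


if __name__ == "__main__":
    zc = Zeroconf()
    browser = ServiceBrowser(
        zc, MATTER_OPERATIONAL_TYPE, WatchedNodeListener(print)
    )
    try:
        input("Listening for Matter node announcements, press enter to exit\n")
    finally:
        zc.close()
```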

marcelveldt added the bugfix label (pull request that fixes a known issue/bug) and removed the bug label on Jul 11, 2023
-            await self._resolve_node(
-                node_id=node_id, retries=retries - 1, allow_pase=retries - 1 == 0
-            )
+            await self._resolve_node(node_id=node_id, retries=retries - 1)
Collaborator

I think I'd prefer a while loop instead of recursion but since we had it already I am fine with it.
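For illustration, an iterative version of that retry could look something like the sketch below; `ResolveError` and `resolve_once` are hypothetical placeholders for the server's actual exception type and resolve call:

```python
class ResolveError(Exception):
    """Placeholder for whatever exception a failed resolve raises."""


async def resolve_with_retries(resolve_once, node_id: int, retries: int = 2) -> None:
    """While-loop equivalent of the recursive retry in the diff above."""
    attempt = 0
    while True:
        try:
            await resolve_once(node_id)
            return
        except ResolveError:
            attempt += 1
            if attempt > retries:
                # out of retries: re-raise the last resolve failure
                raise
```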

marcelveldt merged commit 5dcb8be into main on Jul 11, 2023 (4 checks passed)
marcelveldt deleted the improve-resolve branch on July 11, 2023 at 14:37
LOGGER.debug(
    "Attempting to resolve node %s (with PASE connection)", node_id
)
async with node_lock, self._resolve_lock:
    LOGGER.info("Attempting to resolve node %s...", node_id)
Contributor

I'd log at debug level.
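A small sketch of how the lock usage from the snippet above could look with the log line at debug level, as suggested; the lock names and the resolve body are placeholders rather than the server's actual code:

```python
import asyncio
import logging

LOGGER = logging.getLogger(__name__)

_resolve_lock = asyncio.Lock()              # only one resolve action at a time
_node_locks: dict[int, asyncio.Lock] = {}   # one lock per node


async def resolve_node(node_id: int) -> None:
    node_lock = _node_locks.setdefault(node_id, asyncio.Lock())
    # holding both locks prevents concurrent resolve actions for the same
    # node as well as overlapping resolves across nodes
    async with node_lock, _resolve_lock:
        LOGGER.debug("Attempting to resolve node %s...", node_id)
        ...  # the actual resolve logic lives here in the server
```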

Labels: bugfix (pull request that fixes a known issue/bug)