
Improve node resolving for nodes offline at startup #357

Merged
marcelveldt merged 3 commits from improve-resolve into main on Jul 11, 2023

Conversation

marcelveldt (Contributor) commented Jul 11, 2023

  • Prevent concurrent resolve actions with a lock
  • Simplify retry logic
  • Retry more often

When one or more nodes are offline/unavailable at server startup (e.g. powered off, out of range, or out of battery), no automatic subscription is set up for those devices. Because the SDK does not expose a way to register a callback for when a known node is discovered on mDNS, we poll regularly to check whether the device is back alive. This is now fixed and the poll interval is slightly increased: it starts at a 30 second interval and grows by 10 seconds per attempt up to a maximum of 10 minutes, meaning that if one of those devices comes back online, it will take at most 10 minutes for the server to pick the device back up again.
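As a rough illustration of that backoff behaviour (not the actual server code), the poll loop could look roughly like the sketch below; `resolve_node` is a hypothetical stand-in for the server's resolve logic:

```python
import asyncio

NODE_RESOLVE_INTERVAL_START = 30   # seconds
NODE_RESOLVE_INTERVAL_STEP = 10    # seconds added after each failed attempt
NODE_RESOLVE_INTERVAL_MAX = 600    # cap at 10 minutes


async def poll_offline_node(node_id: int, resolve_node) -> None:
    """Keep trying to resolve a node that was offline at startup."""
    interval = NODE_RESOLVE_INTERVAL_START
    while True:
        try:
            # `resolve_node` stands in for the server's resolve coroutine
            await resolve_node(node_id)
            return  # node is back online, subscription can be set up
        except Exception:  # in practice a more specific resolve error
            # back off: +10 seconds per attempt, capped at 10 minutes
            interval = min(
                interval + NODE_RESOLVE_INTERVAL_STEP, NODE_RESOLVE_INTERVAL_MAX
            )
            await asyncio.sleep(interval)
```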

Once a device subscription has been set up, the SDK performs a resubscription within a few seconds.
It will, however, take 3 minutes before we detect that a device is offline, which seems like a fair trade-off between traffic and comfort.

If we want to improve this even further, we may consider implementing zeroconf discovery within the server to actively listen for mDNS broadcasts from the nodes we're watching, but as this is a bit of an edge case (a device being offline just while the server restarts), it may not be the highest priority.
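For reference, a minimal sketch of that zeroconf idea, assuming the python-zeroconf package and the `_matter._tcp.local.` operational service type; the `on_node_seen` callback and how it would tie into the resolve logic are hypothetical:

```python
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

# Matter operational nodes advertise under this mDNS service type
MATTER_OPERATIONAL_TYPE = "_matter._tcp.local."


class WatchedNodeListener(ServiceListener):
    """Trigger a resolve as soon as a watched node announces itself."""

    def __init__(self, on_node_seen) -> None:
        # `on_node_seen` is a hypothetical callback that would kick off an
        # immediate resolve attempt instead of waiting for the next poll.
        self._on_node_seen = on_node_seen

    def add_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        self._on_node_seen(name)

    def update_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        self._on_node_seen(name)

    def remove_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        pass  # node went offline; nothing to do here


if __name__ == "__main__":
    zc = Zeroconf()
    browser = ServiceBrowser(
        zc, MATTER_OPERATIONAL_TYPE, WatchedNodeListener(print)
    )
    try:
        input("Listening for Matter node announcements, press enter to exit\n")
    finally:
        zc.close()
```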

marcelveldt added the bugfix label (pull request that fixes a known issue/bug) and removed the bug label on Jul 11, 2023
-            await self._resolve_node(
-                node_id=node_id, retries=retries - 1, allow_pase=retries - 1 == 0
-            )
+            await self._resolve_node(node_id=node_id, retries=retries - 1)
Collaborator

I think I'd prefer a while loop instead of recursion but since we had it already I am fine with it.
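For illustration, an iterative version of that retry could look something like the sketch below; `ResolveError` and `resolve_once` are hypothetical placeholders for the server's actual exception type and resolve call:

```python
class ResolveError(Exception):
    """Placeholder for whatever exception a failed resolve raises."""


async def resolve_with_retries(resolve_once, node_id: int, retries: int = 2) -> None:
    """While-loop equivalent of the recursive retry in the diff above."""
    attempt = 0
    while True:
        try:
            await resolve_once(node_id)
            return
        except ResolveError:
            attempt += 1
            if attempt > retries:
                # out of retries: re-raise the last resolve failure
                raise
```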

marcelveldt merged commit 5dcb8be into main on Jul 11, 2023 (4 checks passed)
marcelveldt deleted the improve-resolve branch on July 11, 2023 at 14:37
LOGGER.debug(
    "Attempting to resolve node %s (with PASE connection)", node_id
)
async with node_lock, self._resolve_lock:
    LOGGER.info("Attempting to resolve node %s...", node_id)
Contributor

I'd log at debug level.
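A small sketch of how the lock usage from the snippet above could look with the log line at debug level, as suggested; the lock names and the resolve body are placeholders rather than the server's actual code:

```python
import asyncio
import logging

LOGGER = logging.getLogger(__name__)

_resolve_lock = asyncio.Lock()              # only one resolve action at a time
_node_locks: dict[int, asyncio.Lock] = {}   # one lock per node


async def resolve_node(node_id: int) -> None:
    node_lock = _node_locks.setdefault(node_id, asyncio.Lock())
    # holding both locks prevents concurrent resolve actions for the same
    # node as well as overlapping resolves across nodes
    async with node_lock, _resolve_lock:
        LOGGER.debug("Attempting to resolve node %s...", node_id)
        ...  # the actual resolve logic lives here in the server
```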

Labels: bugfix (pull request that fixes a known issue/bug)