[WIP] Add connection resume, client state tracking, robust reconnect logic, reconnecting hooks, more #1204

JamesHutchison · 2024-03-05T02:32:48Z

_{By submitting this pull request you agree that all contributions to this project are made under the MIT license.}

Issues

While in a discussion on Discord, it became apparent that ReactPy was unusable for any application where dynamic scaling is required, for example where Azure/GCP starts and stops workers for dynamic load balancing. As a result, users are disconnected at random and the state of their ReactPy components is reset.

There are also other issues I encountered while building this. Please see my comments (will be added soon), which walk through this very large PR. The intent is not to have this PR merged all at once; rather, it is a guide to subsequent smaller PRs that can properly add tests, etc. I do not have the time to properly land this PR, so I am asking for help.

Solution

The solution to the problem was to have the client store the state that's used for rendering components. The theory is that if the client state matches the server state, then things should work just fine.

In reality, there were some challenges in getting there. The first was that the IDs of components and events were random instead of deterministic, so this needed to be changed. Now the IDs are based on their patch path.

The next concern is security. While the state isn't obvious to most users, without proper security a client could manipulate variables that an author would have assumed could only be set by the server. An example might be is_admin, or the user ID of the user viewing the page. Another issue is that server secrets might leak to the client if someone isn't careful.

For security, all data is given a SHA-256 hash as a signature. The server will reject a state variable if it doesn't match the hash. Users are given a personal salt, which they must provide to the server upon reconnection, and the server has a secret "pepper" that is added to make things more difficult. I wasn't satisfied with this, so I added OTP codes that change, by default, every 4 hours. The OTP codes use the contents of the site-packages directory (specifically, the parent directory of where reactpy is installed) and their creation times to generate a secret that feeds into the SHA-1 function that generates them. The OTP codes (right now, in the past, and in the future) are added to the hash just to make it that much harder.
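A minimal sketch of the signing scheme described above, not the code in this PR. The names `PEPPER`, `otp_for_window`, `sign_state` and `verify_state` are hypothetical, the JSON serialization is an assumption, and "past, present, and future" OTP codes are interpreted here as accepting the previous, current, and next time window at verification time.

```python
import hashlib
import hmac
import json

PEPPER = b"server-side-secret"       # hypothetical server-only value
OTP_WINDOW_SECONDS = 4 * 60 * 60     # codes rotate every 4 hours by default


def otp_for_window(secret: bytes, window: int) -> bytes:
    # HMAC-SHA-1 over the window counter (similar in spirit to HOTP/TOTP).
    counter = window.to_bytes(8, "big", signed=True)
    return hmac.new(secret, counter, hashlib.sha1).digest()


def sign_state(key: str, value, salt: bytes, otp: bytes) -> str:
    # Serialize deterministically so the same value always hashes the same.
    payload = json.dumps(value, sort_keys=True, separators=(",", ":")).encode()
    digest = hashlib.sha256()
    for part in (salt, PEPPER, otp, key.encode(), payload):
        # Length-prefix each part so field boundaries cannot be shifted.
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.hexdigest()


def verify_state(key, value, signature, salt, otp_secret, now) -> bool:
    window = int(now // OTP_WINDOW_SECONDS)
    # Accept the previous, current, and next window to tolerate rotation.
    for candidate in (window - 1, window, window + 1):
        expected = sign_state(key, value, salt, otp_for_window(otp_secret, candidate))
        if hmac.compare_digest(expected, signature):
            return True
    return False
```

A state variable whose value, salt, or time window does not match is rejected. A keyed HMAC over the payload would be the more conventional construction; the plain hash is used here only because that is what the description above says.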

All state values have a key that is derived from the file and line number. To avoid leaking this information via brute force, the key is a SHA-256 hash with about half the bits cut off, making it difficult to recreate.
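As an illustration of the truncated key, here is a small sketch. The function name `state_key` and the choice of keeping exactly 32 hex characters (128 of 256 bits) are assumptions; the PR only says "about half the bits".

```python
import hashlib


def state_key(filename: str, lineno: int, patch_path: str, keep_hex_chars: int = 32) -> str:
    # Identify the hook call by where it was made and where it sits in the layout.
    caller_info = f"{filename} {lineno} {patch_path}"
    digest = hashlib.sha256(caller_info.encode("utf-8")).hexdigest()
    # Keep only the leading part of the digest (hypothetical cut-off).
    return digest[:keep_hex_chars]
```

The same call site always yields the same key, which is what lets the server match client-provided state back to the right hook after a reconnect.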

The client behavior has been revamped. It now always reconnects, has improved reconnect behavior, and also disconnects when idle (the default is 4 minutes). When the user moves the mouse or scrolls, it reconnects. There are also hooks for when it is reconnecting and when it has successfully reconnected. The default behavior is to gray out the page and display a spinning pipe character. It's not great.

The performance impact appears to be negligible.

Checklist

The checklist will be for each individual feature that should be a separate PR. TBD

Work items remaining (this PR):

Add serialization for datetime types such as timezone and timedelta. This logic is currently in Heavy Resume but makes sense to just give to everyone.
Fix the server outrunning the client by moving the message handlers out of the tsx file and into the javascript file
Comment on the PR on everything
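For the first work item, one possible shape for the serialization is sketched below. This is not the Heavy Resume implementation; the `__type__` tag and the helper names `encode_extra` and `decode_extra` are made up for illustration, and JSON is assumed as the wire format.

```python
import json
from datetime import timedelta, timezone


def encode_extra(obj):
    # Called by json.dumps for objects it cannot serialize natively.
    if isinstance(obj, timedelta):
        return {
            "__type__": "timedelta",
            "days": obj.days,
            "seconds": obj.seconds,
            "microseconds": obj.microseconds,
        }
    if isinstance(obj, timezone):
        return {
            "__type__": "timezone",
            "offset_seconds": obj.utcoffset(None).total_seconds(),
            "name": obj.tzname(None),
        }
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def decode_extra(data: dict):
    # Called by json.loads for every decoded JSON object.
    kind = data.get("__type__")
    if kind == "timedelta":
        return timedelta(
            days=data["days"],
            seconds=data["seconds"],
            microseconds=data["microseconds"],
        )
    if kind == "timezone":
        return timezone(timedelta(seconds=data["offset_seconds"]), data["name"])
    return data


text = json.dumps({"ttl": timedelta(minutes=90), "tz": timezone.utc}, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
```

Storing the timedelta as integer days, seconds, and microseconds avoids the rounding that a single float of total seconds could introduce for large values.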

Extracted Issue List

* Fix component decorator eating static type hints

Add reconnection and client state side state capabilities

* Delete commented out code
* add serialization for timezone and timedelta
* don't show reconnecting layer on first attempt
* Only connect after layout update handlers are set up
* perf tweak that apparently was never saved
* deserialization as well for timezone
* timezone and timedelta
* alter z-index

Archmonger · 2024-03-05T06:04:12Z

This is definitely way too much within one PR. Can you break this into smaller PRs?

JamesHutchison · 2024-03-05T06:09:52Z

Assuming this headache of mine doesn't turn into something worse, I'll add comments tomorrow. I don't have the capacity to properly break this out, create / update tests, etc for the changes and I'm asking the community for help to land these features. This PR is intended to be a starting point and is not intended to be merged.

Archmonger · 2024-03-05T06:33:48Z

Also, just to reiterate what I said over Discord, I've only experienced WS disconnections under high load when ReactPy is not using a BACKHAUL_THREAD.

A potential alternative to client-side state storage is simply migrating that implementation to core.

JamesHutchison · 2024-03-05T16:07:51Z

It's not clear to me how backhaul threads are equivalent to this. Do you have a doc that explains the architecture? My impression from your description was that it helped with stability in django but it wouldn't help if your node count was scaled down and you still had active connections on the terminated node (same with a client's internet disconnecting / reconnecting).

I think having the client manage their own state is a perfect solution. It very much simplifies the infrastructure and reduces costs. There's a slight delay due to the copy of data but since you're reconnecting there's going to be a delay anyways.

JamesHutchison · 2024-03-05T16:09:26Z

src/js/packages/@reactpy/client/src/components.tsx

@@ -29,7 +29,7 @@ export function Layout(props: { client: ReactPyClient }): JSX.Element {

  useEffect(
    () =>
-      props.client.onMessage("layout-update", ({ path, model }) => {
+      props.client.onLayoutUpdate((path: string, model: any) => {


Issue A: bug fix for when server would return "layout-update" before the client was ready.

Does this issue only exist due to the client state management stuff in this PR?

It's been a long while. My intuition is "no" but one would need to check the code. IIRC there's a delay between connecting and setting up the handlers, and no way to set up the handlers ahead of time, but this is off the top of my head and I don't have the code in front of me.

JamesHutchison · 2024-03-05T16:12:07Z

src/js/packages/@reactpy/client/src/components.tsx

+    const scriptElement: HTMLScriptElement = document.createElement("script");
+    for (const [k, v] of Object.entries(model.attributes || {})) {
+      scriptElement.setAttribute(k, v);
+    }
+    if (scriptContent) {
+      scriptElement.appendChild(document.createTextNode(scriptContent));
    }
-  }, [model.key, ref.current]);
+    ref.current.appendChild(scriptElement);
+  }, [model.key]);


Issue B: bug fix for inconsistent script rendering that resulted in double execution of scripts. Scripts are also now always added to the DOM for user inspection rather than executed directly.

The current behavior was to immediately execute a script with no properties. If it had properties, it went to the DOM instead. These different code paths likely hid the bug where scripts were double executing.

The fix for the double execution was to remove ref.current from the dependency list.

I'm testing out this implementation in #1239 but it seems that it breaks script re-execution when the script key (or content) changes.

That sounds like the issue I've seen with inputs. You'll change the key and value on the input but the existing value sticks around. It's an issue if you have a list of objects with inputs, update the list, then redraw. The browser shows the old value and doesn't update it.

The workaround IIRC is to wrap it and change the key on what it's wrapped in.

I was under the impression my code change to support reconnection was the cause, but maybe it's not? If that's the case, it sounds like it's a bug that affects everything, not just scripts.

Also, just to call this out, I'm not sure having script tags execute multiple times because they changed is necessarily a good practice. I can think of a couple of scenarios where implementation changes, such as how renders are sent to the client, might create behavioral changes due to "in-between" states getting skipped. Seems like it should always be one script, one execution.

I could agree with that - Given that sequential renders that affect the same components are "optimized out" by layout.py, there is definitely a reality where things could get skipped.

I am also aware that layout.py has some bugs, so it's on my plate to rewrite that.

See issue:

use_effect's unmount method is not always called with dynamically rendered children #1198

JamesHutchison · 2024-03-05T16:13:02Z

src/js/packages/@reactpy/client/src/messages.ts

+export type ReconnectingCheckMessage = {
+  type: "reconnecting-check";
+  value: string;
+}
+
+export type IncomingMessage = LayoutUpdateMessage | ReconnectingCheckMessage;
+export type OutgoingMessage = LayoutEventMessage | ReconnectingCheckMessage;


Issue C: Message updates. Looking at it now, I don't think I finished this for the most recent messages, such as "state-update".

JamesHutchison · 2024-03-05T16:13:29Z

src/js/packages/@reactpy/client/src/reactpy-client.ts

+  onLayoutUpdate(handler: (path: string, model: any) => void): void;
+


JamesHutchison · 2024-03-05T16:14:24Z

src/js/packages/@reactpy/client/src/reactpy-client.ts

+
+  /**
+   * Update state vars from the server for reconnections
+   * @param givenStateVars State vars to store
+   */
+  updateStateVars(givenStateVars: object): void;


Issue D: Have client keep track of state variables

JamesHutchison · 2024-03-05T19:02:30Z

src/py/reactpy/reactpy/core/hooks.py

+def get_caller_info():
+    # Get the current stack frame and then the frame above it
+    caller_frame = sys._getframe(2)
+    for i in range(50):
+        render_frame = sys._getframe(4 + i)
+        patch_path = render_frame.f_locals.get("patch_path_for_state")
+        if patch_path is not None:
+            break
+    # Extract the relevant information: file path and line number and hash it
+    return f"{caller_frame.f_code.co_filename} {caller_frame.f_lineno} {patch_path}"


Ugly hack - to preserve the interface, it walks up the stack frames to find patch_path_for_state in the locals. I would suggest refactoring it.

This returns the unique identifier of each state / ref / hook call based on what called it. This is eventually hashed and then the hash is truncated to create the key. The purpose of truncating the hash is to make it more challenging to reproduce this information.

JamesHutchison · 2024-03-05T19:04:39Z

src/py/reactpy/reactpy/core/hooks.py

+    if __debug__:
+        __DEBUG_CALLER_INFO_TO_STATE_KEY[result] = caller_info


When keys are missing, or the value doesn't serialize / deserialize, this variable can be used to figure out what it's talking about.

JamesHutchison · 2024-03-05T19:05:19Z

src/py/reactpy/reactpy/core/hooks.py

 class _CurrentState(Generic[_Type]):
-    __slots__ = "value", "dispatch"
+    __slots__ = "key", "value", "dispatch"


Issue K - slots are probably fine here

JamesHutchison · 2024-03-05T19:06:55Z

src/py/reactpy/reactpy/core/hooks.py

+    hook = get_current_hook()
+    if hook.reconnecting.current:
+        if not isinstance(dependencies, ReconnectingOnly):
+            return
+        dependencies = None
+    else:
+        if isinstance(dependencies, ReconnectingOnly):
+            return
+        dependencies = _try_to_infer_closure_values(function, dependencies)


This is the logic for whether something is executed on reconnection. The use_memo and use_ref lines must be located where they will always get called.
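The branching in the hunk above boils down to a small truth table. The sketch below restates it in isolation; the `ReconnectingOnly` class here is a stand-in defined only for this example (its real definition in the PR may differ), and `effect_should_run` is a hypothetical helper name.

```python
class ReconnectingOnly(list):
    """Stand-in marker type: dependencies meant to run only while reconnecting."""


def effect_should_run(reconnecting: bool, dependencies) -> bool:
    is_reconnect_only = isinstance(dependencies, ReconnectingOnly)
    if reconnecting:
        # During a reconnect, only effects explicitly marked for it run.
        return is_reconnect_only
    # During a normal render, reconnect-only effects are skipped.
    return not is_reconnect_only
```

In other words, an effect runs in exactly one of the two modes, depending on whether its dependencies are wrapped in the marker type.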

JamesHutchison · 2024-03-05T19:08:52Z

src/py/reactpy/reactpy/core/hooks.py

-        logger.debug(f"{current_hook().component} {new}")
+        logger.debug(f"{get_current_hook().component} {new}")


Issue J - current_hook was renamed to get_current_hook to follow naming convention

JamesHutchison · 2024-03-05T19:10:13Z