-
-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add connection resume, client state tracking, robust reconnect logic, reconnecting hooks, more #1204
base: main
Are you sure you want to change the base?
Conversation
* Fix component decorator eating static type hints
Add reconnection and client state side state capabilities
* Delete commented out code * add serialization for timezone and timedelta * don't show reconnecting layer on first attempt * Only connect after layout update handlers are set up * perf tweak that apparently was never saved * deserialization as well for timezone * timezone and timedelta * alter z-index
This is definitely way too much within one PR. Can you break this into smaller PRs? |
Assuming this headache of mine doesn't turn into something worse, I'll add comments tomorrow. I don't have the capacity to properly break this out, create / update tests, etc for the changes and I'm asking the community for help to land these features. This PR is intended to be a starting point and is not intended to be merged. |
Also just to re-iterate what I said over discord, I've only experienced WS disconnections under high load when ReactPy is not using a A potential alternative to client-side state storage is simply migrating that implementation to core. |
It's not clear to me how backhaul threads are equivalent to this. Do you have a doc that explains the architecture? My impression from your description was that it helped with stability in django but it wouldn't help if your node count was scaled down and you still had active connections on the terminated node (same with a client's internet disconnecting / reconnecting). I think having the client manage their own state is a perfect solution. It very much simplifies the infrastructure and reduces costs. There's a slight delay due to the copy of data but since you're reconnecting there's going to be a delay anyways. |
@@ -29,7 +29,7 @@ export function Layout(props: { client: ReactPyClient }): JSX.Element { | |||
|
|||
useEffect( | |||
() => | |||
props.client.onMessage("layout-update", ({ path, model }) => { | |||
props.client.onLayoutUpdate((path: string, model: any) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue A: bug fix for when server would return "layout-update" before the client was ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this issue only exist due to the client state management stuff in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a long while. My intuition is "no" but one would need to check the code. IIRC there's a delay from connecting to setting up the handlers and no way to set up the handlers ahead of time, but this is top of my head and I don't have the code in front of me.
const scriptElement: HTMLScriptElement = document.createElement("script"); | ||
for (const [k, v] of Object.entries(model.attributes || {})) { | ||
scriptElement.setAttribute(k, v); | ||
} | ||
if (scriptContent) { | ||
scriptElement.appendChild(document.createTextNode(scriptContent)); | ||
} | ||
}, [model.key, ref.current]); | ||
ref.current.appendChild(scriptElement); | ||
}, [model.key]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue B: bug fix for inconsistent script rendering that resulted in double execution of scripts. Scripts also always are returned to the DOM for user inspection rather than executed directly.
The current behavior was to immediately execute a script with no properties. If it had properties, it went to the DOM instead. These different code paths likely hid the bug where scripts were double executing.
The fix for the double execution was to remove ref.current
from the dependency list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm testing out this implementation in #1239 but it seems that it breaks script re-execution when the script key
(or content
) changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like the issue I've seen with inputs. You'll change the key and value on the input but existing value sticks around. It's an issue if you have a list of objects with inputs, update the list, then redraw. The browser shows the old value and doesn't update it.
The workaround IIRC is to wrap it and change the key on what its wrapped in.
I was under the impression my code change to support reconnection was the cause but maybe its not? If that's the case, it sounds like its a bug that affects everything, not just scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, just to call this out, I'm not sure having script tags execute multiple times because they changed is necessarily a good practice. I can think of a couple scenarios where implementation changes such as how renders are sent to the client might create behavioral changes due to "in-between" states getting skipped. Seems like it should always be one script one execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could agree with that - Given that sequential renders that affect the same components are "optimized out" by layout.py
, there is definitely a reality where things could get skipped.
I am also aware that layout.py
has some bugs, so it's on my plate to rewrite that.
See issue:
export type ReconnectingCheckMessage = { | ||
type: "reconnecting-check"; | ||
value: string; | ||
} | ||
|
||
export type IncomingMessage = LayoutUpdateMessage | ReconnectingCheckMessage; | ||
export type OutgoingMessage = LayoutEventMessage | ReconnectingCheckMessage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue C: Message updates. I think looking at it out now I didn't finish this with the most recent messages, such as "state-update"
onLayoutUpdate(handler: (path: string, model: any) => void): void; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue A
|
||
/** | ||
* Update state vars from the server for reconnections | ||
* @param givenStateVars State vars to store | ||
*/ | ||
updateStateVars(givenStateVars: object): void; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue D: Have client keep track of state variables
def get_caller_info(): | ||
# Get the current stack frame and then the frame above it | ||
caller_frame = sys._getframe(2) | ||
for i in range(50): | ||
render_frame = sys._getframe(4 + i) | ||
patch_path = render_frame.f_locals.get("patch_path_for_state") | ||
if patch_path is not None: | ||
break | ||
# Extract the relevant information: file path and line number and hash it | ||
return f"{caller_frame.f_code.co_filename} {caller_frame.f_lineno} {patch_path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugly hack - to preserve the interface, it walks up the stack frames to find patch_path_for_state
in the locals. I would suggest refactoring it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns the unique identifier of each state / ref/ hook call based on what called it. This is eventually hashed and then the hash is truncated to create the key. The purpose of truncate the hash is to make it more challenging to reproduce this information.
if __debug__: | ||
__DEBUG_CALLER_INFO_TO_STATE_KEY[result] = caller_info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When keys are missing, or the value doesn't serialize / deserialize this variable could be used to figure out what its talking about
class _CurrentState(Generic[_Type]): | ||
__slots__ = "value", "dispatch" | ||
__slots__ = "key", "value", "dispatch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue K - slots are probably fine here
hook = get_current_hook() | ||
if hook.reconnecting.current: | ||
if not isinstance(dependencies, ReconnectingOnly): | ||
return | ||
dependencies = None | ||
else: | ||
if isinstance(dependencies, ReconnectingOnly): | ||
return | ||
dependencies = _try_to_infer_closure_values(function, dependencies) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for whether something is executed on reconnection. The use_memo
and use_ref
lines must be located where they will always get called
logger.debug(f"{current_hook().component} {new}") | ||
logger.debug(f"{get_current_hook().component} {new}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue J - current_hook
was renamed to get_current_hook
to follow naming convention
"reconnecting", | ||
"client_state", | ||
"_state_recovery_serializer", | ||
"_state_var_lock", | ||
"_hook_state_token", | ||
"_previous_states", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue J - probably shouldn't be using slots
# if not isinstance(root, ComponentType): | ||
# msg = f"Expected a ComponentType, not {type(root)!r}." | ||
# raise TypeError(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue N - this is very expensive. On my performance trace, nearly half the render time was going to isinstance checks for protocols. I highly recommend removing protocols entirely because they hurt the developer experience since they are disjoint from the actual classes, and because they incur a performance penalty. Just use a base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think these checks should be behind a if __debug__:
so that we're not wasting cycles in production on something like this.
self.root = root | ||
self.reconnecting = Ref(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ref is significant. When stuff gets created, on reconnection, it just hangs out forever until its reused.
I would prefer refactoring the layout code so that its a bit cleaner and this isn't tied to the state like this. Like you just have a context var that has everything you need to know. However, the current way things are executed get in the way of doing that because stuff just ends up on the main loop in isolation.
async def __aenter__(self) -> Layout: | ||
return await self.start() | ||
|
||
async def start(self) -> Layout: | ||
self._hook_state_token = create_hook_state() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue D - the refactor to start
is needed so that we can restart the process from scratch if state rebuild fails.
@@ -122,7 +174,7 @@ async def deliver(self, event: LayoutEventMessage) -> None: | |||
except Exception: | |||
logger.exception(f"Failed to execute event handler {handler}") | |||
else: | |||
logger.info( | |||
logger.warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell this shouldn't happen under normal circumstances so I upgraded this to a warning
max_num_state_objects: int = 512, | ||
max_object_length: int = 40000, | ||
default_serializer: Callable[[Any], bytes] | None = None, | ||
deserializer_map: dict[type, Callable[[Any], Any]] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A map of object to deserialization. By default, TheObjectType(**kwargs)
is tried which probably won't work for nested objects, but might.
type_id = b"1" # bool | ||
if obj is None: | ||
return "0", "", "" | ||
match obj: | ||
case True: | ||
result = b"true" | ||
case False: | ||
result = b"false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extra complexity is for performance reasons.
None
is not signed.
Boolean values are really common so this is a shortcut.
if isinstance(child, ComponentType) and child.key is None: | ||
if child.key is None and isinstance(child, ComponentType): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue N - perf: since this is a warning condition, the cheaper check is done first.
@property | ||
def value(self) -> _RefValue: | ||
return self.current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aligns the interface with use_state
for serialization
def __init__( | ||
self, initial_value: _RefValue = _UNDEFINED, key: str | None = None | ||
) -> None: | ||
from reactpy.core._life_cycle_hook import get_current_hook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC this works around a circular import
My suggestion on approach:
|
Example usage that supports Pydantic / SQLModel objects: def default_serializer(obj: Any) -> Any:
if isinstance(obj, BaseModel):
return obj.dict()
raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")
state_recovery_manager = StateRecoveryManager(
serializable_types=[
TemporaryUser,
QAMessage,
],
pepper="pepper",
default_serializer=default_serializer,
)
configure(app, Main, Options(head=head, cors=cors_options), state_recovery_manager) |
…on (#6) * Fix for client not reconnecting on activity * Type improvements * comment improvements
If you want to see this in action, it is live on heavyresume.com Inactivity timeout is set to 4 minutes. You can see it happen in the console. If you scroll down and let it time out, you'll notice that it doesn't regenerate the view on reconnect. Moving the mouse or scrolling triggers reconnection. Thanks! |
Fix Exception catch-all in render swallowing state recovery errors Fix serialization errors not resulting in a state recovery error
One thing I'd like to bring attention to - if the layout logic is rewritten, please leave some means to re-render all components of a certain type. Next best thing is some means to trigger a whole page re-render from the client state. The use case would be hot reloading. Right now I can successfully use jurigged to hot reload - however unless the component has a toggle or something to force a re-render, I have to reload the page. edit: this is no longer true with my PR below. That PR will disconnect and reconnect and force a rerender. |
This PR (on my fork) leverages the logic from this PR to add hot reloading. I will not be merging it into my main branch because I don't want to clutter things up further, and I'm fine just checking out that one branch in the mean time. |
This linked PR fixes a bug where |
After using the heavy stack for many months, here's the known issues that are currently unresolved and may be attributed to my changes:
I'm pretty sure the input hell stems from the change that gives state a deterministic target based on the DOM path. This change was needed to support state rebuild. The rest may be existing ReactPy issues. |
By submitting this pull request you agree that all contributions to this project are made under the MIT license.
Issues
While in a discussion on the discord, it became apparent that ReactPy was unusable for any application where dynamic scaling is required. For example, in cases where Azure/GCP will start/stop workers for dynamic load balancing. As a result this, users will be disconnected randomly and then would have the state of the ReactPy components reset.
There's also other issues I encountered while building this. Please see my comments (will be added soon) which walk through this very large PR. The intent is not to have this PR merged in all at once, rather, it is a guide to subsequent smaller PRs that can properly add tests, etc. I do not have the time to properly land this PR, so I am asking for help.
Solution
The solution to the problem was to have the client store the state that's used for rendering components. The theory is that if the client state matches the server state, then things should work just fine.
In reality, there was some challenges to get to that. The first was that IDs of components and events were random instead of deterministic, so this needed to be changed. Now the IDs are based on their patch path.
The next concern is security. While the state isn't obvious to most users, without proper security a client could manipulate variables that an author would have assumed could only be set by the server. An example might be
is_admin
or something, or maybe the user ID of user viewing the page. Another issue is that server secrets might leak to the client if someone isn't careful.For security, all data given a sha-256 hash as a signature. The server will reject a state variable if it doesn't match the hash. Users are given a personal salt, which they must provide to the server upon reconnection, and the server has a secret "pepper" that is added to make things more difficult. I wasn't satisfied with this, so I added OTP codes that will change, by default, every 4 hours. The OTP codes use site-packages directory's contents (specifically, the parent directory of where reactpy is installed) and the create times to generate a random secret that feeds into the SHA-1 function that generates them. The OTP codes (right now, in the past, and in the future) are added to the hash just to make it that much harder.
All state values have a key that is derived from the file and line number. To avoid leaking this information via brute force, the key is a sha-256 that has about half the bits cut off, making it difficult to recreate.
The client behavior has been revamped. It now will always reconnect, has improved reconnect behavior, and also will disconnect on idle activity (default is 4 minutes). When the user moves the mouse or scrolls, it will reconnect. There's also hooks for when it is reconnecting and has successfully reconnected. The default behavior is to gray out the page and display a spinning pipe character. It's not great.
The performance impact appears to be negligible.
Checklist
The checklist will be for each individual feature that should be a separate PR. TBD
Work items remaining (this PR):
Extracted Issue List
hasattr
in the manner identified